SiteMCP provides several utility parameters to optimize the crawling process:
- Concurrency control: the `--concurrency` parameter (e.g. `--concurrency 10`) increases the number of pages crawled in parallel.
- Path matching: the `-m`/`--match` parameter supports wildcard matching of specific URL paths (e.g. `-m "/blog/**"` to crawl only the blog section).
- Content selector: `--content-selector` precisely extracts a specific area via a CSS selector (e.g. `--content-selector ".content"`).
- Cache management: `--cache-dir` customizes the cache path; `--no-cache` disables the cache.
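As a rough illustration (using a placeholder URL, and assuming the flags behave as described above), each option can also be used on its own:

```bash
# Crawl up to 10 pages in parallel (placeholder URL)
npx sitemcp https://example.com --concurrency 10

# Only fetch pages under the /blog/ path
npx sitemcp https://example.com -m "/blog/**"

# Keep only the element matching the .content CSS selector
npx sitemcp https://example.com --content-selector ".content"
```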
These parameters can be used in combination, for example: `npx sitemcp https://example.com --concurrency 5 -m "/docs/**" --content-selector "#main"`
This command will:
- Crawl the docs section with 5 concurrent requests
- Extract only the content inside the `#main` element
- Use the default cache settings
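If the default cache behaviour isn't wanted, the same command could in principle be extended with the cache flags listed above (the cache directory shown here is just an illustrative path):

```bash
# Store the crawl cache in a custom directory (illustrative path)
npx sitemcp https://example.com --concurrency 5 -m "/docs/**" --content-selector "#main" --cache-dir ./sitemcp-cache

# Or skip the cache entirely and re-crawl every time
npx sitemcp https://example.com --concurrency 5 -m "/docs/**" --content-selector "#main" --no-cache
```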
This answer comes from the article "SiteMCP: Crawling website content and turning it into MCP services".