The following prerequisites and steps are required to run GPT-Crawler locally:
Environment preparation
- Node.js 16+ and npm installed (verify with `node -v` and `npm -v`)
- Git (for cloning the repository)
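As a quick check, the verification commands above can be run together in a terminal; the exact version strings printed will vary, and `git --version` is included only to confirm the Git requirement:

```sh
# Toolchain check before cloning (output values are illustrative)
node -v        # should report v16 or newer
npm -v         # npm is bundled with Node.js
git --version  # Git is needed to clone the repository
```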
Steps
- Clone the project:
  `git clone https://github.com/BuilderIO/gpt-crawler.git`
- Install dependencies: go into the project directory and run `npm install`
- Configure parameters: edit the `config.ts` file (a sketch follows this list); the key options are:
  - `url`: the address where the crawl starts
  - `selector`: CSS selector for the content area to extract
  - `maxPagesToCrawl`: caps how many pages are crawled
- Start the crawler: run `npm start`; when it finishes, the results are saved to `output.json` in the project root
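As a sketch of the configuration step, a minimal `config.ts` might look like the following. The field names mirror the options listed above; the `Config` import path, the `match` field, and the example URL and selector values are assumptions based on the project's template and should be checked against the version you cloned:

```ts
// config.ts: a minimal sketch, not the project's authoritative template.
// Assumption: the Config type is exported from ./src/config as in the
// upstream repository; verify against your checkout.
import { Config } from "./src/config";

export const defaultConfig: Config = {
  // url: starting crawl address (placeholder value)
  url: "https://www.builder.io/c/docs/developers",
  // match: URL pattern for links the crawler may follow
  // (assumption: present in the shipped template; adjust to your target site)
  match: "https://www.builder.io/c/docs/**",
  // selector: CSS selector for the content area to extract (placeholder)
  selector: ".docs-builder-container",
  // maxPagesToCrawl: caps how many pages are crawled
  maxPagesToCrawl: 50,
  // outputFileName: results land here in the project root, per the steps above
  outputFileName: "output.json",
};
```

After editing, `npm start` reads this file and writes the crawl results to `output.json`.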
Note: make sure your network connection is stable so the first run can download dependencies; crawling dynamic pages may take extra time while resources load.
This answer comes from the article "GPT-Crawler: Automatically Crawling Website Content to Generate Knowledge Base Documents".