Locating the fault
When a newly added RSS feed causes crawl errors, troubleshoot in the following order:
- Basic validation: Use an online RSS validator (e.g. the W3C Feed Validation Service) to check that the feed is well formed
- Log analysis: Check the GitHub Actions logs for the cron-job workflow (see the .github/workflows directory)
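Before turning to an online validator, a quick local triage of the raw response can narrow down which fix applies. The following is a minimal sketch, not part of the project: the function name `diagnoseFeedResponse` and its input shape are hypothetical, and the checks are rough heuristics rather than real feed validation.

```javascript
// Hypothetical helper: classifies a fetched response so you know which fix to try.
// Input is a plain object { status, contentType, body } built from any HTTP client.
function diagnoseFeedResponse({ status, contentType = '', body = '' }) {
  if (status < 200 || status >= 300) {
    return `HTTP error: status ${status}`;
  }
  // Only inspect the start of the body; trimStart() also drops a leading BOM.
  const head = body.trimStart().slice(0, 2000).toLowerCase();
  if (head.startsWith('{') || /json/i.test(contentType)) {
    return 'JSON source: needs a JSON code path, not an XML parser';
  }
  if (head.startsWith('<!doctype html') || head.startsWith('<html')) {
    return 'HTML page: possibly a block page or a JS-rendered site';
  }
  if (/<(rss|feed|rdf:rdf)[\s>]/.test(head)) {
    return 'Looks like RSS/Atom: run a full validator next';
  }
  return 'Unrecognized content: inspect the raw response';
}
```

An "HTML page" result points toward the proxy or Puppeteer options, and a "JSON source" result toward the JSON request path.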
Systematic solutions
- Proxy configuration: For overseas sources that are blocked by a firewall, add a proxy configuration entry to cron_job.yml
- Fault tolerance: Modify src/scraper.js to add retry logic (3 retries with exponential backoff is suggested)
- Parsing optimization: For special formats:
  - Render dynamic web pages with Puppeteer (requires adjusting the Docker configuration)
  - Fetch JSON-format sources with the axios library instead
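The retry suggestion above can be sketched as follows. This is an illustration under assumptions, not the actual contents of src/scraper.js: the names `withRetry` and `fetchFeedText`, the 500 ms base delay, and the use of the global `fetch` (Node 18+) are all hypothetical choices.

```javascript
// Hypothetical helper: runs `task` once, then retries up to `retries` times,
// doubling the wait between attempts (500 ms, 1000 ms, 2000 ms by default).
async function withRetry(task, { retries = 3, baseDelayMs = 500, sleep } = {}) {
  // `sleep` is injectable so the delay can be replaced in tests.
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // no retries left
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Example use (assumes Node 18+ for the global fetch): treat non-2xx as a failure
// so that it triggers a retry.
function fetchFeedText(feedUrl) {
  return withRetry(async () => {
    const res = await fetch(feedUrl);
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${feedUrl}`);
    return res.text();
  });
}
```

Retrying every error is the simplest policy. In practice it may be worth skipping retries for permanent failures such as HTTP 404.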
Preventive maintenance
It is recommended to set up a health-check system for RSS sources:
- Create a feed_status collection in Firestore to record the crawl success rate
- Set up Discord webhook alerts (refer to the project's alert-system branch)
- Enable secondary parsing through the Readability API for unstable sources
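The health-check idea above can be sketched as a small bookkeeping routine. Everything here is an assumption for illustration: the record fields, the thresholds, and the function names are hypothetical and are not taken from the project or its alert-system branch. The sketch leaves the Firestore read and write to the caller and assumes Node 18+ for the global `fetch`.

```javascript
// Hypothetical shape of one feed_status record; pass `undefined` for a new feed.
function recordCrawl(status, succeeded, now = new Date()) {
  const attempts = (status?.attempts ?? 0) + 1;
  const successes = (status?.successes ?? 0) + (succeeded ? 1 : 0);
  return {
    attempts,
    successes,
    successRate: successes / attempts,
    consecutiveFailures: succeeded ? 0 : (status?.consecutiveFailures ?? 0) + 1,
    lastCheckedAt: now.toISOString(),
  };
}

// Alert only once there is enough data; both thresholds are arbitrary examples.
function shouldAlert(status, { minAttempts = 5, minSuccessRate = 0.8 } = {}) {
  return status.attempts >= minAttempts && status.successRate < minSuccessRate;
}

// Discord webhooks accept a JSON body with a `content` string.
function buildDiscordPayload(feedUrl, status) {
  const pct = (status.successRate * 100).toFixed(1);
  return {
    content: `Feed ${feedUrl} is unhealthy: ${pct}% success over ${status.attempts} crawls`,
  };
}

// Posts the alert; `webhookUrl` would come from a secret, never from source code.
async function sendDiscordAlert(webhookUrl, payload) {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Discord webhook returned HTTP ${res.status}`);
}
```

After each crawl, the caller would load the feed's record, pass it through `recordCrawl`, store the result back in feed_status, and call `sendDiscordAlert` only when `shouldAlert` returns true.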
This answer comes from the article "Audibit: turning popular tech articles into ready-to-listen audio podcasts".