Supametas.AI's web data extraction consists of 5 key steps, all of which can be performed through a visual interface:
- New dataset: After logging in, click "New Dataset" and select the "URL" data source type.
- Configure parameters (see the configuration sketch after this list):
  - Enter the target web address (e.g., a blog link)
  - Set the crawl depth (Depth Value = 3 to crawl three levels of linked pages)
  - Define the update frequency (Loop Time Value = 24 for automatic daily updates)
- Start processing: Click "Start Processing"; the system automatically recognizes the page structure and extracts the title, body text, charts, and other elements.
- Optimize the results:
  - Use natural-language instructions for fine-grained extraction (e.g., "capture product price and inventory")
  - Manually adjust incorrect fields in the preview screen
- Export the results: after processing, download them in JSON or Markdown format, or push them directly to a knowledge base such as OpenAI Storage.
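
To make the walkthrough above more concrete, here is a minimal Python sketch of what such a dataset configuration and its exported JSON might look like. The field names (`source_type`, `url`, `depth`, `loop_time_hours`, `export_format`) and the helper `load_exported_records` are illustrative assumptions for this sketch, not the actual Supametas.AI API.

```python
import json

# Hypothetical illustration only: the field names below are assumptions
# for this sketch, not the actual Supametas.AI configuration schema.
dataset_config = {
    "source_type": "URL",
    "url": "https://example.com/blog",  # target web address
    "depth": 3,                         # crawl three levels of linked pages
    "loop_time_hours": 24,              # refresh the dataset once a day
    "export_format": "JSON",            # alternatively "Markdown"
}


def load_exported_records(path: str) -> list:
    """Read an exported JSON file for downstream use (e.g. a RAG pipeline)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    print(json.dumps(dataset_config, indent=2))
```
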
In practice, it is recommended to enable the "Schedule Update" function for automatic data synchronization. For e-commerce price monitoring and similar scenarios, you can define specific fields (e.g., discount deadline) with the "customKeys" parameter, and the system will keep the field structure consistent for subsequent analysis.
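
The idea behind a fixed key set is that every extracted record ends up with the same shape, which simplifies later analysis such as building a price-history table. The sketch below assumes a plausible key list and a hypothetical `normalize_record` helper; only the "customKeys" parameter name comes from the text above.

```python
# Hypothetical sketch: only the "customKeys" name comes from the article;
# the key list and helper below are illustrative assumptions.
custom_keys = ["product_name", "price", "stock", "discount_deadline"]


def normalize_record(raw: dict) -> dict:
    """Project a raw extracted record onto the fixed key set so that
    every record shares the same structure for downstream analysis."""
    return {key: raw.get(key) for key in custom_keys}


sample = {"product_name": "Widget", "price": 19.99, "discount_deadline": "2025-06-30"}
print(normalize_record(sample))
# {'product_name': 'Widget', 'price': 19.99, 'stock': None, 'discount_deadline': '2025-06-30'}
```
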
This answer comes from the article "Supametas.AI: Extracting Unstructured Data into LLM Highly Available Data".