Docstrange provides an intelligent field extraction feature that can extract specific fields in two ways:
- Python API: call the extract_data method and pass the field names in the specified_fields parameter. For example, to extract the invoice number and total amount: fields = result.extract_data(specified_fields=["invoice_number", "total_amount"])
- Command line: use the --extract-fields parameter, for example: docstrange invoice.pdf --output json --extract-fields invoice_number total_amount
This feature is particularly well suited to quickly pulling key information out of documents such as invoices and contracts and outputting it in a structured data format.
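When the same fields have to be extracted from many documents, it can help to assemble the command-line call programmatically. The following is a minimal sketch, not part of Docstrange itself: the helper names build_extract_command and run_extract are hypothetical, and the only flags used are the ones shown in the command above. Running the command assumes the docstrange executable is installed and on the PATH.

```python
import shlex
import subprocess


def build_extract_command(document_path, fields):
    """Return the argv list for the docstrange CLI call shown above.

    Hypothetical helper: it only reproduces the flags from the example
    command (--output json, --extract-fields) and adds nothing else.
    """
    if not fields:
        raise ValueError("at least one field name is required")
    return ["docstrange", document_path, "--output", "json",
            "--extract-fields", *fields]


def run_extract(document_path, fields):
    """Run the command and return its raw stdout as text.

    Assumes the docstrange CLI is installed; the output is returned
    unparsed because its exact structure is not described here.
    """
    completed = subprocess.run(
        build_extract_command(document_path, fields),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    cmd = build_extract_command("invoice.pdf", ["invoice_number", "total_amount"])
    # Prints: docstrange invoice.pdf --output json --extract-fields invoice_number total_amount
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a file name contains spaces.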
This answer comes from the article "Docstrange: a tool for extracting data from documents and images and converting them to multiple formats".