Each parsing run produces three standardized outputs: an ISO-compliant JSON file that records the coordinates, type, and content of every element; a Markdown document arranged in reading order that preserves the logic of the original layout; and an annotated visualization that color-codes elements by category. The JSON output uses block-compressed storage, which reduces the index size of a million-page document by 70%. Users can enable the nohf mode to automatically filter out headers, footers, and other auxiliary information, or use the bbox parameter to extract content precisely from a specified region, covering a range of needs in digital document management.
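The effect of header/footer filtering and region extraction can be illustrated with a small post-processing sketch over the JSON output. This is not the tool's own implementation of the nohf mode or the bbox parameter. The element schema (`bbox` as `[x1, y1, x2, y2]`, plus `category` and `text`), the category names `Page-header` and `Page-footer`, and the helper functions are all assumptions made for illustration.

```python
import json

# Assumed category labels for headers and footers; the real labels may differ.
HEADER_FOOTER = {"Page-header", "Page-footer"}


def drop_headers_footers(elements):
    """Return the elements whose category is not a header or footer."""
    return [e for e in elements if e.get("category") not in HEADER_FOOTER]


def inside(bbox, region):
    """True if bbox lies fully within region; both are [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = bbox
    rx1, ry1, rx2, ry2 = region
    return x1 >= rx1 and y1 >= ry1 and x2 <= rx2 and y2 <= ry2


def elements_in_region(elements, region):
    """Return the elements whose bounding box falls entirely inside region."""
    return [e for e in elements if inside(e["bbox"], region)]


# Hypothetical layout JSON for a single page (schema assumed, not confirmed).
sample = json.loads("""
[
  {"bbox": [50, 20, 550, 40],   "category": "Page-header", "text": "Annual Report"},
  {"bbox": [50, 100, 550, 140], "category": "Title",       "text": "Overview"},
  {"bbox": [50, 160, 550, 400], "category": "Text",        "text": "Body paragraph."},
  {"bbox": [50, 760, 550, 780], "category": "Page-footer", "text": "Page 1"}
]
""")

body = drop_headers_footers(sample)
title_area = elements_in_region(sample, [0, 90, 600, 150])

print([e["category"] for e in body])
print([e["text"] for e in title_area])
```

Filtering on the stored category and coordinates keeps the original JSON untouched, so the same file can serve both a full-page view and a cropped one.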
This answer comes from the article "dots.ocr: a unified visual-linguistic model for multilingual document layout parsing".