How can reading-order parsing of multilingual documents be improved?

2025-08-19


dots.ocr specifically addresses the problem of scrambled reading order in mixed-language and non-Latin-script documents:

Intelligent sorting: The model has built-in reading-order optimization that automatically arranges text blocks according to human reading habits.
Unified output format: It generates standardized, structured JSON data containing each element's position and hierarchy information.
Language adaptation: It automatically adjusts the parsing logic to each language's writing direction (e.g., right-to-left for Arabic).
Visual debugging: It outputs images with numbered bounding boxes so that the reading order can be verified visually.
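
As a rough illustration of how the structured output might be consumed, the sketch below numbers layout blocks in the order they appear and tags each with a writing direction. It is not dots.ocr's own code: the sample data, the field names (`bbox`, `category`, `text`), and the helpers `text_direction` and `number_blocks` are assumptions made for this example, and it assumes the order of the JSON list is the reading order.

```python
import json
import unicodedata

# Hypothetical layout result. The field names and the convention that list
# order equals reading order are assumptions made for this illustration.
SAMPLE = json.dumps([
    # Arabic word written with escapes so the source stays left-to-right.
    {"bbox": [400, 40, 900, 90], "category": "Title",
     "text": "\u0645\u0631\u062d\u0628\u0627"},
    {"bbox": [60, 120, 900, 300], "category": "Text", "text": "Hello world"},
])


def text_direction(text):
    """Return 'rtl' if the first strongly directional character is
    right-to-left (bidi class R or AL), otherwise 'ltr'."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # no strong character found (digits, punctuation, empty)


def number_blocks(layout_json):
    """Attach a 1-based reading-order index and a direction to each block."""
    blocks = json.loads(layout_json)
    numbered = []
    for index, block in enumerate(blocks, start=1):
        numbered.append({
            "order": index,
            "category": block.get("category", ""),
            "bbox": block.get("bbox"),
            "direction": text_direction(block.get("text", "")),
        })
    return numbered


if __name__ == "__main__":
    for item in number_blocks(SAMPLE):
        print(item["order"], item["category"], item["direction"], item["bbox"])
```

Comparing the `order` values against the numbers drawn on the debug image is one way to spot a block that was read out of sequence.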

To get the complete layout analysis results, use the prompt_layout_all_en prompt.
