
OCRFlux is an open-source, lightweight tool for converting PDF files and images into clean Markdown. Developed by the ChatDOC team and built on a 3B-parameter multimodal large model, it can run on consumer hardware such as an RTX 3090. The tool specializes in complex document layouts: it parses multi-column formats and complex tables, and automatically merges content that spans pages. Compared with other open-source OCR models, OCRFlux reports higher accuracy, especially in table and paragraph processing. It offers simple command-line operation and suits developers, researchers, and anyone who needs to convert documents to Markdown. The project is open source on GitHub under the Apache 2.0 license, with an active community and 1.7k stars.

 

Function List

  • Convert PDFs and images to Markdown format, preserving the natural reading order.
  • Support for complex layout processing, including multi-column documents, illustrations and embedded content.
  • Automatically parses complex tables and outputs them as HTML tables with rowspan and colspan support.
  • Cross-page content merging, which automatically detects and integrates tables and paragraphs across pages.
  • Provides high-accuracy text recognition, with an Edit Distance Similarity (EDS) of up to 0.967.
  • Built on a 3B-parameter multimodal model that runs on common consumer GPUs.
  • Open source and free: code and documentation are publicly available on GitHub, and community contributions are welcome.

Usage Guide

Installation process

OCRFlux is a Docker-based tool that requires a Docker environment to install and run. The following are the detailed installation steps:

  1. Installing Docker
    Make sure Docker is installed on your system. If it is not, visit the Docker website to download and install the appropriate version for your operating system. Once the installation is complete, run the following command to verify it:

    docker --version
    

  2. Pulling the OCRFlux image
    Run the following command in a terminal to pull the latest OCRFlux image from Docker Hub:

    docker pull chatdoc/ocrflux:latest
    
  3. Prepare the file paths
    Create a local working directory (e.g. /path/to/localworkspace) to store input and output files. Make sure you also have the following directories:

    • Input PDF directory (e.g. /path/to/test_pdf_dir).
    • OCRFlux model file directory (e.g. /path/to/OCRFlux-3B). The model files should be downloaded from the official GitHub repository or from a link provided by ChatDOC.
  4. Running OCRFlux
    Use the following command to start the OCRFlux container, mount the local directories, and specify the input PDF and model paths:

    docker run -it --gpus all \
    -v /path/to/localworkspace:/localworkspace \
    -v /path/to/test_pdf_dir:/test_pdf_dir \
    -v /path/to/OCRFlux-3B:/OCRFlux-3B \
    chatdoc/ocrflux:latest /localworkspace --data /test_pdf_dir/* --model /OCRFlux-3B/
    
    • --gpus all: Enables GPU acceleration (remove this parameter if there is no GPU).
    • -v: Mounts a local directory into the container.
    • --data: Specifies the path to the input PDF files.
    • --model: Specifies the model file path.
  5. Generate Markdown files
    When the run completes, the results are stored in JSONL format in the workspace. Use the following command to convert them to Markdown; the generated files are saved in the ./localworkspace/markdowns/DOCUMENT_NAME directory:

    python -m ocrflux.jsonl_to_markdown ./localworkspace
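
For scripted or repeated runs, the docker run invocation from the steps above can be assembled programmatically. The following is a minimal sketch, not part of OCRFlux itself: the helper name build_ocrflux_command and its parameters are hypothetical, and it only reuses the image name, mount points, and flags already shown in this article.

```python
import shlex


def build_ocrflux_command(workspace, pdf_dir, model_dir, use_gpu=True):
    """Assemble the `docker run` argument list used in the installation steps.

    All three arguments are host paths; the container-side paths are the
    ones from the article's command.
    """
    cmd = ["docker", "run", "-it"]
    if use_gpu:
        cmd += ["--gpus", "all"]  # omit on machines without a GPU
    # (host path, container path) pairs, mounted with -v
    mounts = [
        (workspace, "/localworkspace"),
        (pdf_dir, "/test_pdf_dir"),
        (model_dir, "/OCRFlux-3B"),
    ]
    for host, container in mounts:
        cmd += ["-v", f"{host}:{container}"]
    cmd += [
        "chatdoc/ocrflux:latest",
        "/localworkspace",
        "--data", "/test_pdf_dir/*",
        "--model", "/OCRFlux-3B/",
    ]
    return cmd


command = build_ocrflux_command(
    "/path/to/localworkspace", "/path/to/test_pdf_dir", "/path/to/OCRFlux-3B"
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The sketch only prints the command. To execute it, the list could be passed to subprocess.run, in which case the -it flag should be dropped when no interactive terminal is attached.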
    

Usage Process

The core function of OCRFlux is to convert PDFs or images to Markdown. The steps are as follows:

  1. Preparing the input file
    Place the PDF file or image to be converted into the /path/to/test_pdf_dir directory. Common PDF and image formats (e.g. PNG, JPG) are supported.
  2. Run the conversion task
    Use the Docker command above to start the conversion. OCRFlux automatically analyzes the document layout and recognizes text, tables and cross-page content. The conversion process may take a few minutes, depending on file size and hardware performance.
  3. Checking the output
    After the conversion is complete, open the ./localworkspace/markdowns/DOCUMENT_NAME directory to view the generated Markdown file. The file retains the natural reading order of the document, and tables are rendered in Markdown or HTML format.
  4. Handling complex tables
    OCRFlux can handle complex tables containing rowspan and colspan. The resulting Markdown file structures the table into a clear format suitable for direct editing or importing into other tools.
  5. Cross-page content merging
    For tables or paragraphs that span pages, OCRFlux automatically detects and merges the content. For example, tables spanning two pages are consolidated into one complete table, and paragraphs are spliced together in a logical order.
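
When a table with rowspan and colspan needs to be imported into another tool, it often helps to expand it into a plain rectangular grid first. The sketch below shows one way to do that with Python's standard html.parser module. It is a post-processing example, not something OCRFlux ships; the names TableGridParser and expand_table and the sample table are illustrative assumptions, and the code assumes the input contains a single, non-nested table.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect the <td>/<th> cells of one HTML table with their spans."""

    def __init__(self):
        super().__init__()
        self.rows = []     # each row: list of (text, rowspan, colspan)
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell without an enclosing <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def expand_table(html):
    """Return a list of rows, repeating a cell's text across every slot it spans."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots taken by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


sample = (
    "<table>"
    "<tr><th rowspan='2'>Item</th><th colspan='2'>Price</th></tr>"
    "<tr><th>Net</th><th>Tax</th></tr>"
    "<tr><td>Pen</td><td>1.00</td><td>0.20</td></tr>"
    "</table>"
)
for row in expand_table(sample):
    print(row)
```

Repeating the text in every spanned slot is a design choice that suits CSV or spreadsheet import; leaving the extra slots empty would be an equally valid alternative.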

Featured Functions

  • Complex layout processing: OCRFlux supports parsing of multi-column documents and embedded illustrations. No additional configuration is required at runtime; the tool automatically recognizes the document structure.
  • High-precision recognition: In the OCRFlux-bench-single test, the tool achieves an EDS score of 0.967, outperforming olmOCR-7B (0.872), Nanonets-OCR-s (0.858) and MonkeyOCR (0.780).
  • Cross-page merging: This is a distinguishing feature of OCRFlux. The tool analyzes consecutive pages, detects tables or paragraphs that need to be merged, and outputs the complete content.
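
To give a feel for what an EDS score measures, the sketch below computes a similarity from the Levenshtein edit distance, normalized by the length of the longer string. This normalization is a common convention and is an assumption here; the article does not spell out the exact formula used in the OCRFlux benchmark, and the function names are illustrative.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def edit_distance_similarity(predicted, reference):
    """Return 1.0 for identical strings, approaching 0.0 as they diverge.

    Assumed definition: 1 - distance / max(len(predicted), len(reference)).
    """
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, reference) / longest


# One wrong character in a short page of Markdown.
print(edit_distance_similarity("# Title\nHello wor1d", "# Title\nHello world"))
```

Under this definition, a score of 0.967 would mean that roughly 3 to 4 character edits per 100 characters are needed to turn the output into the reference text.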

Caveats

  • Ensure that the input PDF files are legible; a scan resolution of 300 DPI or higher is recommended.
  • Without a GPU, conversion may be slow; a high-performance CPU is recommended in that case.
  • Check that the model files are complete, because missing files may cause the conversion to fail.
  • Visit the GitHub repository regularly for the latest version and update instructions.
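
The 300 DPI recommendation can be checked with simple arithmetic: DPI is the pixel count divided by the physical size in inches. The sketch below is an illustrative helper, not part of OCRFlux; the function names, the A4 default page size, and the 1 DPI rounding tolerance are all assumptions.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixel_width, pixel_height, page_width_mm, page_height_mm):
    """Return the lower of the horizontal and vertical DPI of a scanned page."""
    dpi_x = pixel_width / (page_width_mm / MM_PER_INCH)
    dpi_y = pixel_height / (page_height_mm / MM_PER_INCH)
    return min(dpi_x, dpi_y)


def meets_recommended_dpi(pixel_width, pixel_height,
                          page_width_mm=210.0, page_height_mm=297.0,
                          minimum=300.0, tolerance=1.0):
    """Check a scan against a minimum DPI, defaulting to an A4 page.

    The tolerance absorbs rounding: a nominal 300 DPI A4 scan is
    2480 x 3508 pixels, which works out to just under 300 horizontally.
    """
    dpi = effective_dpi(pixel_width, pixel_height, page_width_mm, page_height_mm)
    return dpi >= minimum - tolerance


print(meets_recommended_dpi(2480, 3508))  # nominal 300 DPI A4 scan
print(meets_recommended_dpi(1240, 1754))  # nominal 150 DPI A4 scan
```

For an A4 page, this means a scan should be roughly 2480 x 3508 pixels or larger to meet the recommendation.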

Application Scenarios

  1. Academic research
    Researchers can convert academic paper PDFs into Markdown for easy editing and sharing. OCRFlux handles multi-column layouts and complex tables, helping keep formulas and references clearly formatted.
  2. Technical documentation
    Developers can convert technical manuals or API documentation from PDF to Markdown for importing into a knowledge base or blog. Cross-page merging avoids content fragmentation.
  3. Invoice and form processing
    Finance staff can convert invoice or form PDFs to Markdown, extracting key information such as purchaser, unit price and price/tax totals for easy data analysis.
  4. Content creation
    Creators can convert scanned books or notes into Markdown, organizing them into publishable files suitable for direct use on websites or in documents.

FAQ

  1. What file formats does OCRFlux support?
    It supports PDF and common image formats (e.g. PNG, JPG). Input files need to be clear documents or scans.
  2. Is high-performance hardware required?
    No. OCRFlux is based on a 3B-parameter model and can run on a regular GPU (e.g. RTX 3090) or a high-performance CPU.
  3. How do I handle cross-page tables?
    OCRFlux automatically detects tables and paragraphs across pages and merges them to output the full Markdown format without manual intervention.
  4. What if the conversion results are inaccurate?
    Check the resolution of the input file (300 DPI or higher is recommended). If the problem persists, file an issue on GitHub for community help.
  5. Does it need to be networked to operate?
    No internet connection is required. OCRFlux runs in a local Docker environment, and models and data are processed offline.