
Dolphin is an open source document image parsing tool developed by ByteDance. It focuses on complex document images, such as scanned pages or PDF files that contain text, tables, formulas, and figures. It follows an "analyze first, parse later" approach built on a two-stage process: it first analyzes the page layout to generate a sequence of elements in natural reading order, and then parses those elements in parallel using heterogeneous anchors and task-specific prompts. Dolphin performs well on a wide range of page-level and element-level parsing tasks while keeping a lightweight architecture. The work was presented at ACL 2025, and the pre-trained models and inference code are available to developers through the GitHub repository, together with an online demo.


 

Feature List

  • Page layout analysis: Automatically identifies text, tables, formulas and images in a document and arranges the elements in natural reading order.
  • Parallel element parsing: Efficiently processes different types of document elements using heterogeneous anchors and task-specific prompts.
  • Multi-modal input support: Handles complex document images containing text, images, tables and formulas.
  • Pre-trained models: Users can download the pre-trained models and use them directly for inference or further development.
  • Open source support: Code and documentation are provided so that developers can customize and extend the functionality.
  • Online demo platform: Users can test the parsing results online through Demo-Dolphin.

 

Usage Guide

Installation

To use Dolphin, first download the code and the pre-trained model from the GitHub repository or Hugging Face. The detailed installation and usage steps are as follows:

  1. Environment preparation
    Dolphin requires a Python environment; Python 3.8 or higher is recommended. Install the following dependency packages:

    pip install torch torchvision
    pip install git-lfs
    

    Make sure you have Git and Git LFS installed on your system for downloading large model files.

  2. Download code and models
    Dolphin's code and models can be accessed in the following ways:

    • Download the code from GitHub:
      git clone https://github.com/bytedance/Dolphin
      cd Dolphin
      
    • Download the models from Hugging Face:
      git lfs install
      git clone https://huggingface.co/ByteDance/Dolphin ./hf_model
      

      Or use the Hugging Face CLI:

      huggingface-cli download ByteDance/Dolphin --local-dir ./hf_model
      
    • Model files can also be downloaded from Baidu Yun or Google Drive and placed in the ./checkpoints folder.
  3. Check the configuration
    After downloading the code, go to the project directory and check the ./config/Dolphin.yaml configuration file to make sure the model paths and parameters are correct. The configuration file contains the model architecture and inference settings, which you can adjust as needed.
  4. Verify the environment
    After the installation is complete, run the following command from the project directory to verify the environment:

    python demo_element.py --help
    

    If the command outputs help information normally, the environment is configured successfully.
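
The prerequisites listed in the steps above can also be checked with a small script before running the demo. The following is a minimal sketch that uses only the Python standard library; the function name check_environment is hypothetical and is not part of the Dolphin repository. It only tests whether Python is recent enough, whether the torch and torchvision packages can be found, and whether the git and git-lfs executables are on the PATH.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 8)  # version recommended in the environment preparation step


def check_environment(min_python=MIN_PYTHON):
    """Return a dict mapping each prerequisite name to True or False.

    Hypothetical helper for illustration; not part of the Dolphin code base.
    """
    return {
        "python": sys.version_info[:2] >= min_python,
        # find_spec() returns None when a top-level package is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        "torchvision": importlib.util.find_spec("torchvision") is not None,
        "git": shutil.which("git") is not None,
        # Git LFS ships as a separate `git-lfs` executable.
        "git-lfs": shutil.which("git-lfs") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING points back to the corresponding installation step.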

Usage

Dolphin provides a command line interface to facilitate the processing of single document images. Here is how to use the main features:

  1. Processing table images
    To parse an image containing a table, run the following command:

    python demo_element.py --config ./config/Dolphin.yaml --input_path ./demo/element_imgs/table_1.jpeg --element_type table
    

    This command analyzes the table image, extracts the table content and generates structured output. The output is usually in JSON format and contains the rows, columns and cells of the table.

  2. Processing formula images
    For images of mathematical formulas, run:

    python demo_element.py --config ./config/Dolphin.yaml --input_path ./demo/element_imgs/line_formula.jpeg --element_type formula
    

    Dolphin recognizes the content of formulas and converts them to LaTeX format for further editing or rendering.

  3. Processing text paragraph images
    To parse a text paragraph, run:

    python demo_element.py --config ./config/Dolphin.yaml --input_path ./demo/element_imgs/para_1.jpg --element_type text
    

    This command extracts the text content, preserving paragraph structure and formatting.

  4. Online Demo
    If you do not want to deploy locally, you can visit the Demo-Dolphin platform (link in GitHub repository). Upload an image of the document on the platform, select the element type (e.g. table, text or formula) and see the parsing results. The platform is suitable for quick testing and requires no configuration of the environment.
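
The three element commands above differ only in the input path and the element type, so they are easy to drive from a script when several images need processing. The sketch below assembles the same argument list shown in the commands and optionally runs it with subprocess. The helper names build_command and run_batch are hypothetical, and the set of element types is limited to the three values that appear in this article.

```python
import subprocess
import sys

# Element types taken from the commands shown in this article.
ELEMENT_TYPES = {"table", "formula", "text"}


def build_command(image_path, element_type, config="./config/Dolphin.yaml"):
    """Build the demo_element.py argument list for one image (hypothetical helper)."""
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"unsupported element type: {element_type!r}")
    return [
        sys.executable, "demo_element.py",
        "--config", config,
        "--input_path", str(image_path),
        "--element_type", element_type,
    ]


def run_batch(jobs, dry_run=True):
    """Process (image_path, element_type) pairs one after another.

    With dry_run=True the commands are only printed. With dry_run=False they
    are executed, which requires the current directory to be the Dolphin
    repository root.
    """
    commands = [build_command(path, kind) for path, kind in jobs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_batch([
        ("./demo/element_imgs/table_1.jpeg", "table"),
        ("./demo/element_imgs/line_formula.jpeg", "formula"),
        ("./demo/element_imgs/para_1.jpg", "text"),
    ])
```

Keeping dry_run=True as the default makes it possible to review the generated commands before anything is executed.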

Key Features in Detail

  • Page layout analysis: Dolphin first scans the entire document image, recognizes the elements on the page (e.g., headings, paragraphs, tables), and arranges them in natural reading order. This helps with complex documents, where elements could otherwise be recognized in the wrong order.
  • Parallel parsing: Dolphin uses heterogeneous anchors and assigns a specific prompt to each element type (e.g., tables, formulas), so that multiple elements can be parsed at the same time, which substantially improves efficiency.
  • Lightweight architecture: Compared to other document parsing models, Dolphin's model is smaller and faster at inference, making it suitable for running on resource-limited devices.
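
The two-stage flow described above can be illustrated with a purely conceptual sketch. Nothing below is Dolphin's actual implementation: analyze_layout and parse_element are stubs that stand in for the model, the prompt strings are invented for illustration, and a thread pool stands in for whatever batching the real model performs. The sketch only shows the shape of the idea, which is to order the elements first and then parse each one independently with a prompt chosen by its type.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical prompts, one per element type; not Dolphin's real prompt strings.
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse the table in this region.",
    "formula": "Transcribe the formula in this region.",
}


def analyze_layout(page):
    """Stage 1 stub: return the page's elements sorted into reading order."""
    return sorted(page["elements"], key=lambda element: element["order"])


def parse_element(element):
    """Stage 2 stub: pair one element with the prompt for its type."""
    return {
        "order": element["order"],
        "type": element["type"],
        "prompt": PROMPTS[element["type"]],
    }


def parse_page(page, workers=4):
    """Analyze first, then parse all elements concurrently."""
    elements = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() yields results in input order, so the parsed
        # elements stay in reading order even though they run concurrently.
        return list(pool.map(parse_element, elements))


if __name__ == "__main__":
    demo_page = {"elements": [
        {"order": 2, "type": "table"},
        {"order": 1, "type": "text"},
        {"order": 3, "type": "formula"},
    ]}
    for item in parse_page(demo_page):
        print(item["order"], item["type"], item["prompt"])
```

Because each element is parsed independently of the others, the second stage can be parallelized without affecting the reading order fixed by the first stage.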

Notes

  • Ensure that the input image is clear; blurred or low-resolution images may reduce parsing accuracy.
  • For large documents, process them in chunks and submit images page by page to improve accuracy.
  • If you encounter model loading problems, check that the model files in the ./checkpoints folder are complete.
  • Refer to the README file in the GitHub repository for the latest configuration instructions and FAQs.
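
One common reason for incomplete model files after a git clone is that Git LFS was not active, which leaves small text pointer files in place of the real weights. Such pointer files begin with the line "version https://git-lfs.github.com/spec/v1". The sketch below scans a model folder for empty files and for leftover pointers. The function name find_incomplete_files is hypothetical, and the check is only a heuristic that does not verify checksums or confirm that every expected file is present.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_incomplete_files(model_dir):
    """Return (path, reason) pairs for files that look incomplete.

    Hypothetical helper for illustration; flags empty files and files that
    are still Git LFS pointers instead of the downloaded content.
    """
    problems = []
    for path in sorted(Path(model_dir).rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.parts:
            continue
        if path.stat().st_size == 0:
            problems.append((path, "empty file"))
            continue
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            problems.append((path, "Git LFS pointer, real content not downloaded"))
    return problems


if __name__ == "__main__":
    # Folder names taken from this article; adjust to your own layout.
    for folder in ("./checkpoints", "./hf_model"):
        for path, reason in find_incomplete_files(folder):
            print(f"{path}: {reason}")
```

If pointer files are reported, running git lfs install and downloading the model again, as described in the installation steps, is the usual remedy.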

 

Application Scenarios

  1. Academic Research Documentation
    Researchers can use Dolphin to parse scanned academic papers, extracting formulas, tables, and text content. For example, convert papers in PDF format into structured data for further analysis or archiving.
  2. Digitization of enterprise documents
    Organizations can use Dolphin to convert scanned paper contracts, reports or invoices into editable digital formats. Automatic extraction of tables and text can substantially improve data entry efficiency.
  3. Organization of educational resources
    Teachers and students can use Dolphin to parse formulas and diagrams in instructional materials. For example, convert scanned pages of a math textbook into LaTeX format for online teaching or note-taking.
  4. Archive management
    Archivists can use Dolphin to process scanned historical documents, extract key information and generate structured data for easy archiving and retrieval.

 

FAQ

  1. What types of document elements does Dolphin support?
    Dolphin supports parsing of text, tables, formulas and images. It can handle images of complex documents containing these elements, such as scanned PDF files or photos.
  2. How can I improve parsing accuracy?
    Use high-resolution, clear images as input. Make sure that the background of the document is simple and avoid too many distracting elements. For large documents, page-by-page processing is recommended.
  3. Is Dolphin free?
    Yes, Dolphin is an open source tool released under the MIT license. Users are free to download, use, and modify the code and models.
  4. Does Dolphin need powerful hardware?
    Dolphin's lightweight architecture lets it run on ordinary computers, but a GPU is recommended to speed up inference. The minimum configuration is 8GB of RAM and 4GB of video memory.
  5. How do I get the latest updates?
    Follow the GitHub repository (https://github.com/bytedance/Dolphin) or the Hugging Face page for the latest code, model, and documentation updates.