dots.ocr: a unified vision-language model for multilingual document layout parsing
dots.ocr is a powerful multilingual document parsing tool, based on a 1.7B-parameter vision-language model (VLM), capable of both layout detection and content recognition. It demonstrates state-of-the-art performance on benchmarks such as OmniDocBench, especially in text, table and reading order parsing...
SnippAI: A tool for recognizing and analyzing screenshot content using AI
Snippai is an AI-based screenshot tool designed to enhance the screenshot experience through advanced AI algorithms. It not only captures screen content, but also intelligently analyzes and converts formulas, text, tables, images and more in the screenshot. Users can use Snippai to transform complex visual information into programmable...
AI Fast Station: document parsing tool for comparing OCR models in one click
AI Fast Station is a free, open source OCR model arena that focuses on intelligent parsing of documents and images. Users can upload PDF or image files and quickly find a suitable parsing solution by comparing seven mainstream OCR models with one click. The site supports a variety of file formats, is easy to operate, and requires no complex installation. AI Fast Station...
OCRmyPDF: an open source tool for turning scanned PDFs into searchable text
OCRmyPDF is an open source command line tool designed to add an Optical Character Recognition (OCR) text layer to scanned PDF files, turning them into searchable documents whose text can be copied. It is written in Python and uses the Tesseract OCR engine to accurately recognize the text in the image and embed it into the PDF ...
Docstrange: a tool for extracting data from documents and images and converting them to multiple formats
Docstrange is an open source document processing tool that focuses on extracting data from documents and images in multiple formats and converting them to formats such as Markdown, JSON, CSV or HTML. It uses artificial intelligence and advanced OCR technology and supports processing PDF, Word documents, Exce...
Guava Intelligent Document Recognition: Intelligent Recognition Tool for Offline Documents and Forms
Guava Intelligent Document Recognition (intelligent_document_recognition) is open source desktop software developed by jiangnanboy and hosted on GitHub. It focuses on offline intelligent recognition of documents and forms. The software integrates Optical Character Recognition (OCR) and form junction...
OCRFlux: Lightweight tool for converting PDFs and images to Markdown
OCRFlux is an open source lightweight tool focused on converting PDF files and images to clean Markdown. It is developed by the ChatDOC team, built on a large multimodal model with 3B parameters, and can run on common hardware such as an RTX 3090. The tool specializes in complex document layouts,...
VOP: OCR Tool for Extracting Complex Diagrams and Math Formulas
Versatile OCR Program is an open source Optical Character Recognition (OCR) tool designed specifically for processing complex academic and educational documents. It can extract text, tables, mathematical formulas, diagrams and schematics from PDF, images and other documents and generate structured data suitable for machine learning training. Support...
An open source service that automatically parses PDF content and extracts text and tables
It automatically analyzes the layout of PDF documents, identifies text, titles, images, tables, formulas and other elements in the page, and determines their correct order. The tool supports OCR, so scanned PDFs can be converted to searchable text. It runs on Docker and provides two models: visual model (Vision Grid ...
Bob
Bob is a translation and OCR (Optical Character Recognition) software designed for the macOS platform. Users can use Bob for translation and OCR operations in any application, supporting a wide range of translation services, including Volcano, Tencent, Ali, Baidu, Youdao, Apple, Google, Microsoft,...
Ollama OCR: Extracting Text from Images Using Visual Models in Ollama
Ollama OCR is a powerful Optical Character Recognition (OCR) toolkit that uses state-of-the-art vision language models available through the Ollama platform to extract text from images. The project is available both as a Python package and as a user-friendly Streamlit web application. It supports a wide range of visual models...
Doc2X
Doc2X is a powerful document, image and formula recognition and conversion tool that aims to provide efficient and intelligent document processing. Whether the input is an academic research paper, a textbook, a corporate document or a financial report, Doc2X can accurately recognize the tables and formulas in PDF and convert them to Word with one click....
STranslate
STranslate is a ready-to-use translation and OCR tool built with WPF. The tool is designed to provide efficient and convenient translation and Optical Character Recognition (OCR) functionality for a wide range of languages and text types. STranslate is an open source project that is free for users to download and use, and also accepts...
Llama OCR: OCR library that converts images to Markdown in three lines of code using the free Llama 3.2 Vision interface
Llama OCR is an OCR (Optical Character Recognition) library based on Llama 3.2 Vision that converts documents to Markdown format. The library was developed by Nutlope and uses the free Llama 3.2 interface provided by Together AI for graph...
Easydict
Easydict is a simple and elegant dictionary translation app for macOS users. With support for multiple translation services and offline OCR recognition, it makes looking up words or translating text easy. Easydict works out of the box with support for input translation, text selection translation, and screenshot translation, and offers convenient multilingual...
Datalab: dedicated OCR recognition AI model, PDF to Markdown (open source/API)
Datalab offers a range of advanced AI models focused on OCR, layout analysis, PDF to Markdown, and more. These models are not only high performing, but also easy to use and open source. The Marker model on the platform can quickly and accurately convert PDF to Markdown, including tables and formulas. Su...
TTime
TTime, a project published on GitHub by InkTimeRecord, is a simple and efficient translation software. It mainly provides input, screenshot, text selection and hoverball translation functions, and supports a variety of translation sources and text recognition services, allowing users to quickly perform language conversion and text recognition. In addition, TT...