dots.ocr is a powerful multilingual document parsing tool based on a 1.7B-parameter vision-language model (VLM) that handles both layout detection and content recognition. It achieves state-of-the-art performance on benchmarks such as OmniDocBench, especially in text, table and reading order parsing...
Snippai is an AI-based screenshot tool designed to enhance the screenshot experience through advanced AI algorithms. It not only captures screen content, but also intelligently analyzes and converts formulas, text, tables, images and more in the screenshot. Users can use Snippai to transform complex visual information into programmable...
AI Fast Station is a free, open source OCR model arena that focuses on intelligent parsing of documents and images. Users can upload PDF or image files and quickly find a suitable parsing solution by comparing seven mainstream OCR models with one click. The site supports a variety of file formats, is easy to operate, and requires no complex installation. AI Fast Station...
Docstrange is an open source document processing tool that focuses on extracting data from documents and images in multiple formats and converting them to formats such as Markdown, JSON, CSV or HTML. It uses artificial intelligence and advanced OCR technology, and supports processing PDF, Word documents, Exce...
Guava Intelligent Document Recognition (intelligent_document_recognition) is open source desktop software developed by jiangnanboy and hosted on GitHub, focusing on offline intelligent recognition of documents and forms. The software integrates Optical Character Recognition (OCR) and form junction...
OCRFlux is an open source, lightweight tool focused on converting PDF files and images to clean Markdown. It is developed by the ChatDOC team, built on a large multimodal model with 3B parameters, and can run on consumer hardware such as an RTX 3090. The tool specializes in complex document layouts,...
Versatile OCR Program is an open source Optical Character Recognition (OCR) tool designed specifically for processing complex academic and educational documents. It can extract text, tables, mathematical formulas, diagrams and schematics from PDF, images and other documents and generate structured data suitable for machine learning training. Support...
It automatically analyzes the layout of PDF documents, identifies text, titles, images, tables, formulas and other elements on the page, and determines their correct order. The tool supports OCR, so scanned PDFs can be converted to searchable text. It runs on Docker and provides two models: visual model (Vision Grid ...
RolmOCR is an open source Optical Character Recognition (OCR) tool developed by the Reducto AI team, based on the Qwen2.5-VL-7B vision-language model. It can extract text from images and PDF files faster than the similar tool olmOCR, with a lower memory footprint. RolmOCR...
uniOCR is an open source text recognition tool developed by the mediar-ai team. It is written in Rust and supports macOS, Windows and Linux. It allows users to extract text from images, and is easy and free to use. uniOCR's core feature is cross-platform support...
PDF Craft is an open source tool designed for converting scanned PDF books to Markdown format. It is developed by oomol-lab and hosted on GitHub, and is aimed at users who like to organize their eBooks. The tool runs on a local AI model without the need for an Internet connection, which protects privacy and simplifies operation...
SmolDocling is a vision-language model (VLM) developed by the ds4sd team in collaboration with IBM, based on SmolVLM-256M and hosted on the Hugging Face platform. With only 256M parameters, it is described as the world's smallest VLM.
In the long history of human civilization, every leap in the way information is acquired and parsed has profoundly driven social progress. From the ancient hieroglyphics, to the portable papyrus, to the later emergence of the printing press and today's wave of digitization, each technological innovation has greatly expanded the transmission of human knowledge...
Ollama OCR is a powerful Optical Character Recognition (OCR) toolkit that uses state-of-the-art vision-language models provided by the Ollama platform to extract text from images. The project is available both as a Python package and as a user-friendly Streamlit web application. It supports a wide range of visual models...
STranslate is a ready-to-use translation and OCR tool built with WPF. The tool is designed to provide efficient and convenient translation and Optical Character Recognition (OCR) functionality for a wide range of languages and text types. STranslate is an open source project that is free for users to download and use, and also accepts...
VisionParser is an OCR (Optical Character Recognition) tool designed for processing receipts and invoices. With advanced generative AI technology, VisionParser is able to quickly and accurately convert all kinds of receipts and invoices into structured data for a wide range of business scenarios, such as retail, food and beverage, and B2B services....
Chunkr is a self-hosted API specialized in converting PDF, PPTX, DOCX, and Excel files into data suitable for use with RAG (Retrieval Augmented Generation) and LLMs (Large Language Models). It was developed by Lumina AI Inc. and utilizes advanced visual models for document...
Llama OCR is an OCR (Optical Character Recognition) library based on Llama 3.2 Vision that converts documents to Markdown format. The library was developed by Nutlope and uses the free Llama 3.2 interface provided by Together AI for graph...
Docling is a powerful document parsing and exporting tool that supports a wide range of document formats including PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc, and Markdown. It parses and exports these documents to HTML, Markdown, and JSON formats....