
DeepSeek-OCR is an optical character recognition (OCR) tool developed and open sourced by DeepSeek-AI. It proposes a new approach called “Contextual Optical Compression”, which rethinks the role of the visual encoder from the perspective of the Large Language Model (LLM). The tool does not simply recognize graphs...

dots.ocr is a multilingual document parsing tool based on a 1.7B-parameter vision-language model (VLM) that handles both layout detection and content recognition. It reports state-of-the-art results on benchmarks such as OmniDocBench, and is particularly strong at text, table and reading order parsing....

Snippai is an AI-based screenshot tool designed to enhance the screenshot experience through advanced AI algorithms. It not only captures screen content, but also intelligently analyzes and converts formulas, text, tables, images, etc. in the screenshot. Users can use Snippai to convert complex visual information into editable formats such as LaTeX formulas...

AI Fast Station is a free, open source OCR model arena that focuses on intelligent parsing of documents and images. Users can upload PDF or image files and quickly find a suitable parsing solution by comparing seven mainstream OCR models with one click. The site supports a wide range of file formats, is easy to operate, and requires no complex installation. AI Fast Station provides high-precision recognition, fast processing and secure...

Docstrange is an open source document processing tool that focuses on extracting data from documents and images in multiple formats and converting them to formats such as Markdown, JSON, CSV or HTML. It uses artificial intelligence and advanced OCR technology, and supports processing PDF, Word documents, Exce...

Guava Intelligent Document Recognition (intelligent_document_recognition) is open source desktop software developed by jiangnanboy and hosted on GitHub. It focuses on intelligent, offline recognition of documents and forms. The software integrates Optical Character Recognition (OCR) and form junction...

OCRFlux is an open source, lightweight tool focused on converting PDF files and images to clean Markdown. It is developed by the ChatDOC team, built on a multimodal large model with 3B parameters, and can run on consumer hardware such as an RTX 3090. The tool specializes in complex document layouts,...

Versatile OCR Program is an open source Optical Character Recognition (OCR) tool designed specifically for processing complex academic and educational documents. It can extract text, tables, mathematical formulas, diagrams and schematics from PDFs, images and other documents and generate structured data suitable for machine learning training. It supports multiple languages, including English...

It automatically analyzes the layout of PDF documents, identifies text, titles, images, tables, formulas and other elements on the page, and determines their correct reading order. The tool supports OCR, so scanned PDFs can be converted to searchable text. It runs on Docker and provides two models: visual model (Vision Grid Transfor...

RolmOCR is an open source Optical Character Recognition (OCR) tool developed by the Reducto AI team, based on the Qwen2.5-VL-7B visual language model. It extracts text from images and PDF files faster than the similar tool olmOCR, with a lower memory footprint. RolmOCR...

uniOCR is an open source text recognition tool developed by the mediar-ai team. It is written in Rust and supports macOS, Windows and Linux. It allows users to extract text from images, and is easy and free to use. uniOCR's core feature is cross-platform support...

PDF Craft is an open source tool for converting scanned PDF books to Markdown format. It is developed by oomol-lab and hosted on GitHub, and is aimed at users who like to organize their eBooks. The tool runs on a local AI model and does not require an internet connection, which protects privacy and simplifies operation. It...

SmolDocling is a visual language model (VLM) developed by the ds4sd team in collaboration with IBM, based on SmolVLM-256M and hosted on the Hugging Face platform. With only 256M parameters, it is described as the world's smallest VLM.

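Throughout the long history of human civilization, every leap in how information is acquired and parsed has profoundly driven social progress. From ancient hieroglyphs, to portable papyrus, to the later arrival of printing and today's wave of digitization, each technological innovation has greatly expanded the reach and depth of application of human knowledge, in turn nurturing a new round of innovation...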

Ollama OCR is an Optical Character Recognition (OCR) toolkit that uses the visual language models available through the Ollama platform to extract text from images. The project is available both as a Python package and as a user-friendly Streamlit web application. It supports a wide range of visual models, including...

STranslate is a ready-to-use translation and OCR tool built with WPF. The tool is designed to provide efficient and convenient translation and optical character recognition (OCR) functionality for a wide range of languages and text types. STranslate is an open source project that is free for users to download and use, and also accepts custom development...

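VisionParser is an OCR (optical character recognition) tool designed for processing receipts and invoices. Using advanced generative AI technology, VisionParser quickly and accurately converts all kinds of receipts and invoices into structured data, and suits business scenarios such as retail, food service and B2B services. Its flexible AP...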

Chunkr is a self-hosted API specialized in converting PDF, PPTX, DOCX, and Excel files into data suitable for use with RAG (Retrieval Augmented Generation) and LLMs (Large Language Models). It was developed by Lumina AI Inc. and utilizes advanced visual models for document...

Llama OCR is an OCR (Optical Character Recognition) library based on Llama 3.2 Vision that converts documents to Markdown format. The library was developed by Nutlope and uses the free Llama 3.2 interface provided by Together AI for graph...

