SmolDocling is a vision-language model (VLM) developed by IBM Research's ds4sd team in collaboration with Hugging Face, and built on SmolVLM-256M. Its defining traits are its small size (only 256M parameters) and high efficiency, which make it well suited to running on ordinary consumer hardware. The model is hosted on the Hugging Face Hub and is promoted as one of the smallest vision-language models available.
Key features include:
- Text Extraction (OCR): recognizes text in multiple languages
- Layout Analysis: automatically identifies document structure such as headings and paragraphs
- Specialized Content Processing: extracts code blocks (with their formatting preserved), mathematical formulas, and chart data
- Structured Output: produces documents in the standardized DocTags markup format
- High Resolution Support: optimized handling of large, high-resolution page images
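The Structured Output feature can be illustrated with a small sketch. The string below is a hypothetical, simplified DocTags-style sample, not real model output: each element tag wraps four `<loc_N>` location tokens, assumed here to be bounding-box coordinates (x0, y0, x1, y1) on a 0-500 grid, followed by the element's text. The names `Element`, `parse_doctags`, and `to_markdown` are illustrative only. Real DocTags has many more element types and nested structures, so for actual model output the Docling project's `docling-core` package is the appropriate parser.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified DocTags-style sample (an assumption, not real model output).
# Each element: <tag><loc_x0><loc_y0><loc_x1><loc_y1>content</tag>, with the
# coordinates assumed to lie on a 0-500 grid.
SAMPLE = (
    "<doctag>"
    "<section_header_level_1><loc_40><loc_30><loc_460><loc_45>Introduction</section_header_level_1>"
    "<text><loc_40><loc_50><loc_460><loc_90>SmolDocling converts pages to DocTags.</text>"
    "<code><loc_40><loc_95><loc_460><loc_130>print('hi')</code>"
    "</doctag>"
)


@dataclass
class Element:
    tag: str                          # element type, e.g. "text" or "code"
    bbox: tuple[int, int, int, int]   # (x0, y0, x1, y1) on the assumed grid
    content: str                      # raw text inside the element


# Matches one flat element: opening tag, four location tokens, content, matching close tag.
ELEMENT_RE = re.compile(
    r"<(?P<tag>[a-z0-9_]+)>"
    r"<loc_(?P<x0>\d+)><loc_(?P<y0>\d+)><loc_(?P<x1>\d+)><loc_(?P<y1>\d+)>"
    r"(?P<content>.*?)</(?P=tag)>",
    re.DOTALL,
)


def parse_doctags(doctags: str) -> list[Element]:
    """Extract flat elements; nested structures (tables, lists) are not handled."""
    elements = []
    for m in ELEMENT_RE.finditer(doctags):
        bbox = tuple(int(m.group(k)) for k in ("x0", "y0", "x1", "y1"))
        elements.append(Element(m.group("tag"), bbox, m.group("content")))
    return elements


def to_markdown(elements: list[Element]) -> str:
    """Render parsed elements as Markdown, dropping the layout coordinates."""
    blocks = []
    for el in elements:
        if el.tag == "title":
            blocks.append("# " + el.content)
        elif el.tag.startswith("section_header_level_"):
            # Section level N becomes an (N + 1)-level heading below the title.
            level = int(el.tag.rsplit("_", 1)[1])
            blocks.append("#" * (level + 1) + " " + el.content)
        elif el.tag == "code":
            # Wrap the code verbatim in a tilde-fenced block.
            blocks.append("~~~\n" + el.content + "\n~~~")
        else:
            blocks.append(el.content)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(to_markdown(parse_doctags(SAMPLE)))
```

Run directly, this prints a Markdown rendering of the sample. Keeping the bounding boxes in `Element` rather than discarding them during parsing mirrors why a layout-aware format like DocTags is useful: the same parsed elements could also drive highlighting or cropping on the original page image.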
Unlike general-purpose vision models, SmolDocling is optimized specifically for document conversion, which makes it especially well suited to academic research, processing programming documentation, and other applications that require accurate parsing of complex layouts.
This answer comes from the article "SmolDocling: a visual language model for efficient document processing in a small volume".