
SmolDocling is the world's smallest visual language model

2025-08-28

SmolDocling is the world's smallest visual language model (VLM) by parameter count, with only 256M parameters. It was jointly developed by the ds4sd team and IBM. Built on the lean SmolVLM-256M architecture, it delivers efficient document-processing capabilities while keeping a tiny footprint. Whereas traditional large-scale VLMs usually require billions of parameters, SmolDocling is specially optimized with model-compression techniques so that it runs smoothly on ordinary computing devices. Open-source hosting on the Hugging Face platform further lowers the barrier to adopting the technology.

The model's miniaturized design brings several advantages: it cuts GPU memory usage by more than 70%, speeds up inference by more than 10x, and supports running in GPU-less environments. Reported results show that document recognition accuracy of 88.7% is maintained at the 256M-parameter scale, making the model particularly well suited to embedded devices and edge-computing scenarios. This miniaturization path marks an important step in making VLM technology lightweight and accessible to everyday users.
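To illustrate what running a Hugging Face-hosted model of this kind looks like in practice, here is a minimal sketch using the transformers library. The repository id ds4sd/SmolDocling-256M-preview, the prompt wording, and the file name page.png are assumptions based on common SmolVLM-style usage, not confirmed by this article; check the official model card before relying on them.

```python
# Minimal sketch: loading SmolDocling from the Hugging Face Hub and
# converting one scanned page. Repo id, prompt, and file name are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "ds4sd/SmolDocling-256M-preview"  # assumed repo id; verify on the Hub

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,  # float32 keeps CPU-only (GPU-less) inference simple
)

image = Image.open("page.png")  # hypothetical scanned document page

# SmolVLM-style chat prompt with one image and one instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this page to docling."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=1024)

print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

Because the model has only 256M parameters, this sketch runs on a plain CPU machine; on a GPU, passing a half-precision dtype and moving the model and inputs to the device would speed up generation further.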
