To suit different application scenarios, the system offers two industrial-grade deployment options. The vLLM-based deployment supports dynamic batching and pipeline parallelism, reaching a reported throughput of about 50 PDF pages per second on an 8-GPU A100 server. The HuggingFace deployment is better suited to rapid prototyping: its simplified API lets you set up an environment in roughly five minutes. An official Docker image is also provided, bundling the CUDA-accelerated environment and pretrained weights so that users do not have to resolve complex dependencies themselves. Enterprise users can further tune the tensor-parallel-size parameter to allocate compute resources optimally.
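As a minimal sketch of the two options described above, the commands below show how a vLLM server with tensor parallelism and a Docker-based setup might be launched. The model identifier `rednote-hilab/dots.ocr`, the image tag, and the port are assumptions for illustration, not taken from the source; `--tensor-parallel-size` is the standard vLLM flag the text refers to.

```shell
# Option 1: vLLM deployment with tensor parallelism across 8 GPUs.
# Model name and port are illustrative assumptions.
vllm serve rednote-hilab/dots.ocr \
    --tensor-parallel-size 8 \
    --port 8000

# Option 2: run the official Docker image (image tag is hypothetical),
# which bundles the CUDA environment and pretrained weights.
docker run --gpus all -p 8000:8000 rednote-hilab/dots.ocr:latest
```

For smaller servers, lowering `--tensor-parallel-size` to match the available GPU count is the usual adjustment; throughput scales accordingly.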
This answer is drawn from the article "dots.ocr: a unified visual-linguistic model for multilingual document layout parsing".