API Calling Best Practices
To realize efficient and stable text extraction, the following key technical points need to be focused on:
- Data preprocessing: images are recommended to be converted to grayscale and sharpened, PDF is recommended to be paged to PNG format first. base64 encoding, pay attention to add the correct MIME type header
- parameter optimization::
- Temperature is set to 0.2-0.5 to balance accuracy and smoothness.
- max_tokens adjusted according to the length of the document, the general A4 document set to 3072 enough!
- batch file: Implement an asynchronous request queue to control the number of concurrencies ≤ 4 (depending on GPU graphics memory). Sample code:
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=4) as executor:
results = list(executor.map(ocr_page_with_rolm, img_base64_list))
Performance Optimization Tips: For multi-page documents, it is recommended to enable vLLM's continuous batch processing feature, which can increase throughput by 3 times. Pay attention to monitor the API response time, more than 2 seconds need to check the service load.
This answer comes from the articleRolmOCR: Document OCR Model for Recognizing Handwritten and Slanted CharactersThe