OCRmyPDF supports multi-language text recognition. To process a document that mixes languages, follow the steps below:
- Use the -l option to specify the language codes, e.g. -l eng+chi_sim to process PDFs containing both English and Simplified Chinese.
- Install the corresponding Tesseract language pack, e.g. the Simplified Chinese language pack on Debian/Ubuntu Linux:
sudo apt install tesseract-ocr-chi-sim
- Language codes are listed in the Tesseract documentation.
OCRmyPDF relies on the Tesseract OCR engine, which can recognize more than 100 languages, so it is well suited to scanned documents that mix languages, such as bilingual Chinese and English contracts or academic papers.
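The steps above can be sketched as a short shell script. This is a minimal illustration, not an official recipe: the file names scan.pdf and scan_ocr.pdf are hypothetical placeholders, and the script assumes both tesseract and ocrmypdf are on the PATH. It checks that each requested language appears in the output of tesseract --list-langs, then runs OCR with the combined language string.

```shell
#!/bin/sh
# Hypothetical file names (assumptions); replace with your own paths.
INPUT="scan.pdf"
OUTPUT="scan_ocr.pdf"

# Languages to recognize, joined with "+" as the -l option expects.
LANGS="eng+chi_sim"

# Warn about any requested language that Tesseract does not report as installed.
if command -v tesseract >/dev/null 2>&1; then
    for lang in $(printf '%s' "$LANGS" | tr '+' ' '); do
        tesseract --list-langs 2>&1 | grep -qx "$lang" \
            || echo "Missing Tesseract language data: $lang" >&2
    done
fi

# Run OCR only when the tool and the input file are both present.
if command -v ocrmypdf >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    ocrmypdf -l "$LANGS" "$INPUT" "$OUTPUT"
fi
```

If a language is reported as missing, install its language pack first and rerun the script; otherwise OCRmyPDF is likely to stop with an error about the unavailable language.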