The steps to run OCRmyPDF with Docker are as follows:
- Make sure Docker is installed and running:
docker run hello-world
- Pull the official OCRmyPDF image:
docker pull jbarlow83/ocrmypdf
- (Optional) Tag the image with a shorter, more convenient name:
docker tag jbarlow83/ocrmypdf ocrmypdf
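- Run OCR processing (if you skipped the optional tagging step, use jbarlow83/ocrmypdf as the image name instead of ocrmypdf):
docker run --rm -v "$(pwd):/data" ocrmypdf /data/input.pdf /data/output.pdf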
This command will:
- Mount the current directory ($(pwd)) at /data inside the container
- Process the input.pdf file in the current directory
- Write the result to output.pdf in the current directory
- Automatically delete the temporary container after it exits (the --rm flag)
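For repeated use, the docker run command above can be wrapped in a small shell function. This is only a minimal sketch: the names ocr_pdf, OCRMYPDF_IMAGE and DRY_RUN are hypothetical and not part of OCRmyPDF or Docker, and it assumes both files live in the current directory.

```shell
# Hypothetical wrapper around the docker run command shown above.
# ocr_pdf, OCRMYPDF_IMAGE and DRY_RUN are illustrative names only.
ocr_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: ocr_pdf INPUT.pdf OUTPUT.pdf" >&2
        return 2
    fi
    # Default to the full image name so the optional tagging step is not required.
    ocr_image="${OCRMYPDF_IMAGE:-jbarlow83/ocrmypdf}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (useful for checking paths).
        echo "docker run --rm -v $(pwd):/data $ocr_image /data/$1 /data/$2"
    else
        # Mount the current directory at /data and remove the container afterwards.
        docker run --rm -v "$(pwd):/data" "$ocr_image" "/data/$1" "/data/$2"
    fi
}
```

Calling `DRY_RUN=1 ocr_pdf input.pdf output.pdf` prints the command that would be executed; without DRY_RUN the function runs it. Setting OCRMYPDF_IMAGE=ocrmypdf would select the short tag if you created one.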
The Docker approach is particularly suitable when OCRmyPDF and its dependencies are not installed locally, or when the same setup needs to work across platforms.
This answer comes from the article "OCRmyPDF: an open source tool that turns scanned PDFs into searchable text".