pdfsandwich: OCR PDFs containing images

pdfsandwich is a handy tool developed by Tobias Elze for running OCR (via Tesseract) on scanned documents. The recognized text is added as a background layer, which makes it possible to search and index the scanned documents.

pdfsandwich -lang eng+deu scanned.pdf
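
To run the same command over a whole folder of scans, a small wrapper along these lines may help. It is only a sketch: `ocr_output_name` and `ocr_dir` are hypothetical helper names, the `PDFSANDWICH` variable is an invented override for dry runs, and the script assumes pdfsandwich writes its result next to the input as `<name>_ocr.pdf`, which is its default when no `-o` option is given.

```shell
# Sketch only: batch-OCR a directory with pdfsandwich.
# Assumption: output goes to <input>_ocr.pdf (the default without -o).

# Hypothetical helper: derive the assumed output name for an input PDF.
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Hypothetical helper: OCR every PDF in a directory that has no result yet.
#   $1 = directory, $2 = Tesseract language spec (defaults to eng)
ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for pdf in "$dir"/*.pdf; do
        # The glob stays unexpanded when nothing matches.
        if [ ! -e "$pdf" ]; then continue; fi
        # Skip files that are themselves earlier OCR results.
        case $pdf in *_ocr.pdf) continue ;; esac
        # Skip inputs whose result already exists.
        if [ -e "$(ocr_output_name "$pdf")" ]; then continue; fi
        # PDFSANDWICH is an invented override, e.g. PDFSANDWICH=echo for a dry run.
        ${PDFSANDWICH:-pdfsandwich} -lang "$lang" "$pdf"
    done
}
```

After sourcing the file, `ocr_dir ~/scans eng+deu` would process each unprocessed PDF in `~/scans` with the English and German language data, as in the single-file command above. Setting `PDFSANDWICH=echo` first prints the commands instead of running them.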