Why does OCR take a while on first use?

Tesseract.js downloads a ~7 MB WebAssembly bundle on first use. Subsequent recognitions during the same visit to this page are faster because the engine stays loaded and is reused.
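The "slow first run, fast afterwards" behavior can be illustrated with a small initialize-once helper. This is a minimal sketch, not the tool's actual code: `lazyOnce`, `loadEngine`, and `FakeEngine` are hypothetical names, and the loader is a stand-in for whatever call really downloads and starts the OCR engine.

```typescript
// Wraps an expensive async initializer so it runs at most once.
// Every later call receives the same cached promise.
function lazyOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      // First call: pay the download/startup cost.
      // Note: a rejected promise would also stay cached in this sketch.
      cached = init();
    }
    return cached;
  };
}

// Hypothetical stand-in for the real engine loader (assumption:
// the real one would fetch the WebAssembly bundle and language data).
interface FakeEngine {
  id: number;
}

let engineLoads = 0;

function loadEngine(): Promise<FakeEngine> {
  engineLoads += 1;
  return Promise.resolve({ id: engineLoads });
}

// All recognitions in the same page session share one engine.
const getEngine = lazyOnce(loadEngine);
```

In a real page, the initializer passed to `lazyOnce` would be the engine startup call, so only the first recognition waits for the download.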

Which languages are supported?

English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese (Simplified), Japanese, Arabic, and Hindi. Select the matching language before running OCR for best accuracy.
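Tesseract identifies languages by short codes that match its trained-data file names. The sketch below maps the twelve display names above to the standard Tesseract codes. How this tool stores the mapping is an assumption, and `LANGUAGE_CODES` and `toTesseractLangs` are hypothetical names used only for illustration.

```typescript
// Display name -> standard Tesseract trained-data code.
const LANGUAGE_CODES: Record<string, string> = {
  "English": "eng",
  "French": "fra",
  "German": "deu",
  "Spanish": "spa",
  "Italian": "ita",
  "Portuguese": "por",
  "Dutch": "nld",
  "Russian": "rus",
  "Chinese (Simplified)": "chi_sim",
  "Japanese": "jpn",
  "Arabic": "ara",
  "Hindi": "hin",
};

// Converts display names into a Tesseract language string.
// Multiple languages are joined with "+", the form Tesseract
// uses for mixed-language documents (for example "eng+fra").
function toTesseractLangs(names: string[]): string {
  const codes = names.map((name) => {
    const code = LANGUAGE_CODES[name];
    if (code === undefined) {
      throw new Error(`Unsupported language: ${name}`);
    }
    return code;
  });
  return codes.join("+");
}
```

For example, `toTesseractLangs(["English", "French"])` produces `"eng+fra"`, while an unlisted name raises an error instead of silently falling back to English.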

What image quality gives the best results?

Use high-contrast scans at 300 DPI or higher. Tilted or blurry text reduces accuracy. For PDFs, the tool renders each page at 2x scale before processing.
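To see what a render scale means in pixels, the arithmetic can be sketched as follows. This assumes the common convention (used by PDF.js viewports, for instance) that scale 1 maps one PDF point, 1/72 inch, to one pixel. The page does not say which renderer it uses, so treat the convention and the name `renderedSize` as assumptions.

```typescript
// PDF page sizes are measured in points; there are 72 points per inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;  // rendered width in pixels
  height: number; // rendered height in pixels
  dpi: number;    // effective pixels per inch of page
}

// Assumption: scale 1 renders one pixel per PDF point.
function renderedSize(widthPt: number, heightPt: number, scale: number): PixelSize {
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
    dpi: POINTS_PER_INCH * scale,
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letter = renderedSize(612, 792, 2);
// letter is { width: 1224, height: 1584, dpi: 144 }
```

Under that assumption, a Letter-size page at scale 2 becomes a 1224 x 1584 pixel image, which is the input the OCR step would see.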


OCR - Image to Text

Extract text from images and scanned PDFs using Optical Character Recognition (OCR). Available in 12 languages. Processing runs entirely in your browser, so your documents stay private and are never uploaded to a server.

Best for: extracting text from scanned PDFs, photos of documents, or screenshots.

Input: Images (JPG, PNG, WebP, BMP, TIFF) and PDF files.

Output: Extracted plain text you can copy or download.

Privacy

All processing is handled by your browser using Tesseract.js. No document data ever reaches ConvertPDF servers. Your files remain on your device.

Related tools

PDF to JPG
Images to PDF
TXT to DOCX