OCR PDF — Extract Text from Scanned Documents
Extract text from scanned PDFs and image-based documents using optical character recognition. Supports English, French, German, Spanish and more. Everything runs in your browser — no server, no upload.
What is OCR for PDF?
OCR (Optical Character Recognition) converts scanned images and image-based PDFs into machine-readable text. This tool uses Tesseract.js, a JavaScript port of the open-source Tesseract OCR engine, to recognize each page and extract text you can copy, search, or embed back into the PDF.
How to Extract Text from a Scanned PDF
- Upload your scanned PDF or image-based document
- Select the document language for better accuracy
- Choose quality: Fast for speed, Best for accuracy
- Click Run OCR — text extraction begins immediately
- Copy individual pages or all text at once
- Optionally save a searchable PDF with an invisible text layer
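For readers curious how the steps above could fit together in code, here is a minimal TypeScript sketch of a page-by-page OCR loop. It is not this tool's actual implementation. The names `runOcr`, `OcrOptions`, and `RecognizePage` are hypothetical, and the OCR engine is passed in as a callback so the sketch stays self-contained. In a Tesseract.js setup that callback would typically wrap a worker's `recognize` call, and how the "fast" and "best" settings map onto engine models is left as an assumption.

```typescript
// Hypothetical option names mirroring the language and quality steps above.
type Quality = "fast" | "best";

interface OcrOptions {
  lang: string; // Tesseract language code, e.g. "eng"
  quality: Quality;
}

// Assumed abstraction over the OCR engine: takes one rendered page image
// and resolves to the text recognized on that page.
type RecognizePage = (pageImage: Uint8Array, options: OcrOptions) => Promise<string>;

interface OcrResult {
  pages: string[]; // text per page, for "copy individual pages"
  allText: string; // all pages joined, for "copy all text at once"
}

async function runOcr(
  pageImages: Uint8Array[],
  options: OcrOptions,
  recognizePage: RecognizePage,
  onProgress?: (done: number, total: number) => void
): Promise<OcrResult> {
  const pages: string[] = [];
  // Pages are processed one at a time so progress can be reported in order.
  for (let i = 0; i < pageImages.length; i++) {
    const text = await recognizePage(pageImages[i], options);
    pages.push(text.trim());
    onProgress?.(i + 1, pageImages.length);
  }
  // Separate pages with a blank line in the combined output.
  return { pages, allText: pages.join("\n\n") };
}
```

Passing the recognizer in as a parameter keeps the loop independent of any particular engine version and makes it easy to try out with a stub.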
FAQ
Which languages does OCR support?
English, French, German, Spanish, Italian, Portuguese, Chinese (Simplified), Japanese, Arabic, and Hindi.
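Tesseract identifies languages by short codes rather than display names, and multiple languages are combined with a `+` sign. The sketch below maps the languages listed above to their standard Tesseract codes. The helper name `toLangString` and the lookup table are illustrative assumptions, not part of this tool's published interface.

```typescript
// Display name -> Tesseract traineddata language code.
const TESSERACT_LANG_CODES: Record<string, string> = {
  "English": "eng",
  "French": "fra",
  "German": "deu",
  "Spanish": "spa",
  "Italian": "ita",
  "Portuguese": "por",
  "Chinese (Simplified)": "chi_sim",
  "Japanese": "jpn",
  "Arabic": "ara",
  "Hindi": "hin",
};

// Convert one or more display names into a Tesseract language string,
// e.g. ["English", "French"] becomes "eng+fra".
function toLangString(names: string[]): string {
  const codes = names.map((name) => {
    const code = TESSERACT_LANG_CODES[name];
    if (code === undefined) {
      throw new Error(`Unsupported language: ${name}`);
    }
    return code;
  });
  return codes.join("+");
}
```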
What does "Save Searchable PDF" do?
It embeds an invisible text layer into your original PDF. The file looks exactly the same, but you can now search it with Ctrl+F and select text in any PDF reader.
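One common way to build such a layer is to draw each recognized word at its position on the page using PDF text rendering mode 3, which neither fills nor strokes the glyphs. The TypeScript sketch below shows the idea under stated assumptions: it is not necessarily how this tool does it, the names `WordBox`, `PageGeometry`, and `invisibleWordOps` are hypothetical, the font resource `F1` is assumed to exist on the page, and only text the font's simple encoding can represent is handled.

```typescript
// Word box as reported by an OCR engine: pixel coordinates, origin at top-left.
interface WordBox {
  text: string;
  x0: number; // left
  y0: number; // top
  x1: number; // right
  y1: number; // bottom
}

// Size of the scanned image (pixels) and of the PDF page (points).
interface PageGeometry {
  imageWidthPx: number;
  imageHeightPx: number;
  pageWidthPt: number;
  pageHeightPt: number;
}

// Escape the characters that are special inside a PDF literal string: \ ( )
function escapePdfString(s: string): string {
  return s.replace(/([\\()])/g, "\\$1");
}

// Build content-stream operators that place one word as invisible text.
function invisibleWordOps(word: WordBox, page: PageGeometry, fontName = "F1"): string {
  const sx = page.pageWidthPt / page.imageWidthPx;
  const sy = page.pageHeightPt / page.imageHeightPx;
  const x = word.x0 * sx;
  // PDF's origin is bottom-left, so flip the y axis.
  // The bottom of the box is used as a rough baseline.
  const y = page.pageHeightPt - word.y1 * sy;
  const fontSize = (word.y1 - word.y0) * sy;
  return [
    "BT", // begin text object
    "3 Tr", // text rendering mode 3: invisible
    `/${fontName} ${fontSize.toFixed(2)} Tf`,
    `${x.toFixed(2)} ${y.toFixed(2)} Td`,
    `(${escapePdfString(word.text)}) Tj`,
    "ET", // end text object
  ].join("\n");
}
```

Because the text is positioned over the scanned image but never painted, the page looks unchanged while readers can still find and select the words.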
Is my document sent to any server?
No. Tesseract.js runs entirely in your browser. Your document never leaves your device.