

Tesseract
Tesseract.js is a JavaScript library that extracts text in almost any language from images.
Comments and Reviews
In terms of OCR, Tesseract is fantastic. I compared it to ABBYY 14, and Tesseract had fewer errors on dictionary words. While it doesn't offer layout preservation with the OCR (i.e., converting into an editable document that prints similarly), you'll likely make up for that in the reduced time needed to fix OCR errors.
To handle PDFs, you'll first need to convert them to image files, for example with pdftopng (an open-source tool from the Xpdf project).
Requires that Java be installed
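
The PDF-to-image step mentioned in the review can be sketched roughly as follows. This is only an illustration, assuming that the `pdftopng` (Xpdf) and `tesseract` command-line tools are both installed and on the PATH. The helper names (`pdftopng_command`, `tesseract_command`, `ocr_pdf`), the 300 DPI setting, and the `page` file prefix are assumptions made for this example, not something taken from the review.

```python
import subprocess
from pathlib import Path


def pdftopng_command(pdf_path: str, png_root: str, dpi: int = 300) -> list[str]:
    """Build the pdftopng call; it writes one PNG per page as <png_root>-NNNNNN.png."""
    # 300 DPI is an assumed value for this sketch, not a setting from the review.
    return ["pdftopng", "-r", str(dpi), pdf_path, png_root]


def tesseract_command(image_path: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build the tesseract call; recognized text is written to <output_base>.txt."""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_pdf(pdf_path: str, work_dir: str, lang: str = "eng") -> str:
    """Rasterize a PDF with pdftopng, OCR each page image, and return the joined text."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)

    # Step 1: convert every PDF page to a PNG file inside work_dir.
    subprocess.run(pdftopng_command(pdf_path, str(work / "page")), check=True)

    # Step 2: OCR each page image; zero-padded names sort in page order.
    pages = []
    for png in sorted(work.glob("page-*.png")):
        out_base = png.with_suffix("")  # e.g. work_dir/page-000001
        subprocess.run(tesseract_command(str(png), str(out_base), lang), check=True)
        pages.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling `ocr_pdf("scan.pdf", "ocr_work")` (both names are placeholders) would leave the intermediate PNG and text files in `ocr_work` and return the combined text. Keeping the command builders separate from the function that runs them makes it easy to inspect or adjust the exact command lines before anything is executed.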