

Tesseract
Tesseract.js is a JavaScript library that extracts text in almost any language from images.
Features
- OCR
Tags
- Drag selection
- text-recognition
- ocr-text-reader
Tesseract information
What is Tesseract?
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 little work was done on it, but it is probably one of the most accurate open source OCR engines available. The engine reads a binary, grey or color image and outputs text. A built-in TIFF reader handles uncompressed TIFF images, and libtiff can be added to read compressed images. Language files are available for many languages, including text set in Fraktur and blackletter typefaces.
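As a rough illustration of the image-in, text-out workflow described above, here is a minimal Python sketch that shells out to the tesseract command-line program. It assumes the tesseract executable is installed and on PATH and that the requested language file (for example "eng" or "deu") is present. The helper names build_tesseract_command and ocr_image are hypothetical and are not part of Tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumes the tesseract executable and the language data are installed.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,
    )
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling ocr_image("scan.tif", "scan", lang="deu") would write scan.txt next to the working directory and return its contents, provided the German language file is installed.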
Comments and Reviews
In terms of OCR, Tesseract is fantastic. I compared it to ABBYY 14 and Tesseract had fewer errors on dictionary words. While it doesn't offer layout preservation with the OCR (i.e. converting into an editable document that should print similarly), you'll likely make up for that in the reduced time needed to fix OCR errors.
For handling PDFs, you'll first need to convert them to image files, for example with pdftopng (an open source tool that can be found in the Xpdf project).
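The PDF step mentioned above could be scripted roughly as follows. This is only a sketch: it assumes both pdftopng (from Xpdf) and tesseract are installed and on PATH, that pdftopng writes one numbered PNG per page using the given root name, and that 300 DPI is a reasonable resolution for OCR. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pdftopng_command(pdf_path, png_root, dpi=300):
    """Return the argument list for: pdftopng -r DPI PDF-file PNG-root."""
    return ["pdftopng", "-r", str(dpi), str(pdf_path), str(png_root)]


def page_images(png_root):
    """List the page images written for png_root, in page order.

    Assumes files are named <root>-<page number>.png with zero-padded numbers.
    """
    root = Path(png_root)
    return sorted(root.parent.glob(f"{root.name}-*.png"))


def ocr_pdf(pdf_path, png_root, lang="eng", dpi=300):
    """Convert a PDF to PNG pages, OCR each page, and return the joined text."""
    subprocess.run(build_pdftopng_command(pdf_path, png_root, dpi), check=True)
    pages = []
    for image in page_images(png_root):
        # Passing "stdout" as the output base makes tesseract print the text.
        result = subprocess.run(
            ["tesseract", str(image), "stdout", "-l", lang],
            check=True,
            capture_output=True,
            text=True,
        )
        pages.append(result.stdout)
    return "\n".join(pages)
```

For example, ocr_pdf("report.pdf", "report_page") would render each page to an image and return the recognized text of all pages in order.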
Requires Java to be installed. On Windows, this also appears to be command-line only now, with no console as shown. Links in the new version's program folder are old and redirect. The readme didn't work. Confusing. Try this before you downvote.