Tesseract.js is a JavaScript library that extracts text in almost any language from images.

The best open source alternative to OCRopus is Tesseract. If that doesn't suit you, our users have ranked more than 50 alternatives to OCRopus, and nine of them are open source, so hopefully you can find a suitable replacement. Other interesting open source alternatives to OCRopus are CopyFish, Chandra, MinerU and GOCR.



Chandra is a highly accurate OCR model that converts images and PDFs into structured HTML/Markdown/JSON while preserving layout information.

A free, all-in-one document parsing tool that offers accurate parsing and efficient extraction.




GOCR is an OCR (Optical Character Recognition) program developed under the GNU General Public License. It converts scanned images of text back into text files. Joerg Schulenburg started the program and now leads a team of developers.


A Java/.NET GUI frontend for the Tesseract OCR engine. Provides optical character recognition for Vietnamese and other languages supported by Tesseract.


CuneiForm (OpenOCR) is text recognition software for printed templates. The program cannot recognize manuscripts or PDF files, but it can recognize table structures. The language model covers 20 languages, and the results can be saved as HTML, RTF or ASCII text, or...


WatchOCR is an open source OCR server that creates searchable PDFs from images in a watched folder.
