Mistral unveils high-accuracy OCR API with advanced document understanding features
Mistral AI has launched a new OCR API that surpasses existing market solutions in accuracy, according to benchmark tests. The company introduced two models, mistral-ocr-2503 and mistral-ocr-latest, designed to extract text from images and documents with advanced document understanding capabilities. These models support multiple languages, recognize printed and handwritten text, and maintain the original layout and formatting of documents. They can also extract text from tables, forms, and complex layouts.
Mistral OCR reports an overall accuracy of 94.89%, and 99.02% on multilingual text, outperforming competitors such as Google Document AI and Azure OCR. It also converts complex infographics into digital formats, which is useful for visually dense material. Its lightweight architecture lets it process 2,000 pages per minute on a single computing node.
Pricing is set at 1,000 pages per dollar, or 2,000 pages per dollar with batch processing. A distinctive feature, "doc-as-prompt," lets users supply an entire document as the prompt for structured information extraction, with the output returned as JSON that other AI and data-processing applications can consume. The API is available on Mistral's developer platform, "la Plateforme," with cloud and inference partner support planned, and it can be self-hosted for greater data security.
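To put the quoted figures in perspective, here is a minimal Python sketch that turns them into a rough cost and processing-time estimate. The helper name `estimate_job` and its parameters are hypothetical and not part of Mistral's API. The constants are the numbers announced above, and the linear scaling across nodes is an assumption the announcement does not make.

```python
from dataclasses import dataclass

# Figures quoted in the article; treat them as announced numbers, not guarantees.
PAGES_PER_DOLLAR_STANDARD = 1_000
PAGES_PER_DOLLAR_BATCH = 2_000
PAGES_PER_MINUTE_PER_NODE = 2_000


@dataclass
class Estimate:
    cost_usd: float
    minutes: float


def estimate_job(pages: int, batch: bool = False, nodes: int = 1) -> Estimate:
    """Hypothetical helper: rough cost and processing time for an OCR job."""
    if pages < 0 or nodes < 1:
        raise ValueError("pages must be >= 0 and nodes must be >= 1")
    pages_per_dollar = PAGES_PER_DOLLAR_BATCH if batch else PAGES_PER_DOLLAR_STANDARD
    cost = pages / pages_per_dollar
    # Assumption: throughput scales linearly with the number of nodes.
    minutes = pages / (PAGES_PER_MINUTE_PER_NODE * nodes)
    return Estimate(cost_usd=cost, minutes=minutes)


if __name__ == "__main__":
    print(estimate_job(50_000))
    print(estimate_job(50_000, batch=True))
```

Under these assumptions, a 50,000-page archive would come to about $50 at the standard rate or $25 with batch processing, and would take roughly 25 minutes on a single node.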



Comments
It's always good news to see OCR improve. The goal has always been for OCR to one day be more accurate than humans (speed aside, of course), and after 30 years that is still not the case. But now, with LLMs and their "understanding" (i.e. computing probabilities) of context, recognition is much more coherent and the technology less frustrating. While the definition of OCR has widened in recent years to include text displayed in many forms (posters, code), Mistral has focused on images of standard black-on-white printed text, not handwriting, which explains the accuracy and performance achieved (the model being specialized for this very specific task).