

docext
docext is a powerful tool for extracting structured information from documents such as invoices, passports, and other forms. It leverages vision-language models (VLMs) to accurately identify and extract both field data and tabular information from document images.
Cost / License
- Free
- Open Source
Platforms
- Self-Hosted
- Docker
- Python
Features
- OCR
- Structured data
- PDF OCR
- REST API
- Python-based
- On-premises software
Tags
- table-extraction
- data-extraction
- document analysis
- Machine Learning
docext News & Activities
Recent activities
- POX added docext as alternative to Docparser, ExtractTable.com, ABBYY FlexiCapture and PDF Tables
- POX added Structured data as a feature to docext
- POX added docext
docext information
What is docext?
docext is a powerful tool for extracting structured information from documents such as invoices, passports, and other forms. It leverages vision-language models (VLMs) to accurately identify and extract both field data and tabular information from document images.
Features:
- User-friendly interface: Built with Gradio for easy document processing
- Flexible extraction: Define custom fields or use pre-built templates
- Table extraction: Extract structured tabular data from documents
- Confidence scoring: Get confidence levels for extracted information
- On-premises deployment: Run entirely on your own infrastructure
- Multi-page support: Process documents with multiple pages
- REST API: Programmatic access for integration with your applications
- Pre-built templates: Ready-to-use templates for common document types:
  - Invoices
  - Passports
  - Add or delete fields and columns to adapt a template to other document types.
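
The idea of adapting a pre-built template by adding or deleting fields and columns can be illustrated with a small, self-contained sketch. This is not docext's actual API: the `Template` class, its methods, and every field and column name below are hypothetical, chosen only to show how a template might be modeled as a list of key-value fields plus a list of table columns.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical template: names of key-value fields and table columns."""

    name: str
    fields: list[str] = field(default_factory=list)
    columns: list[str] = field(default_factory=list)

    def add_field(self, field_name: str) -> None:
        # Skip duplicates so repeated edits leave the template unchanged.
        if field_name not in self.fields:
            self.fields.append(field_name)

    def remove_field(self, field_name: str) -> None:
        # Removing a field that is not present is a no-op.
        if field_name in self.fields:
            self.fields.remove(field_name)

    def add_column(self, column_name: str) -> None:
        # Same duplicate handling as add_field, applied to table columns.
        if column_name not in self.columns:
            self.columns.append(column_name)


# An invoice-like starting point (names are illustrative assumptions).
invoice = Template(
    name="invoice",
    fields=["invoice_number", "invoice_date", "total"],
    columns=["description", "quantity", "amount"],
)

# Derive a receipt template by copying the lists, then editing the copy.
receipt = Template("receipt", list(invoice.fields), list(invoice.columns))
receipt.remove_field("invoice_number")
receipt.add_field("store_name")
receipt.add_column("unit_price")
```

Copying the lists before editing keeps the original invoice template intact, so one base template can be adapted to several document types.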





