

Kreuzberg
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from 75+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.
Cost / License
- Free
- Open Source (MIT)
Platforms
- Mac
- Windows
- Linux
- Python

Features
- OCR
- Python-based
- Rust
- Extract text from image
Tags
- Ruby
- pdf-text-extractor
- retrieval-augmented-generation
- Extract text
- elixir
- typescript
- Java
- python-lib
- Python
- Metadata Extraction
- pandoc
- python-library
- PHP
- Node.js
- tesseract-ocr
- tesseract
Kreuzberg information
What is Kreuzberg?
Extract text and metadata from a wide range of file formats (75+), generate embeddings and post-process at native speeds without needing a GPU.
Key Features
- Extensible architecture – Plugin system for custom OCR backends, validators, post-processors, and document extractors
- Polyglot – Native bindings for Rust, Python, TypeScript/Node.js, Ruby, Go, Java, C#, PHP, and Elixir
- 75+ file formats – PDF, Office documents, images, HTML, XML, emails, archives, and academic formats across 8 categories
- OCR support – Tesseract (all bindings), PaddleOCR (all native bindings), EasyOCR (Python), extensible via plugin API
- High performance – Rust core with native PDFium, SIMD optimizations, and full parallelism
- Flexible deployment – Use as a library, CLI tool, REST API server, or MCP server
- Memory efficient – Streaming parsers for multi-GB files
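To make the plugin idea above concrete, here is a minimal, self-contained sketch of how a post-processor registry could work in principle. It is not Kreuzberg's API: `ExtractionResult`, `PostProcessorRegistry`, `collapse_whitespace`, and `add_word_count` are hypothetical names invented for illustration, and the real plugin interfaces are described in the project's documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ExtractionResult:
    """Hypothetical result container (NOT Kreuzberg's actual type)."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A post-processor takes a result and returns a (possibly new) result.
PostProcessor = Callable[[ExtractionResult], ExtractionResult]


class PostProcessorRegistry:
    """Hypothetical registry that runs processors in registration order."""

    def __init__(self) -> None:
        self._processors: List[Tuple[str, PostProcessor]] = []

    def register(self, name: str, processor: PostProcessor) -> None:
        # Reject duplicate names so each plugin is registered once.
        if any(existing == name for existing, _ in self._processors):
            raise ValueError(f"post-processor already registered: {name}")
        self._processors.append((name, processor))

    def run(self, result: ExtractionResult) -> ExtractionResult:
        # Feed the output of each processor into the next one.
        for _, processor in self._processors:
            result = processor(result)
        return result


def collapse_whitespace(result: ExtractionResult) -> ExtractionResult:
    """Replace every run of whitespace with a single space."""
    return ExtractionResult(" ".join(result.content.split()), dict(result.metadata))


def add_word_count(result: ExtractionResult) -> ExtractionResult:
    """Record the number of whitespace-separated words in the metadata."""
    metadata = dict(result.metadata)
    metadata["word_count"] = str(len(result.content.split()))
    return ExtractionResult(result.content, metadata)


registry = PostProcessorRegistry()
registry.register("collapse_whitespace", collapse_whitespace)
registry.register("add_word_count", add_word_count)

processed = registry.run(ExtractionResult("  Hello   world\n"))
print(processed.content)
print(processed.metadata)
```

The sketch chains processors in registration order, so later plugins see the output of earlier ones. A real plugin system would likely also need ordering controls and error handling, which are omitted here.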
Installation
Each language binding provides comprehensive documentation with examples and best practices.
Key Features
- OCR with Table Extraction
- Batch Processing
- Password-Protected PDFs
- Language Detection
- Metadata Extraction
AI Coding Assistants
Kreuzberg ships with an Agent Skill that teaches AI coding assistants how to use the library correctly. It works with Claude Code, Codex, Gemini CLI, Cursor, VS Code, Amp, Goose, Roo Code, and any tool supporting the Agent Skills standard.
Documentation: https://docs.kreuzberg.dev/
Contributing
Contributions are welcome! https://github.com/kreuzberg-dev/kreuzberg
License
MIT License - see LICENSE for details. You can use Kreuzberg freely in both commercial and closed-source products with no obligations, no viral effects, and no licensing restrictions.