Polyglot document intelligence with a Rust core — extract text, tables, and metadata from 97+ formats, with native bindings for every major language.
- Xberg — document intelligence: text, tables, metadata from 97+ formats with optional OCR.
- Xberg Enterprise — managed extraction API with SDKs, dashboards, and observability.
- crawlberg — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- html-to-markdown — fast, lossless HTML→Markdown engine.
- liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
- tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
- alef — the polyglot binding generator that produces every per-language binding across the ecosystem.
- Truly polyglot — one Rust engine, identical results across every language binding.
- High throughput — optimized for batch workloads and multi-GB documents.
- Memory efficient — streaming architecture keeps memory usage predictable.
- Flexible deployment — CLI, REST API, Docker, and MCP server.
- MIT licensed — permits enterprise, commercial, and closed-source use.
- Built for RAG — native chunking, embeddings, and extensibility.
- Discord: https://discord.gg/xzx4KkAPED
- Reddit: https://www.reddit.com/r/kreuzberg_dev/
- LinkedIn: https://www.linkedin.com/company/kreuzberg-dev/
- X/Twitter: https://x.com/kreuzberg_dev
- Contact: contact@xberg.io
Built with care in Kreuzberg, Berlin.