Xberg

Polyglot document intelligence with a Rust core — extract text, tables, and metadata from 97+ formats, with native bindings for every major language.

Ecosystem

  • Xberg — document intelligence: text, tables, metadata from 97+ formats with optional OCR.
  • Xberg Enterprise — managed extraction API with SDKs, dashboards, and observability.
  • crawlberg — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
  • html-to-markdown — fast, lossless HTML→Markdown engine.
  • liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
  • tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
  • alef — the polyglot binding generator that produces every per-language binding across the ecosystem.

Why Xberg

  • Truly polyglot — one Rust engine, identical results across every language binding.
  • High throughput — optimized for batch workloads and multi-GB documents.
  • Memory efficient — streaming architecture keeps memory usage predictable.
  • Flexible deployment — CLI, REST API, Docker, and MCP server.
  • MIT licensed — safe for enterprise, commercial, and closed-source use.
  • Built for RAG — native chunking, embeddings, and extensibility.

Community

Built with care in Kreuzberg, Berlin.

About

Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 97+ document formats using streaming parsers and built-in OCR. Designed for RAG pipelines, batch workloads, and production deployments.
