@xberg-io

Xberg.io

Polyglot document intelligence with a Rust core — extract structured data from 97+ formats

Xberg

Polyglot document intelligence with a Rust core — extract text, tables, and metadata from 97+ formats, with native bindings for every major language.


Ecosystem

  • Xberg — document intelligence: text, tables, metadata from 97+ formats with optional OCR.
  • Xberg Enterprise — managed extraction API with SDKs, dashboards, and observability.
  • crawlberg — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
  • html-to-markdown — fast, lossless HTML→Markdown engine.
  • liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
  • tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
  • alef — the polyglot binding generator that produces every per-language binding across the ecosystem.

Why Xberg

  • Truly polyglot — one Rust engine, identical results across every language binding.
  • High throughput — optimized for batch workloads and multi-GB documents.
  • Memory efficient — streaming architecture keeps memory usage predictable.
  • Flexible deployment — CLI, REST API, Docker, and MCP server.
  • MIT licensed — safe for enterprise, commercial, and closed-source use.
  • Built for RAG — native chunking, embeddings, and extensibility.

Community

Built with care in Kreuzberg, Berlin.

Pinned

  1. xberg (Public)

    A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Pyt…

    Rust · 8.6k stars · 504 forks

  2. html-to-markdown (Public)

    High-performance, CommonMark-compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts stru…

    HTML · 786 stars · 61 forks

  3. tree-sitter-language-pack (Public)

    Comprehensive tree-sitter grammar compilation with polyglot bindings — Rust, Python, Node.js, Go, Java, Ruby, Elixir, PHP, C#, WASM, Dart, Kotlin-Android, Swift, Zig, and CLI. 306+ languages.

    Rust · 408 stars · 64 forks

  4. liter-llm (Public)

    Universal LLM API client with 142+ providers and 11 native language bindings, powered by a Rust core

    Rust · 218 stars · 15 forks

  5. crawlberg (Public)

    High-performance web crawling engine with bindings for 11 languages

    Rust · 117 stars · 15 forks

  6. xberg-enterprise (Public)

    Cloud-native document extraction platform — SaaS at kreuzberg.dev or self-host on any Kubernetes cluster. 90+ formats, REST API, webhooks. Built on Kreuzberg.

    Rust · 16 stars

Repositories

Showing 10 of 27 repositories
