Contract-first translation services for eDiscovery, review, and structured document-processing pipelines.
EDC Translation accepts raw text or DocumentBundle v1 JSON and emits TranslationBundle v1 JSON. It keeps source span identity, provider metadata, quality fields, custody references, and review hooks attached to translated output so downstream systems can validate and audit the result.
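To make the idea of span identity concrete, here is a minimal sketch of carrying an identifier from source text through to translated output. The class and field names (`SourceSpan`, `TranslatedSpan`, `span_id`, and so on) are illustrative assumptions, not the actual DocumentBundle v1 or TranslationBundle v1 schema; consult the published JSON Schemas for the real field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpan:
    """One unit of source text. Hypothetical shape, not the v1 schema."""
    span_id: str
    text: str


@dataclass(frozen=True)
class TranslatedSpan:
    """Translated output that still points back at its source span."""
    span_id: str           # same identifier as the source span
    source_text: str
    translated_text: str
    provider: str          # which provider produced this output


def translate_spans(spans, provider, translate):
    """Translate each span while carrying its identifier through unchanged."""
    return [
        TranslatedSpan(s.span_id, s.text, translate(s.text), provider)
        for s in spans
    ]


def missing_span_ids(spans, translated):
    """Return source span IDs that have no translated counterpart."""
    seen = {t.span_id for t in translated}
    return [s.span_id for s in spans if s.span_id not in seen]
```

A downstream validator could call `missing_span_ids` to confirm that every source span is accounted for before accepting a bundle. The `translate` argument is any callable from string to string, so the sketch runs without a real provider.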
The repository includes a deterministic CI provider, passthrough provider, optional local CT2 adapters, optional local OpenAI-compatible runtime adapters, optional OpenRouter/Gemini adapters, FastAPI service, CLI tools, MCP-style CLI/HTTP tools, batch text translation, custody/evidence surfaces, Helm charts, GitOps scaffolding, Ansible deployment templates, and a static public presentation site.
Windows PowerShell:
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
edc-translation submit-text "Hello world." --source en --target fr --provider deterministic_ci
uvicorn edc_translation.api:app --host 127.0.0.1 --port 8080

Linux/macOS:
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
edc-translation submit-text "Hello world." --source en --target fr --provider deterministic_ci
uvicorn edc_translation.api:app --host 127.0.0.1 --port 8080

Open:
- http://127.0.0.1:8080/healthz
- http://127.0.0.1:8080/readyz
- http://127.0.0.1:8080/docs
- http://127.0.0.1:8080/admin
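If you would rather check these endpoints from a script than a browser, the following sketch probes each one using only the standard library. The base URL and paths are the ones listed above; the helper names `probe` and `probe_all` are made up for this example, and nothing here assumes a particular response body.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8080"
PATHS = ["/healthz", "/readyz", "/docs", "/admin"]


def probe(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the connection failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code   # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None       # nothing listening, refused, or timed out


def probe_all(base_url=BASE_URL, paths=PATHS, probe_fn=probe):
    """Probe every path and return a {path: status} mapping."""
    return {path: probe_fn(base_url + path) for path in paths}
```

With the service running, `probe_all()` returns a dictionary keyed by path. A value of `None` means the request never reached the server, which usually indicates that uvicorn is not running on that port.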
Docker users can run the local smoke stack:
docker compose -f docker-compose.local.yml up --build

flowchart LR
Client["CLI, REST, Python, MCP, or admin UI"]
Contracts["DocumentBundle v1 / TranslationBundle v1"]
Service["Translation service layer"]
Routing["Tenant policy and provider routing"]
Providers["Deterministic, CT2, local LLM, optional cloud"]
Stores["Local, file, Postgres, Kafka"]
Evidence["Quality, custody, review, evidence"]
Client --> Contracts --> Service --> Routing --> Providers --> Evidence
Service --> Stores
Stores --> Evidence
Evidence --> Client
| Capability | Included |
|---|---|
| Contract validation | Public JSON Schemas for document and translation bundles. |
| Deterministic local path | Credential-free provider for tests, docs, and CI. |
| Provider control plane | Explicit provider IDs, auto-route diagnostics, license gates, live-smoke gates. |
| Runtime surfaces | REST API, CLI, Python client, MCP-style CLI/HTTP wrapper, static admin UI. |
| Batch workflows | Recursive text-file translation with logs, manifests, and optional sidecar bundles. |
| Evidence surfaces | Quality, custody, review, model validation, and release-readiness metadata. |
| Deployment scaffolding | Python, Docker, Compose, Helm, GitOps, Ansible. |
- Not an OCR extraction engine.
- Not a hosted SaaS.
- Not a model-weight distribution repo.
- Not a blind proxy to live providers.
- Not legal advice or automatic certification.
| Start here | Purpose |
|---|---|
| Install | Python, optional extras, Docker, and Compose install paths. |
| Development | Local development, tests, lint, package, docs, and release hygiene. |
| Architecture | System design, trust boundaries, and deployment shape. |
| Docs index | Full public documentation suite. |
| API reference | REST, CLI, MCP-style tools, and route scopes. |
| Contracts reference | DocumentBundle v1 and TranslationBundle v1 field guidance. |
| Provider operations | Local model, CT2, auto-route, and live-provider controls. |
| Deployment | Compose, Helm, GitOps, Ansible, auth, stores, and rollout checks. |
| Wiki source | GitHub-wiki-ready pages maintained in-tree. |
| Presentation | Static public microsite and slide deck. |
| Provider | Best use |
|---|---|
| `deterministic_ci` | Public examples, CI, integration smoke. |
| `passthrough` | Same-language plumbing checks. |
| `local_ct2_opus` / `local_ct2_nllb` / `local_ct2_madlad` | Operator-reviewed local CT2 model directories. |
| `local_openai_compat` | Local `/v1/models` and `/v1/chat/completions` runtimes. |
| `openrouter_llm` / `google_gemini` | Optional live-provider experiments behind explicit credentials and smoke opt-in. |
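The "explicit credentials and smoke opt-in" requirement for live providers can be pictured as a two-condition gate. The sketch below is an illustration only: the environment variable names `EXAMPLE_PROVIDER_API_KEY` and `EXAMPLE_LIVE_SMOKE_OPT_IN` are hypothetical placeholders, and the project's real settings and gating logic may differ.

```python
import os

# Provider IDs come from the table above; treating exactly these two
# as "live" is an assumption of this sketch.
LIVE_PROVIDERS = {"openrouter_llm", "google_gemini"}


def live_provider_allowed(provider_id, env=None):
    """Allow a live provider only with a credential AND an explicit opt-in.

    EXAMPLE_PROVIDER_API_KEY and EXAMPLE_LIVE_SMOKE_OPT_IN are hypothetical
    variable names, not the project's real configuration.
    """
    env = os.environ if env is None else env
    if provider_id not in LIVE_PROVIDERS:
        return True  # local and deterministic providers are not gated here
    has_credential = bool(env.get("EXAMPLE_PROVIDER_API_KEY"))
    opted_in = env.get("EXAMPLE_LIVE_SMOKE_OPT_IN") == "1"
    return has_credential and opted_in
```

Requiring both conditions means that a credential left in the environment is not enough on its own to send traffic to a live provider.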
python -m ruff check edc_translation tests
PGCONNECT_TIMEOUT=2 python -m pytest -q
docker compose -f docker-compose.local.yml config --quiet
EDC_TRANSLATION_POSTGRES_PASSWORD=local-dev-password EDC_JWT_SECRET=local-dev-jwt-secret docker compose -f docker-compose.prod.yml config --quiet
helm lint helm/edc-translation
helm template edc-translation helm/edc-translation

PowerShell users should set $env:PGCONNECT_TIMEOUT="2" before running pytest, then set $env:EDC_TRANSLATION_POSTGRES_PASSWORD="local-dev-password" and $env:EDC_JWT_SECRET="local-dev-jwt-secret" before validating the production-like Compose file.
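Because the inline `VAR=value command` syntax does not work in PowerShell, one way to run the same checks on every platform is a small Python wrapper that sets the variables per command. This is a sketch rather than a script shipped with the repository: the commands and values are copied from the list above, while `CHECKS` and `run_checks` are names invented here. It uses `sys.executable` in place of `python` so that the active virtual environment's interpreter is used.

```python
import os
import subprocess
import sys

# Each entry is (extra environment variables, command).
CHECKS = [
    ({}, [sys.executable, "-m", "ruff", "check", "edc_translation", "tests"]),
    ({"PGCONNECT_TIMEOUT": "2"}, [sys.executable, "-m", "pytest", "-q"]),
    ({}, ["docker", "compose", "-f", "docker-compose.local.yml",
          "config", "--quiet"]),
    ({"EDC_TRANSLATION_POSTGRES_PASSWORD": "local-dev-password",
      "EDC_JWT_SECRET": "local-dev-jwt-secret"},
     ["docker", "compose", "-f", "docker-compose.prod.yml",
      "config", "--quiet"]),
    ({}, ["helm", "lint", "helm/edc-translation"]),
    ({}, ["helm", "template", "edc-translation", "helm/edc-translation"]),
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order and return the commands that failed."""
    failed = []
    for extra_env, command in checks:
        env = {**os.environ, **extra_env}  # inherit, then override per check
        try:
            result = runner(command, env=env)
        except FileNotFoundError:
            failed.append(command)  # e.g. docker or helm is not installed
            continue
        if result.returncode != 0:
            failed.append(command)
    return failed
```

Calling `run_checks()` from the repository root runs all six commands and returns the ones that failed. An empty list means every check passed.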
Issues and pull requests are welcome. Start with CONTRIBUTING.md, open a GitHub Discussion for design questions, and use the PR checklist to keep generated commits free of AI/LLM co-author footers.
Issues | Discussions | Security | License
