mattmre/EDCTRANSLATION-PUBLIC

EDC Translation

Contract-first translation services for eDiscovery, review, and structured document-processing pipelines.

License: MIT

EDC Translation accepts raw text or DocumentBundle v1 JSON and emits TranslationBundle v1 JSON. It keeps source span identity, provider metadata, quality fields, custody references, and review hooks attached to translated output so downstream systems can validate and audit the result.

The repository includes a deterministic CI provider, a passthrough provider, optional local CT2 adapters, optional local OpenAI-compatible runtime adapters, optional OpenRouter/Gemini adapters, a FastAPI service, CLI tools, MCP-style CLI/HTTP tools, batch text translation, custody/evidence surfaces, Helm charts, GitOps scaffolding, Ansible deployment templates, and a static public presentation site.

30-Second Quickstart

Windows PowerShell:

py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
edc-translation submit-text "Hello world." --source en --target fr --provider deterministic_ci
uvicorn edc_translation.api:app --host 127.0.0.1 --port 8080

Linux/macOS:

python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
edc-translation submit-text "Hello world." --source en --target fr --provider deterministic_ci
uvicorn edc_translation.api:app --host 127.0.0.1 --port 8080

Open:

  • http://127.0.0.1:8080/healthz
  • http://127.0.0.1:8080/readyz
  • http://127.0.0.1:8080/docs
  • http://127.0.0.1:8080/admin
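
To script these checks instead of opening each URL by hand, a small probe can be sketched with the Python standard library. This is an illustrative sketch, not code from the repository: the helper names `endpoint_urls` and `probe` are hypothetical, and it assumes the service is listening on the host and port used in the quickstart.

```python
import urllib.error
import urllib.request

# Paths listed in the quickstart; the helpers below are hypothetical.
PATHS = ("/healthz", "/readyz", "/docs", "/admin")


def endpoint_urls(base="http://127.0.0.1:8080"):
    """Return the full URL for each quickstart path under `base`."""
    return [base.rstrip("/") + path for path in PATHS]


def probe(url, timeout=2.0):
    """Return the HTTP status code for `url`, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, and so on


if __name__ == "__main__":
    for url in endpoint_urls():
        print(url, probe(url))
```

A `None` result usually means uvicorn is not running yet or is bound to a different port.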

Docker users can run the local smoke stack:

docker compose -f docker-compose.local.yml up --build

System Overview

flowchart LR
    Client["CLI, REST, Python, MCP, or admin UI"]
    Contracts["DocumentBundle v1 / TranslationBundle v1"]
    Service["Translation service layer"]
    Routing["Tenant policy and provider routing"]
    Providers["Deterministic, CT2, local LLM, optional cloud"]
    Stores["Local, file, Postgres, Kafka"]
    Evidence["Quality, custody, review, evidence"]

    Client --> Contracts --> Service --> Routing --> Providers --> Evidence
    Service --> Stores
    Stores --> Evidence
    Evidence --> Client

What It Is

| Capability | Included |
| --- | --- |
| Contract validation | Public JSON Schemas for document and translation bundles. |
| Deterministic local path | Credential-free provider for tests, docs, and CI. |
| Provider control plane | Explicit provider IDs, auto-route diagnostics, license gates, live-smoke gates. |
| Runtime surfaces | REST API, CLI, Python client, MCP-style CLI/HTTP wrapper, static admin UI. |
| Batch workflows | Recursive text-file translation with logs, manifests, and optional sidecar bundles. |
| Evidence surfaces | Quality, custody, review, model validation, and release-readiness metadata. |
| Deployment scaffolding | Python, Docker, Compose, Helm, GitOps, Ansible. |

What It Is Not

  • Not an OCR extraction engine.
  • Not a hosted SaaS.
  • Not a model-weight distribution repo.
  • Not a blind proxy to live providers.
  • Not legal advice or automatic certification.

Documentation

| Start here | Purpose |
| --- | --- |
| Install | Python, optional extras, Docker, and Compose install paths. |
| Development | Local development, tests, lint, package, docs, and release hygiene. |
| Architecture | System design, trust boundaries, and deployment shape. |
| Docs index | Full public documentation suite. |
| API reference | REST, CLI, MCP-style tools, and route scopes. |
| Contracts reference | DocumentBundle v1 and TranslationBundle v1 field guidance. |
| Provider operations | Local model, CT2, auto-route, and live-provider controls. |
| Deployment | Compose, Helm, GitOps, Ansible, auth, stores, and rollout checks. |
| Wiki source | GitHub-wiki-ready pages maintained in-tree. |
| Presentation | Static public microsite and slide deck. |

Provider Quick Reference

| Provider | Best use |
| --- | --- |
| deterministic_ci | Public examples, CI, integration smoke. |
| passthrough | Same-language plumbing checks. |
| local_ct2_opus / local_ct2_nllb / local_ct2_madlad | Operator-reviewed local CT2 model directories. |
| local_openai_compat | Local /v1/models and /v1/chat/completions runtimes. |
| openrouter_llm / google_gemini | Optional live-provider experiments behind explicit credentials and smoke opt-in. |
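
The provider IDs above can be checked before they reach the CLI. The sketch below builds the argument list for the quickstart's `edc-translation submit-text` command and rejects IDs that are not in the table. The helper `submit_text_argv` is hypothetical, and the quickstart only demonstrates `--provider deterministic_ci`; that the other IDs are accepted by the same flag is an assumption.

```python
import subprocess

# Provider IDs copied from the quick-reference table.
KNOWN_PROVIDERS = frozenset({
    "deterministic_ci", "passthrough",
    "local_ct2_opus", "local_ct2_nllb", "local_ct2_madlad",
    "local_openai_compat", "openrouter_llm", "google_gemini",
})


def submit_text_argv(text, source, target, provider="deterministic_ci"):
    """Build the argv for the quickstart's `edc-translation submit-text` call.

    Hypothetical helper: only the flags shown in the quickstart are used.
    """
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider ID: {provider!r}")
    return ["edc-translation", "submit-text", text,
            "--source", source, "--target", target, "--provider", provider]


if __name__ == "__main__":
    argv = submit_text_argv("Hello world.", "en", "fr")
    # Requires the package to be installed in the active environment.
    subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains spaces or punctuation.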

Validation

python -m ruff check edc_translation tests
PGCONNECT_TIMEOUT=2 python -m pytest -q
docker compose -f docker-compose.local.yml config --quiet
EDC_TRANSLATION_POSTGRES_PASSWORD=local-dev-password EDC_JWT_SECRET=local-dev-jwt-secret docker compose -f docker-compose.prod.yml config --quiet
helm lint helm/edc-translation
helm template edc-translation helm/edc-translation

PowerShell users should set $env:PGCONNECT_TIMEOUT="2" before running pytest, then set $env:EDC_TRANSLATION_POSTGRES_PASSWORD="local-dev-password" and $env:EDC_JWT_SECRET="local-dev-jwt-secret" before validating the production-like Compose file.
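
For a single entry point that behaves the same in PowerShell and POSIX shells, the commands above can be wrapped in a short Python runner that sets the environment variables per step. This is a sketch under assumptions: `STEPS`, `step_env`, and `run_all` are hypothetical names, not part of the repository, and `python`, `docker`, and `helm` are assumed to be on `PATH`.

```python
import os
import subprocess

# Variables needed only to validate the production-like Compose file.
COMPOSE_PROD_ENV = {
    "EDC_TRANSLATION_POSTGRES_PASSWORD": "local-dev-password",
    "EDC_JWT_SECRET": "local-dev-jwt-secret",
}

# (extra environment variables, argv) pairs mirroring the validation commands.
STEPS = [
    ({}, ["python", "-m", "ruff", "check", "edc_translation", "tests"]),
    ({"PGCONNECT_TIMEOUT": "2"}, ["python", "-m", "pytest", "-q"]),
    ({}, ["docker", "compose", "-f", "docker-compose.local.yml",
          "config", "--quiet"]),
    (COMPOSE_PROD_ENV, ["docker", "compose", "-f", "docker-compose.prod.yml",
                        "config", "--quiet"]),
    ({}, ["helm", "lint", "helm/edc-translation"]),
    ({}, ["helm", "template", "edc-translation", "helm/edc-translation"]),
]


def step_env(extra, base=None):
    """Return a copy of the base environment with `extra` layered on top."""
    env = dict(os.environ if base is None else base)
    env.update(extra)
    return env


def run_all(steps=STEPS):
    """Run each step in order; check=True stops at the first failure."""
    for extra, argv in steps:
        subprocess.run(argv, env=step_env(extra), check=True)


if __name__ == "__main__":
    run_all()
```

Because each step receives its own copy of the environment, the development secrets never leak into the calling shell.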

Contributing

Issues and pull requests are welcome. Start with CONTRIBUTING.md, open a GitHub Discussion for design questions, and use the PR checklist to keep generated commits free of AI/LLM co-author footers.

Links

Issues | Discussions | Security | License

About

Public release of an internal translation pipeline. The aim is to process translation requests in a distributed fashion, using CPU and GPU acceleration among other techniques, so that translation can run at scale. Extensible local and cloud AI options make it a versatile engine for translation enrichment.
