Releases: varad-more/selfhosted-chat-api
Releases · varad-more/selfhosted-chat-api
v1.0.0 — Multi-backend LLM gateway
First stable release.
A self-hosted FastAPI gateway that exposes OpenAI-compatible and
Anthropic Messages-compatible APIs in front of any open-source LLM
runtime on your own hardware.
Highlights
- Any OSS LLM backend, one env var + one compose profile:
vllm,
ollama,llamacpp,tgi,sglang,localai,lmstudio, or any
OpenAI-compatible URL. - Full API surface:
/v1/chat/completions,/v1/completions,
/v1/embeddings,/v1/models(OpenAI) plus/v1/messagesand
/v1/messages/count_tokens(Anthropic). Streaming works in both
directions — the gateway translates OpenAI SSE deltas into the
canonical Anthropic event stream. - Production hardening: structured JSON logs with request IDs,
Prometheus/metrics,/livez+/readyz+/healthprobes,
token-bucket rate limiting, CORS, consistent error envelopes, shared
httpx.AsyncClientwith lifespan management. - Hardened container: non-root, read-only rootfs, dropped capabilities,
no-new-privileges,HEALTHCHECK. - Tests + CI: 38 pytest tests using
httpx.MockTransport, ruff lint,
Docker build, and compose-profile validation across all backends. - Laptop-friendly demo:
make demoboots an Ollama + tiny-model stack
with no GPU required.
Quick start
git clone https://github.com/varad-more/selfhosted-chat-api
cd selfhosted-chat-api
make demo # CPU-only, laptop-friendly
# or
make env-vllm && make up BACKEND=vllm # GPU host with vLLMThen point any OpenAI or Anthropic SDK at http://127.0.0.1:8000/v1.
Docs
README.md— overview, architecture, reproducibility matrix, peer-sharing guidedocs/BACKENDS.md— per-backend launch flags and quirksdocs/MODELS.md— curated open-source model catalog and GPU sizingdocs/API_OPENAI.md/docs/API_CLAUDE.md— endpoint referencedocs/DEPLOYMENT.md/docs/OPERATIONS.md— day-1 and day-2
License
MIT.