Releases · varad-more/selfhosted-chat-api

First stable release.

A self-hosted FastAPI gateway that exposes OpenAI-compatible and
Anthropic Messages-compatible APIs in front of any open-source LLM
runtime on your own hardware.

Highlights

Any OSS LLM backend, one env var + one compose profile: vllm,
ollama, llamacpp, tgi, sglang, localai, lmstudio, or any
OpenAI-compatible URL.
Full API surface: /v1/chat/completions, /v1/completions,
/v1/embeddings, /v1/models (OpenAI) plus /v1/messages and
/v1/messages/count_tokens (Anthropic). Streaming works in both
directions — the gateway translates OpenAI SSE deltas into the
canonical Anthropic event stream.
Production hardening: structured JSON logs with request IDs,
Prometheus /metrics, /livez + /readyz + /health probes,
token-bucket rate limiting, CORS, consistent error envelopes, shared
httpx.AsyncClient with lifespan management.
Hardened container: non-root, read-only rootfs, dropped capabilities,
no-new-privileges, HEALTHCHECK.
Tests + CI: 38 pytest tests using httpx.MockTransport, ruff lint,
Docker build, and compose-profile validation across all backends.
Laptop-friendly demo: make demo boots an Ollama + tiny-model stack
with no GPU required.

Quick start

git clone https://github.com/varad-more/selfhosted-chat-api
cd selfhosted-chat-api
make demo                    # CPU-only, laptop-friendly
# or
make env-vllm && make up BACKEND=vllm   # GPU host with vLLM

Then point any OpenAI or Anthropic SDK at http://127.0.0.1:8000/v1.

Docs

README.md — overview, architecture, reproducibility matrix, peer-sharing guide
docs/BACKENDS.md — per-backend launch flags and quirks
docs/MODELS.md — curated open-source model catalog and GPU sizing
docs/API_OPENAI.md / docs/API_CLAUDE.md — endpoint reference
docs/DEPLOYMENT.md / docs/OPERATIONS.md — day-1 and day-2

License

MIT.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Highlights

Quick start

Docs

License

Uh oh!

Releases: varad-more/selfhosted-chat-api

v1.0.0 — Multi-backend LLM gateway

Highlights

Quick start

Docs

License

Uh oh!