routatic-proxy exposes an Anthropic-compatible API. Claude Code connects to it as if it were the Anthropic API.
The primary endpoint. Accepts Anthropic Messages API requests and returns responses in the same format.
Request body — standard Anthropic MessageRequest:
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 4096,
"system": "You are a helpful assistant.",
"messages": [
{
"role": "user",
"content": "Hello, world!"
}
],
"stream": true,
"tools": []
}Response — Anthropic MessageResponse (non-streaming) or SSE stream (streaming).
Routing behavior:
- If
modelmatches an entry inmodel_overrides, that model is used as primary with a scenario-derived safety net - Otherwise, scenario-based routing selects the model based on request content and token count
- Set
respect_requested_model: falsein config to force scenario routing regardless of themodelfield
Headers:
| Header | Value |
|---|---|
X-Request-ID |
Unique request identifier (generated or forwarded from client) |
Content-Type |
application/json (non-streaming) or text/event-stream (streaming) |
Counts tokens for a message array without generating a response.
Request body:
{
"system": "System prompt text",
"messages": [
{ "role": "user", "content": "Hello" }
]
}Response:
{
"input_tokens": 42
}Returns server health status.
Response:
{
"status": "ok",
"version": "1.2.3",
"models_configured": 6,
"uptime": "2h30m"
}Returns compact status for TUI integration (statusline, tmux bar).
Response:
{
"status": "running",
"version": "1.2.3",
"uptime": "2h30m"
}Errors follow Anthropic's error format:
{
"type": "error",
"error": {
"type": "api_error",
"message": "description of what went wrong"
}
}HTTP status codes:
| Code | Meaning |
|---|---|
| 400 | Invalid request body |
| 405 | Method not allowed (non-POST on /v1/messages) |
| 413 | Request body too large (>100MB) |
| 429 | Rate limited |
| 500 | Internal error (routing failed, transform error) |
| 502 | All upstream models failed |
Streaming responses use Server-Sent Events (SSE) with Anthropic's event format:
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","type":"message","role":"assistant","content":[],"model":"...","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":42,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":15}}
event: message_stop
data: {"type":"message_stop"}
Heartbeat: keepalive comments (:keepalive\n\n) are sent every 3 seconds during streaming.
The proxy applies per-IP rate limiting (default: 100 requests/minute). Rate-limited requests receive HTTP 429.
Optional request deduplication (request_dedup in config) prevents processing identical concurrent requests. Deduplicated requests receive HTTP 200 with no body.