Anthropic Messages-Compatible API Reference

The gateway exposes an Anthropic-compatible facade that points any Claude SDK at your open-source model. The translation layer handles:

system (string or content blocks)
messages (string or text content blocks — tool_result text is flattened)
non-streaming requests
streaming requests (full Anthropic event sequence, translated from OpenAI SSE)
/v1/messages/count_tokens (heuristic when the backend lacks a tokenizer endpoint)

The base URL is the gateway's own address, e.g. http://127.0.0.1:8000.

Auth: x-api-key: $API_KEY (Anthropic style) or Authorization: Bearer $API_KEY. Include anthropic-version: 2023-06-01 — the gateway echoes it back.
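The two auth styles can be sketched as a small header builder. This is an illustration only: `build_auth_headers` and its `style` argument are hypothetical names, not part of the gateway or of any SDK.

```python
from typing import Dict


def build_auth_headers(api_key: str, style: str = "x-api-key") -> Dict[str, str]:
    """Return request headers for either auth style described above.

    Hypothetical helper: the name and the `style` values are assumptions
    made for this sketch.
    """
    headers = {
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    if style == "x-api-key":
        # Anthropic-style key header.
        headers["x-api-key"] = api_key
    elif style == "bearer":
        # OpenAI-style bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        raise ValueError(f"unknown auth style: {style!r}")
    return headers
```

Either dictionary can be passed to any HTTP client; only one of the two key headers is needed per request.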

POST /v1/messages

Non-streaming

curl http://127.0.0.1:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "max_tokens": 256,
    "system": "You are concise.",
    "messages": [
      {"role": "user", "content": "Explain GPU inference in one sentence."}
    ]
  }'

Response:

{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "content": [{"type": "text", "text": "..."}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 24, "output_tokens": 12}
}

Streaming

Set "stream": true. The gateway emits the canonical Anthropic event sequence, translated from the backend's OpenAI SSE:

event: message_start
event: content_block_start
event: content_block_delta   (one per chunk)
event: content_block_stop
event: message_delta         (with stop_reason, output_tokens)
event: message_stop

curl -N http://127.0.0.1:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "max_tokens": 128,
    "stream": true,
    "messages": [{"role": "user", "content": "Say hi, then stop."}]
  }'
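To show how a client might consume that event sequence without the SDK, here is a minimal sketch that parses raw SSE lines and reassembles the reply text. It assumes the payloads follow Anthropic's published shapes (`delta.text` on `content_block_delta`, `delta.stop_reason` on `message_delta`); the function names and `SAMPLE` data are illustrative, not gateway output.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            # A blank line terminates one SSE event.
            payload = json.loads("\n".join(data)) if data else {}
            yield (event, payload)
            event, data = None, []


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate text deltas and capture the final stop_reason."""
    parts: List[str] = []
    stop_reason: Optional[str] = None
    for event, payload in iter_sse_events(lines):
        if event == "content_block_delta":
            parts.append(payload.get("delta", {}).get("text", ""))
        elif event == "message_delta":
            stop_reason = payload.get("delta", {}).get("stop_reason")
    return "".join(parts), stop_reason


# Illustrative stream, hand-written for this sketch.
SAMPLE = [
    'event: message_start',
    'data: {"type": "message_start", "message": {"id": "msg_1"}}',
    '',
    'event: content_block_start',
    'data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hi"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " there"}}',
    '',
    'event: content_block_stop',
    'data: {"type": "content_block_stop", "index": 0}',
    '',
    'event: message_delta',
    'data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 2}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
    '',
]

text, reason = collect_text(SAMPLE)
print(text, reason)
```

In practice the Anthropic SDK's streaming helpers do this parsing for you; the sketch is mainly useful for debugging the raw stream.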

POST /v1/messages/count_tokens

Returns a heuristic input-token count (roughly 4 characters per token). The backend's real tokenizer is not reachable through the OpenAI HTTP surface, so treat the value as an estimate.

curl http://127.0.0.1:8000/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{
    "messages": [{"role": "user", "content": "hello"}]
  }'
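The heuristic can be approximated locally. The sketch below assumes the estimate is the total character count of the system prompt and message text divided by 4 and rounded up; the gateway's actual rounding and the exact set of fields it counts are not documented here, so the numbers may differ slightly. `estimate_input_tokens` is a hypothetical name.

```python
import math
from typing import Any, Dict, List, Optional, Union

Content = Union[str, List[Dict[str, Any]]]


def _text_of(content: Content) -> str:
    """Return plain text from a string or a list of text content blocks."""
    if isinstance(content, str):
        return content
    return "".join(b.get("text", "") for b in content if b.get("type") == "text")


def estimate_input_tokens(
    messages: List[Dict[str, Any]],
    system: Optional[Content] = None,
    chars_per_token: int = 4,
) -> int:
    """Rough token estimate: characters / chars_per_token, rounded up.

    Assumption: rounding up and counting only text are choices made for
    this sketch, not confirmed gateway behaviour.
    """
    chars = len(_text_of(system)) if system else 0
    chars += sum(len(_text_of(m["content"])) for m in messages)
    return math.ceil(chars / chars_per_token)


print(estimate_input_tokens([{"role": "user", "content": "hello"}]))
```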

Python (Anthropic SDK)

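import os

import anthropic

# Point the SDK at the gateway instead of the default Anthropic endpoint.
client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8000",
    api_key=os.environ["API_KEY"],  # same key the curl examples pass as $API_KEY
)

msg = client.messages.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_tokens=256,
    messages=[{"role": "user", "content": "Give me a three-line haiku."}],
)
# content is a list of blocks; the first one holds the reply text.
print(msg.content[0].text)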

Translation behaviour

system (string or content blocks) → OpenAI system message
message content (string or text blocks) → flattened to string
stop_sequences → OpenAI stop
temperature, top_p, max_tokens forwarded as-is
finish_reason mapping: stop → end_turn, length → max_tokens, tool_calls → tool_use
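The rules above can be sketched as a pure function. This is an illustration of the mapping, not the gateway's source: `to_openai_request`, `flatten`, and `FINISH_TO_STOP_REASON` are hypothetical names, and joining text blocks with no separator is an assumption.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]

# finish_reason (OpenAI) -> stop_reason (Anthropic), as listed above.
FINISH_TO_STOP_REASON = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
}


def flatten(content: Content) -> str:
    """Flatten a string or a list of text blocks into one string.

    Assumption: blocks are concatenated without a separator.
    """
    if isinstance(content, str):
        return content
    return "".join(b.get("text", "") for b in content if b.get("type") == "text")


def to_openai_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an Anthropic-style request body into an OpenAI chat body."""
    messages: List[Dict[str, str]] = []
    if body.get("system"):
        # system (string or blocks) becomes a leading system message.
        messages.append({"role": "system", "content": flatten(body["system"])})
    for m in body["messages"]:
        messages.append({"role": m["role"], "content": flatten(m["content"])})

    out: Dict[str, Any] = {"model": body["model"], "messages": messages}
    # Sampling parameters are forwarded unchanged.
    for key in ("temperature", "top_p", "max_tokens"):
        if key in body:
            out[key] = body[key]
    if "stop_sequences" in body:
        out["stop"] = body["stop_sequences"]
    return out


example = to_openai_request({
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "max_tokens": 256,
    "system": [{"type": "text", "text": "You are concise."}],
    "messages": [{"role": "user", "content": "Hi"}],
    "stop_sequences": ["END"],
})
print(example)
```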

What is NOT translated

The facade is intentionally narrow. The features below are not translated; if you need any of them, call /v1/chat/completions directly, which forwards OpenAI tool and vision fields to the backend unchanged:

Anthropic tool_use blocks (OpenAI-style tools work on /v1/chat/completions)
Anthropic vision/image blocks
Anthropic PDF/document blocks
Anthropic prompt caching directives

Use the Anthropic facade when your codebase is already built around anthropic.Anthropic and you want the same SDK to target your OSS model.

Errors

400 if model is missing or body is not JSON
401 if the API key is missing or invalid
501 if the selected backend doesn't support chat or streaming
backend errors pass through with the original status and body
