Anthropic Messages-Compatible API Reference

The gateway exposes an Anthropic-compatible facade so that any Claude SDK can target your open-source model. The translation layer handles:

  • system (string or content blocks)
  • messages (string or text content blocks — tool_result text is flattened)
  • non-streaming requests
  • streaming requests (full Anthropic event sequence, translated from OpenAI SSE)
  • /v1/messages/count_tokens (heuristic when the backend lacks a tokenizer endpoint)

Base URL matches the gateway, e.g. http://127.0.0.1:8000.

Auth: send either x-api-key: $API_KEY (Anthropic style) or Authorization: Bearer $API_KEY. Also send anthropic-version: 2023-06-01; the gateway echoes it back.
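As a minimal sketch of the two auth styles above, the following builds a POST /v1/messages request using only the Python standard library. The helper name build_messages_request and its use_bearer flag are hypothetical and not part of the gateway; the request is constructed but never sent.

```python
import json
import urllib.request


def build_messages_request(base_url: str, api_key: str, payload: dict,
                           use_bearer: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/messages request for the gateway."""
    headers = {
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    if use_bearer:
        # OpenAI-style bearer token
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Anthropic-style key header
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_messages_request(
    "http://127.0.0.1:8000",
    "YOUR_API_KEY",
    {
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "max_tokens": 16,
        "messages": [{"role": "user", "content": "hi"}],
    },
)
print(req.full_url)
```

Passing the result to urllib.request.urlopen(req) would send it to a running gateway.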

POST /v1/messages

Non-streaming

curl http://127.0.0.1:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "max_tokens": 256,
    "system": "You are concise.",
    "messages": [
      {"role": "user", "content": "Explain GPU inference in one sentence."}
    ]
  }'

Response:

{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "content": [{"type": "text", "text": "..."}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 24, "output_tokens": 12}
}

Streaming

Set "stream": true. The gateway emits the canonical Anthropic event sequence, translated from the backend's OpenAI SSE:

event: message_start
event: content_block_start
event: content_block_delta   (one per chunk)
event: content_block_stop
event: message_delta         (with stop_reason, output_tokens)
event: message_stop

curl -N http://127.0.0.1:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "max_tokens": 128,
    "stream": true,
    "messages": [{"role": "user", "content": "Say hi, then stop."}]
  }'
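To make the event sequence concrete, here is a minimal client-side sketch that parses SSE lines and reassembles the streamed text. The function names iter_sse_events and collect_text are hypothetical. The sample payloads follow the event shapes documented for the Anthropic Messages API; their values are illustrative, not captured gateway output.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event_name = None
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and event_name is not None:
            # A blank line terminates one SSE event.
            yield event_name, json.loads("\n".join(data_parts) or "{}")
            event_name, data_parts = None, []
    if event_name is not None:
        # The stream ended without a trailing blank line.
        yield event_name, json.loads("\n".join(data_parts) or "{}")


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate text deltas and return (text, stop_reason)."""
    pieces, stop_reason = [], None
    for name, data in iter_sse_events(lines):
        if name == "content_block_delta":
            pieces.append(data.get("delta", {}).get("text", ""))
        elif name == "message_delta":
            stop_reason = data.get("delta", {}).get("stop_reason")
    return "".join(pieces), stop_reason


# Illustrative stream; values are made up for the example.
sample = """\
event: message_start
data: {"type": "message_start", "message": {"id": "msg_example", "role": "assistant"}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hi"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "!"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 2}}

event: message_stop
data: {"type": "message_stop"}
""".splitlines()

text, stop_reason = collect_text(sample)
print(text, stop_reason)
```

With a live gateway, the lines of the HTTP response body could be fed to collect_text in place of sample.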

POST /v1/messages/count_tokens

Returns a heuristic input-token count (roughly 4 characters per token). The backend's real tokenizer is not reachable through the OpenAI HTTP surface, so treat the value as an estimate.

curl http://127.0.0.1:8000/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "messages": [{"role": "user", "content": "hello"}]
  }'
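The 4-characters-per-token rule can be sketched as follows. This is an assumption-laden illustration, not the gateway's code: the function name estimate_input_tokens is hypothetical, and counting both system and messages text, rounding up, and returning at least 1 are all guesses about details the text above does not specify.

```python
import math


def estimate_input_tokens(body: dict, chars_per_token: int = 4) -> int:
    """Estimate input tokens from character count (heuristic sketch)."""

    def as_text(content) -> str:
        # Accept a plain string or a list of {"type": "text", ...} blocks.
        if isinstance(content, str):
            return content
        return "".join(
            block.get("text", "")
            for block in content
            if block.get("type") == "text"
        )

    total_chars = 0
    if body.get("system"):
        total_chars += len(as_text(body["system"]))
    for message in body.get("messages", []):
        total_chars += len(as_text(message.get("content", "")))
    # Assumed rounding: round up, and never report fewer than 1 token.
    return max(1, math.ceil(total_chars / chars_per_token))


print(estimate_input_tokens({"messages": [{"role": "user", "content": "hello"}]}))
```

Because the ratio of characters to tokens varies by language and tokenizer, an estimate like this is best used for rough budgeting only.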

Python (Anthropic SDK)

import anthropic

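# base_url is the gateway root; the SDK appends /v1/messages itself.
client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8000",
    api_key="YOUR_API_KEY",  # sent as the x-api-key header
)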

msg = client.messages.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_tokens=256,
    messages=[{"role": "user", "content": "Give me a 2-line haiku."}],
)
print(msg.content[0].text)

Translation behaviour

  • system (string or content blocks) → OpenAI system message
  • message content (string or text blocks) → flattened to string
  • stop_sequences → OpenAI stop
  • temperature, top_p, max_tokens forwarded as-is
  • finish_reason mapping: stop → end_turn, length → max_tokens, tool_calls → tool_use
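The mapping rules above can be sketched as a pair of small functions. This is an illustration under stated assumptions, not the gateway's implementation: the names anthropic_to_openai, flatten_text and FINISH_REASON_TO_STOP_REASON are hypothetical, and joining text blocks by plain concatenation is a guess at how flattening works.

```python
FINISH_REASON_TO_STOP_REASON = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
}


def flatten_text(content) -> str:
    """Flatten a string or a list of text blocks into one string."""
    if isinstance(content, str):
        return content
    # Assumption: text blocks are concatenated with no separator.
    return "".join(
        block.get("text", "")
        for block in content
        if block.get("type") == "text"
    )


def anthropic_to_openai(body: dict) -> dict:
    """Translate an Anthropic Messages request into an OpenAI chat request."""
    messages = []
    if body.get("system"):
        # system (string or blocks) becomes an OpenAI system message
        messages.append({"role": "system", "content": flatten_text(body["system"])})
    for message in body["messages"]:
        messages.append(
            {"role": message["role"], "content": flatten_text(message["content"])}
        )
    request = {"model": body["model"], "messages": messages}
    # Sampling parameters are forwarded as-is when present.
    for key in ("temperature", "top_p", "max_tokens"):
        if key in body:
            request[key] = body[key]
    if "stop_sequences" in body:
        request["stop"] = body["stop_sequences"]
    return request


translated = anthropic_to_openai({
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "max_tokens": 256,
    "system": "You are concise.",
    "stop_sequences": ["END"],
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Explain GPU inference."}]}
    ],
})
print(translated["messages"])
print(FINISH_REASON_TO_STOP_REASON["length"])
```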

What is NOT translated

The facade is intentionally narrow. If you need any of the following, call /v1/chat/completions instead; that route forwards OpenAI tool and vision fields to the backend unchanged:

  • Anthropic tool_use blocks (OpenAI-style tools work on /v1/chat/completions)
  • Anthropic vision/image blocks
  • Anthropic PDF/document blocks
  • Anthropic prompt caching directives

Use the Anthropic facade when your codebase is already built around anthropic.Anthropic and you want the same SDK to target your OSS model.

Errors

  • 400 if model is missing or body is not JSON
  • 401 if the API key is missing or invalid
  • 501 if the selected backend doesn't support chat or streaming
  • backend errors pass through with the original status and body
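A client might turn these status codes into readable hints along the following lines. This is a minimal sketch: the names GATEWAY_ERRORS and describe_status are hypothetical, and treating every unlisted 4xx/5xx code as a passed-through backend error is an assumption (a backend could also return 400 or 401 itself).

```python
GATEWAY_ERRORS = {
    400: "bad request: model is missing or the body is not JSON",
    401: "unauthorized: the API key is missing or invalid",
    501: "not implemented: the selected backend lacks chat or streaming support",
}


def describe_status(status: int) -> str:
    """Return a short, human-readable hint for an HTTP status code."""
    if status < 400:
        return "ok"
    # Assumption: anything not listed is a backend error passed through.
    return GATEWAY_ERRORS.get(status, "backend error (passed through unchanged)")


for code in (200, 400, 401, 501, 503):
    print(code, describe_status(code))
```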