Multi-model pipeline orchestrator with OpenAI-compatible API
Graft is an orchestrator that runs your prompt through multiple LLMs in parallel. A judge model then cross-compares their answers and produces a structured analysis, and a final model synthesizes a single response from that analysis.
This is not "pick the longest answer." It's analyze + merge.
User -> POST /v1/chat/completions
  |
  v
[Panel] --+-- DeepSeek V4  --> answer1 --+
          +-- Gemini Flash --> answer2 --+
          +-- Kimi K2.6    --> answer3 --+
  |
  v
[Judge] --> JSON analysis:
            • evaluations (per-answer quality)
            • consensus (shared points)
            • contradictions (disagreements + who's right)
            • blind_spots (what's missing)
            • recommendation (merge strategy)
  |
  v
[Final] --> final answer
The panel's N models each receive the full conversation history and answer independently, so no model sees another model's answer.
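The fan-out step can be sketched as follows. This is a minimal illustration, not Graft's actual implementation: `run_panel` and the stub models are hypothetical names, and real panel members would be HTTP calls to a provider rather than lambdas.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Message = Dict[str, str]
ModelFn = Callable[[List[Message]], str]  # hypothetical: one callable per panel model


def run_panel(models: Dict[str, ModelFn], messages: List[Message]) -> Dict[str, str]:
    """Send the same conversation to every panel model in parallel."""

    def call(item):
        name, fn = item
        # Each model gets its own copy of the history, so no model can see
        # or mutate what another model received.
        return name, fn([dict(m) for m in messages])

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        return dict(pool.map(call, models.items()))


# Stub models standing in for real API calls (assumption for illustration).
panel = {
    "model-a": lambda msgs: "A: " + msgs[-1]["content"],
    "model-b": lambda msgs: "B: " + msgs[-1]["content"],
}
answers = run_panel(panel, [{"role": "user", "content": "Hello"}])
print(answers)
```

Threads fit here because the work is I/O-bound: each panel call mostly waits on a remote API.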
The judge model receives all panel answers and evaluates each on:
- Factual correctness
- Coverage completeness
- Reasoning depth
It then builds a cross-comparison: where the answers agree, where they contradict (and which one is right), which insights are unique, and which topics nobody covered.
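A consumer of the judge's output might validate it along these lines. Only the five top-level keys come from the pipeline diagram; the inner shape of each field, the `JudgeAnalysis` class, and `parse_analysis` are assumptions made for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Any

# Top-level keys named in the pipeline diagram.
REQUIRED_KEYS = ("evaluations", "consensus", "contradictions", "blind_spots", "recommendation")


@dataclass
class JudgeAnalysis:
    evaluations: Any
    consensus: Any
    contradictions: Any
    blind_spots: Any
    recommendation: Any


def parse_analysis(raw: str) -> JudgeAnalysis:
    """Parse the judge's JSON and fail loudly if a section is missing."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"judge output is missing keys: {missing}")
    return JudgeAnalysis(**{key: data[key] for key in REQUIRED_KEYS})


# Hypothetical judge output; the field contents are invented for illustration.
sample = json.dumps({
    "evaluations": [{"model": "model-a", "score": 7}],
    "consensus": ["100m is a short distance"],
    "contradictions": [{"topic": "drive or walk", "verdict": "model-c is right"}],
    "blind_spots": ["where the car currently is"],
    "recommendation": "Follow model-c and keep the distance remark.",
})
analysis = parse_analysis(sample)
print(analysis.recommendation)
```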
The synthesizer takes the judge's analysis and writes a single answer that:
- Takes the best from each panel answer
- Resolves contradictions per the judge's verdict
- Covers blind spots from its own knowledge
- Excludes errors flagged by the judge
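One way to hand these four rules to the final model is to put them in a system prompt next to the panel answers and the judge's analysis. The prompt wording and the `build_final_messages` helper below are hypothetical; Graft's real prompt may look quite different.

```python
import json
from typing import Dict, List


def build_final_messages(question: str, answers: Dict[str, str], analysis: dict) -> List[dict]:
    """Assemble a hypothetical synthesizer prompt from panel answers and the judge's analysis."""
    rules = [
        "Take the best from each panel answer.",
        "Resolve contradictions per the judge's verdict.",
        "Cover blind spots from your own knowledge.",
        "Exclude errors flagged by the judge.",
    ]
    panel = "\n\n".join(f"[{name}]\n{text}" for name, text in answers.items())
    system = "You write one final answer.\n" + "\n".join(f"- {rule}" for rule in rules)
    user = (
        f"Question:\n{question}\n\n"
        f"Panel answers:\n{panel}\n\n"
        f"Judge analysis (JSON):\n{json.dumps(analysis, indent=2)}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_final_messages(
    "Should I drive or walk?",
    {"kimi": "Drive.", "gemini": "Walk."},
    {"recommendation": "Follow kimi."},
)
print(messages[0]["content"])
```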
Question: "Car wash is 100m from home. Should I drive or walk?"
Panel:
- DeepSeek: "Walk. 100m is a minute, cars waste fuel..."
- Gemini: "Walk. Parking, starting the engine..."
- Kimi: "Drive. The point of a car wash is to wash your car, so it needs to be there."
Judge:
- Contradiction: 2 vs 1. Verdict: "Kimi is right: you need to arrive with the car, otherwise there's nothing to wash."
- Blind spot: "The question implies the car is already at home, but this is not explicitly stated."
Final: "Drive. A car wash is for washing your vehicle; walking there means arriving without your car. 100m is a negligible distance, and fuel consumption is insignificant."
# Download binary
# https://github.com/redstone-md/graft/releases
# Or build from source
git clone https://github.com/redstone-md/graft.git
cd graft
go build -o graft ./cmd/graft/
# Configure
cp config.example.yaml config.yaml
# edit config.yaml: set auth_token and api_key
# Run
./graft -config config.yaml

Detailed setup guide: docs/SETUP.en.md
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="your-token")

# Simple request
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)

# Full conversation (agentic)
response = client.chat.completions.create(
    model="premium",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Rust?"},
        {"role": "assistant", "content": "Rust is a systems programming language..."},
        {"role": "user", "content": "How does ownership work?"},
    ],
)

# Streaming
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices or an empty delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="default",
    base_url="http://localhost:8080/v1",
    api_key="your-token",
)

curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-token" \
-d '{
"model": "default",
"messages": [{"role": "user", "content": "Hello!"}]
}'

data: {"type":"stage","stage":"panel"}
data: {"type":"content","model":"deepseek-v4","content":"..."}
data: {"type":"content","model":"gemini-flash","content":"..."}
data: {"type":"ping"}    <- keepalive every 15s
data: {"type":"stage","stage":"judge"}
data: {"type":"content","model":"claude-opus","content":"..."}
data: {"type":"stage","stage":"final"}
data: {"type":"content","model":"claude-opus","content":"..."}
data: {"type":"result","data":{...}}
data: {"type":"done"}
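A client reading this event stream could pull out the final answer roughly as below. The sketch assumes each event arrives as a single `data:` line carrying one JSON object, as in the listing above; `parse_events` and `collect_final` are hypothetical helper names, and the sample lines are invented.

```python
import json
from typing import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of every `data:` line, ignoring anything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        yield json.loads(line[len("data:"):].strip())


def collect_final(lines: Iterable[str]) -> str:
    """Concatenate content events emitted while the pipeline is in the final stage."""
    stage = None
    parts = []
    for event in parse_events(lines):
        kind = event.get("type")
        if kind == "stage":
            stage = event["stage"]
        elif kind == "content" and stage == "final":
            parts.append(event["content"])
        elif kind == "done":
            break
        # "ping" and "result" events are ignored in this sketch.
    return "".join(parts)


sample = [
    'data: {"type":"stage","stage":"panel"}',
    'data: {"type":"content","model":"deepseek-v4","content":"Walk."}',
    'data: {"type":"ping"}',
    'data: {"type":"stage","stage":"judge"}',
    'data: {"type":"content","model":"claude-opus","content":"{}"}',
    'data: {"type":"stage","stage":"final"}',
    'data: {"type":"content","model":"claude-opus","content":"Dri"}',
    'data: {"type":"content","model":"claude-opus","content":"ve."}',
    'data: {"type":"result","data":{}}',
    'data: {"type":"done"}',
]
final_text = collect_final(sample)
print(final_text)
```

Tracking the current stage lets a UI show panel and judge progress separately while only the final stage's text is treated as the answer.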
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /v1/chat/completions | Bearer | OpenAI-compatible chat completion |
| GET | /v1/models | Bearer | List profiles and models |
| GET | /health | None | Health check |
server:
  port: "8080"
  auth_token: "your-token"
providers:
  openrouter:
    base_url: "https://openrouter.ai/api/v1"
    api_key: "sk-or-v1-..."
models:
  deepseek-v4:
    provider: openrouter
    model: "deepseek/deepseek-v4-pro"
    context_window: 131072
profiles:
  default:
    panel: [deepseek-v4, gemini-flash, kimi]
    judge: claude-opus
    final: claude-opus

Full configuration guide: docs/SETUP.en.md
Graft is a building block for agentic systems. Here's what you can build:
- Multi-perspective agent: every response is verified by multiple models before delivery
- Code review bot: panel of coding models analyzes PRs in parallel, judge finds conflicts
- Research assistant: automatic search + synthesis from multiple sources with credibility scoring
- Decision support: for critical decisions where errors are costly (legal, medical, financial)
- Multi-model fallback: if one model fails, others continue
- Custom pipelines: create profiles for specific tasks, cheap for simple questions and premium for complex ones
Graft automatically calculates the effective pipeline context:
effective_context = min(context_window) of all models in pipeline
If you have deepseek-v4 (128K) and gemini-flash (1M), the effective context is 128K, because deepseek-v4 can't handle more. Older messages are trimmed automatically to fit.
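The rule can be illustrated with a short sketch. The `min` over context windows follows the formula above; everything else is an assumption, including the 4-characters-per-token estimate, the decision to always keep system messages, and the function names. The trimming Graft actually performs may differ.

```python
from typing import Dict, List


def effective_context(context_windows: Dict[str, int]) -> int:
    """The pipeline can only use as much context as its smallest model accepts."""
    return min(context_windows.values())


def estimate_tokens(message: Dict[str, str]) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(message["content"]) // 4)


def trim_history(messages: List[Dict[str, str]], budget: int) -> List[Dict[str, str]]:
    """Keep system messages, then as many of the newest messages as fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + kept[::-1]


# 131072 is the context_window from the config example; 1_000_000 is illustrative.
windows = {"deepseek-v4": 131072, "gemini-flash": 1_000_000}
limit = effective_context(windows)

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, budget=25)
print(limit, [m["content"][0] for m in trimmed])
```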
More details: docs/SETUP.en.md
Every push/PR runs go vet + go build. On v* tags, a release is automatically created with binaries for Linux, macOS, and Windows.
git tag v1.0.0
git push --tags
# -> GitHub Actions builds binaries and creates a Release