redstone-md/graft

Graft

Multi-model pipeline orchestrator with OpenAI-compatible API


Graft is an orchestrator that runs your prompt through multiple LLMs in parallel. A judge model then cross-compares their answers and produces a structured analysis, and a final model uses that analysis to synthesize a single response.

This is not "pick the longest answer." It's analyze + merge.

How it works

User → POST /v1/chat/completions
    ↓
[Panel] ──→ DeepSeek V4  ──→ answer1 ─┐
        ──→ Gemini Flash ──→ answer2 ─┤
        ──→ Kimi K2.6    ──→ answer3 ─┘
                                   ↓
                     [Judge] ──→ JSON analysis:
                                   • evaluations (per-answer quality)
                                   • consensus (shared points)
                                   • contradictions (disagreements + who's right)
                                   • blind_spots (what's missing)
                                   • recommendation (merge strategy)
                                   ↓
                     [Final] ──→ final answer

Stage 1: Panel (parallel)

N models receive the full conversation history and answer independently. No cross-contamination between models.
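
The fan-out can be pictured with a minimal Python sketch. It is not Graft's actual implementation (Graft itself is built with Go); `ask_model` is a hypothetical stand-in for a real provider call, and each worker receives its own copy of the history so that no answer can influence another.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def ask_model(model: str, messages: list[dict]) -> str:
    """Hypothetical stand-in for a real provider call."""
    last_user = messages[-1]["content"]
    return f"[{model}] answer to: {last_user}"


def run_panel(models: list[str], messages: list[dict]) -> dict[str, str]:
    """Send the same conversation to every model in parallel."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # Each model gets an independent copy of the full history.
        futures = {
            model: pool.submit(ask_model, model, copy.deepcopy(messages))
            for model in models
        }
        # Collect the answers once every call has finished.
        return {model: fut.result() for model, fut in futures.items()}


history = [{"role": "user", "content": "Drive or walk?"}]
answers = run_panel(["deepseek-v4", "gemini-flash", "kimi"], history)
```

Threads are enough here because the work is I/O-bound; an async client would serve the same purpose.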

Stage 2: Judge (comparison)

The judge model receives all panel answers and evaluates each on:

  • Factual correctness
  • Coverage completeness
  • Reasoning depth

It then builds a cross-comparison: where the answers agree, where they contradict each other (and which one is right), which insights are unique, and which topics nobody covered.
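
To make the shape of the judge's output concrete, here is a hedged sketch of parsing and checking it. Only the five top-level keys come from the pipeline diagram; the fields inside them (`model`, `verdict`, the rating values, and so on) and the `parse_analysis` helper are assumptions made for illustration, not Graft's documented schema.

```python
import json

# Top-level keys are taken from the pipeline diagram.
# Everything nested inside them is an illustrative assumption.
REQUIRED_KEYS = {
    "evaluations", "consensus", "contradictions",
    "blind_spots", "recommendation",
}

raw = """
{
  "evaluations": [
    {"model": "deepseek-v4", "factual_correctness": "low",
     "coverage": "medium", "reasoning_depth": "medium"},
    {"model": "kimi", "factual_correctness": "high",
     "coverage": "medium", "reasoning_depth": "high"}
  ],
  "consensus": ["100m is a short distance"],
  "contradictions": [
    {"topic": "drive or walk", "verdict": "kimi",
     "reason": "the car has to be at the car wash"}
  ],
  "blind_spots": ["whether the car is currently at home"],
  "recommendation": "Adopt kimi's conclusion and keep the distance remark."
}
"""


def parse_analysis(text: str) -> dict:
    """Parse the judge's JSON and reject output with missing sections."""
    analysis = json.loads(text)
    missing = REQUIRED_KEYS - set(analysis)
    if missing:
        raise ValueError(f"judge output is missing keys: {sorted(missing)}")
    return analysis


analysis = parse_analysis(raw)
winners = [c["verdict"] for c in analysis["contradictions"]]
```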

Stage 3: Final (synthesis)

The synthesizer takes the judge's analysis and writes a single answer that:

  • Takes the best from each panel answer
  • Resolves contradictions per the judge's verdict
  • Covers blind spots from its own knowledge
  • Excludes errors flagged by the judge
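
One way to picture the hand-off to the synthesizer is a prompt assembled from the panel answers and the judge's analysis. The sketch below is an assumption about how such a prompt could look; `build_final_prompt`, the prompt wording, and the nested field names (`topic`, `verdict`) are hypothetical and not taken from Graft's source.

```python
def build_final_prompt(question: str, answers: dict[str, str], analysis: dict) -> str:
    """Assemble a synthesis prompt from panel answers and the judge's analysis."""
    lines = [f"Question: {question}", "", "Panel answers:"]
    for model, text in answers.items():
        lines.append(f"- {model}: {text}")

    # Contradictions are resolved in favour of the judge's verdict.
    lines += ["", "Resolve contradictions as follows:"]
    for item in analysis["contradictions"]:
        lines.append(f"- {item['topic']}: follow {item['verdict']}")

    # Blind spots are listed so the synthesizer can fill them in itself.
    lines += ["", "Cover these blind spots:"]
    for spot in analysis["blind_spots"]:
        lines.append(f"- {spot}")

    lines += ["", f"Merge strategy: {analysis['recommendation']}"]
    return "\n".join(lines)


prompt = build_final_prompt(
    "Drive or walk to the car wash?",
    {"deepseek-v4": "Walk.", "kimi": "Drive."},
    {
        "contradictions": [{"topic": "drive or walk", "verdict": "kimi"}],
        "blind_spots": ["whether the car is currently at home"],
        "recommendation": "Adopt kimi's conclusion.",
    },
)
```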

Example

Question: "The car wash is 100m from home. Should I drive or walk?"

Panel:

  • DeepSeek: "Walk. 100m is a minute, cars waste fuel..."
  • Gemini: "Walk. Parking, starting the engine..."
  • Kimi: "Drive. The point of a car wash is to wash your car, so it needs to be there."

Judge:

  • Contradiction: 2 vs 1. Verdict: "Kimi is right: you need to arrive with the car, otherwise there's nothing to wash."
  • Blind spot: "The question implies the car is already at home, but this is not explicitly stated."

Final: "Drive. A car wash is for washing your vehicle, and walking there means arriving without your car. 100m is a negligible distance, and fuel consumption is insignificant."

Quick Start

# Download binary
# https://github.com/redstone-md/graft/releases

# Or build from source
git clone https://github.com/redstone-md/graft.git
cd graft
go build -o graft ./cmd/graft/

# Configure
cp config.example.yaml config.yaml
# edit config.yaml: set auth_token and api_key

# Run
./graft -config config.yaml

Detailed setup guide: docs/SETUP.en.md

Usage

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="your-token")

# Simple request
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Full conversation (agentic)
response = client.chat.completions.create(
    model="premium",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Rust?"},
        {"role": "assistant", "content": "Rust is a systems programming language..."},
        {"role": "user", "content": "How does ownership work?"},
    ]
)

# Streaming
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="default",
    base_url="http://localhost:8080/v1",
    api_key="your-token",
)

curl

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

SSE Events (streaming)

data: {"type":"stage","stage":"panel"}
data: {"type":"content","model":"deepseek-v4","content":"..."}
data: {"type":"content","model":"gemini-flash","content":"..."}
data: {"type":"ping"}                                    ← keepalive every 15s
data: {"type":"stage","stage":"judge"}
data: {"type":"content","model":"claude-opus","content":"..."}
data: {"type":"stage","stage":"final"}
data: {"type":"content","model":"claude-opus","content":"..."}
data: {"type":"result","data":{...}}
data: {"type":"done"}
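
A client that reads the raw stream could group content by stage along these lines. This is a sketch that assumes the stream carries exactly the event shapes listed above; `collect_by_stage` is a hypothetical helper, and the sample lines are made up for the demonstration.

```python
import json


def collect_by_stage(lines: list[str]) -> dict[str, str]:
    """Concatenate content events under the most recent stage event."""
    stage = None
    text_by_stage: dict[str, str] = {}
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and anything that is not a data line
        event = json.loads(line[len("data: "):])
        kind = event.get("type")
        if kind == "stage":
            stage = event["stage"]
            text_by_stage.setdefault(stage, "")
        elif kind == "content" and stage is not None:
            text_by_stage[stage] += event["content"]
        elif kind == "done":
            break  # nothing after "done" is processed
        # "ping" keepalives and other event types are ignored here
    return text_by_stage


sample = [
    'data: {"type":"stage","stage":"panel"}',
    'data: {"type":"content","model":"deepseek-v4","content":"Walk."}',
    'data: {"type":"ping"}',
    "",
    'data: {"type":"stage","stage":"final"}',
    'data: {"type":"content","model":"claude-opus","content":"Dri"}',
    'data: {"type":"content","model":"claude-opus","content":"ve."}',
    'data: {"type":"done"}',
]
text_by_stage = collect_by_stage(sample)
```

Panel content from different models arrives interleaved, so a real client would probably key the panel stage by the `model` field as well.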

Endpoints

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | /v1/chat/completions | Bearer | OpenAI-compatible chat completion |
| GET | /v1/models | Bearer | List profiles and models |
| GET | /health | none | Health check |

Configuration

server:
  port: "8080"
  auth_token: "your-token"

providers:
  openrouter:
    base_url: "https://openrouter.ai/api/v1"
    api_key: "sk-or-v1-..."

models:
  deepseek-v4:
    provider: openrouter
    model: "deepseek/deepseek-v4-pro"
    context_window: 131072

profiles:
  default:
    panel: [deepseek-v4, gemini-flash, kimi]
    judge: claude-opus
    final: claude-opus

Full configuration guide: docs/SETUP.en.md
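
The excerpt above defines only `deepseek-v4` under `models`, while the `default` profile also names `gemini-flash`, `kimi`, and `claude-opus`; a complete config presumably defines all of them. The sketch below shows one way to check this yourself before starting the server. It works on a plain dict that mirrors the YAML structure; `undefined_models` is a hypothetical helper, and whether Graft performs a similar check at startup is not stated here.

```python
def undefined_models(config: dict) -> dict[str, list[str]]:
    """Return, per profile, the model names that have no entry under `models`."""
    defined = set(config.get("models", {}))
    problems: dict[str, list[str]] = {}
    for name, profile in config.get("profiles", {}).items():
        used = set(profile.get("panel", []))
        used |= {profile.get("judge"), profile.get("final")}
        used.discard(None)  # a profile may omit judge or final in this sketch
        unknown = sorted(used - defined)
        if unknown:
            problems[name] = unknown
    return problems


# Mirrors the config excerpt above, written as a Python dict.
config = {
    "models": {
        "deepseek-v4": {
            "provider": "openrouter",
            "model": "deepseek/deepseek-v4-pro",
            "context_window": 131072,
        },
    },
    "profiles": {
        "default": {
            "panel": ["deepseek-v4", "gemini-flash", "kimi"],
            "judge": "claude-opus",
            "final": "claude-opus",
        },
    },
}
problems = undefined_models(config)
```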

Project Ideas

Graft is a building block for agentic systems. Here's what you can build:

  • Multi-perspective agent: every response is verified by multiple models before delivery
  • Code review bot: a panel of coding models analyzes PRs in parallel, and the judge finds conflicts
  • Research assistant: automatic search + synthesis from multiple sources with credibility scoring
  • Decision support: for critical decisions where errors are costly (legal, medical, financial)
  • Multi-model fallback: if one model fails, the others continue
  • Custom pipelines: create profiles for specific tasks, such as cheap for simple questions and premium for complex ones

Context and Limits

Graft automatically calculates the effective pipeline context:

effective_context = min(context_window) of all models in pipeline

If you have deepseek-v4 (128K) and gemini-flash (1M), context will be 128K โ€” because deepseek can't handle more. Old messages are trimmed automatically.
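
The rule can be sketched in a few lines of Python. The `min` step follows the formula above; the trimming policy (drop the oldest non-system messages first) and the word-count token estimate are assumptions for illustration, since the exact strategy Graft uses is not described here. The 1,000,000 figure for gemini-flash is likewise a stand-in for "1M".

```python
def effective_context(windows: dict[str, int]) -> int:
    """The pipeline can only use as much context as its smallest model."""
    return min(windows.values())


def trim_history(messages: list[dict], budget: int, count_tokens) -> list[dict]:
    """Drop the oldest non-system messages until the history fits the budget.

    Assumed policy: system messages are always kept, and so is the latest message.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[dict]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    while len(rest) > 1 and total(system + rest) > budget:
        rest.pop(0)  # oldest message goes first
    return system + rest


windows = {"deepseek-v4": 131072, "gemini-flash": 1_000_000}
limit = effective_context(windows)

# Crude stand-in for a tokenizer: one token per whitespace-separated word.
def words(text: str) -> int:
    return len(text.split())

history = [
    {"role": "system", "content": "Be brief"},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
trimmed = trim_history(history, budget=5, count_tokens=words)
```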

More details: docs/SETUP.en.md

CI/CD

Every push/PR runs go vet + go build. On v* tags, a release is automatically created with binaries for Linux, macOS, and Windows.

git tag v1.0.0
git push --tags
# → GitHub Actions builds binaries and creates a Release

License

MIT
