LLM smart router. Classifies prompts by complexity and routes to the optimal model.
OpenAI API-compatible proxy — drop in as a base URL and let kani pick the right model automatically.
```
Request → Distilled Feature Classifier (15 dimensions) → Tier + Agentic Score → Capability Filter → Model Selection (round-robin) → Upstream Provider
                    │
                    └─ model unavailable → conservative default
```
Classification pipeline:

- Distilled feature classifier — deterministic `tokenCount` + 14 learned semantic dimensions
- Weighted synthesis — one unified score drives tier selection (`SIMPLE`/`MEDIUM`/`COMPLEX`/`REASONING`)
- Unified agentic score — the `agenticTask` dimension is mapped directly to `agentic_score`
- Conservative default — fall back to `MEDIUM` when the feature model is unavailable
- Capability filter — auto-detects vision/tools/json_mode from the request and escalates to a capable model
Every request is logged to `$XDG_STATE_HOME/kani/log/` (default: `~/.local/state/kani/log/`) as training data for future model improvement.
kani no longer relies on hand-maintained keyword lists or a runtime LLM fallback for routing. The scorer is now distilled-feature-first:

- compute `tokenCount` deterministically
- infer 14 semantic dimensions (`low`/`medium`/`high`) using a learned multi-output classifier
- combine all 15 dimensions with explicit weights to determine tier and confidence
- derive `agentic_score` from the `agenticTask` dimension in the same pipeline
- return a conservative default tier only when the feature model is unavailable
This makes routing behavior easier to improve with data, because changes come from retraining and calibration rather than runtime prompt engineering.
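The weighted-synthesis step is easiest to see in miniature. Below is a sketch under assumed values: the weights, thresholds, and tie-breaking rule are illustrative placeholders, not kani's trained parameters.

```python
# Illustrative sketch of weighted tier synthesis. Weights and thresholds
# are hypothetical; kani's real values live in the trained model bundle.
LABEL_VALUES = {"low": 0.0, "medium": 0.5, "high": 1.0}

def synthesize(token_count: int, labels: dict[str, str],
               weights: dict[str, float]) -> tuple[str, float, float]:
    # Normalize tokenCount into [0, 1] against an assumed 4k budget.
    features = {"tokenCount": min(token_count / 4096, 1.0)}
    features.update({dim: LABEL_VALUES[lab] for dim, lab in labels.items()})

    # One unified score: weighted sum over all 15 dimensions.
    score = sum(weights.get(dim, 0.0) * val for dim, val in features.items())

    # Tier from score thresholds; strong reasoning markers force REASONING.
    if labels.get("reasoningMarkers") == "high":
        tier = "REASONING"
    elif score >= 0.65:
        tier = "COMPLEX"
    elif score >= 0.35:
        tier = "MEDIUM"
    else:
        tier = "SIMPLE"

    # agentic_score is derived directly from the agenticTask dimension.
    agentic_score = LABEL_VALUES[labels.get("agenticTask", "low")]
    return tier, score, agentic_score
```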
Run directly with uvx:

```bash
# Classify a prompt
uvx --from git+https://github.com/tumf/kani kani route "hello world"

# Start the proxy server
uvx --from git+https://github.com/tumf/kani kani serve
```

Or run from a local checkout:

```bash
git clone https://github.com/tumf/kani.git && cd kani
uv sync
uv run kani route "hello world"
uv run kani serve
```

kani speaks the OpenAI API. Change `base_url` and `model`; everything else stays the same.
With the official OpenAI client:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # OpenAI key
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "explain quicksort"}],
)
```

With OpenRouter:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter
    api_key="sk-or-...",                      # OpenRouter key
)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "explain quicksort"}],
)
```

With kani:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18420/v1",  # ← kani
    api_key="anything",                    # kani handles upstream auth
)

# kani picks the best model based on prompt complexity
response = client.chat.completions.create(
    model="kani/auto",
    messages=[{"role": "user", "content": "explain quicksort"}],
)

# Or pin a routing profile
response = client.chat.completions.create(
    model="kani/premium",  # always use best-quality models
    messages=[{"role": "user", "content": "prove P != NP"}],
)
```

Or with plain curl:

```bash
curl http://localhost:18420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kani/auto", "messages": [{"role": "user", "content": "hello"}]}'
```

That's it. Any tool or library that supports the OpenAI API works with kani — LangChain, LlamaIndex, Cursor, Continue, etc. Just point `base_url` at kani.
Note: The routing profiles below are sample/reference defaults. Treat them as examples — you should tune the actual profile names, strategies, and model mappings to match your own workload and cost/quality goals.
| Profile | Strategy | Best for |
|---|---|---|
| `kani/auto` | Balanced cost/quality (default) | General use |
| `kani/eco` | Cheapest viable models | High volume, low stakes |
| `kani/premium` | Best-quality models | Critical tasks |
| `kani/agentic` | Tool-use optimized | Agent workflows |
kani automatically detects required capabilities from the request and routes to a model that supports them. If no model in the scored tier has the required capabilities, kani escalates to higher tiers.
Detected capabilities:
| Capability | Trigger |
|---|---|
| `vision` | `image_url` content block in messages |
| `tools` | `tools` or `functions` field in request |
| `json_mode` | `response_format.type` is `json_object` or `json_schema` |
Configuration: declare model capabilities via prefix matching in `config.yaml`:

```yaml
model_capabilities:
  - prefix: "anthropic/claude-"
    capabilities: [vision, tools, json_mode]
  - prefix: "google/gemini-"
    capabilities: [vision, tools, json_mode]
  - prefix: "openai/gpt-4"
    capabilities: [vision, tools, json_mode]
```

When no `model_capabilities` are configured, capability filtering is skipped and routing works as before.
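Conceptually, the filter looks something like the sketch below. The function names are illustrative, but the trigger rules mirror the table above and the first-matching-prefix behavior described here.

```python
# Sketch of request capability detection and prefix-based model filtering.
def detect_capabilities(request: dict) -> set[str]:
    caps: set[str] = set()
    for msg in request.get("messages", []):
        content = msg.get("content")
        if isinstance(content, list) and any(
            part.get("type") == "image_url" for part in content
        ):
            caps.add("vision")
    if request.get("tools") or request.get("functions"):
        caps.add("tools")
    if request.get("response_format", {}).get("type") in ("json_object", "json_schema"):
        caps.add("json_mode")
    return caps

def supports(model: str, required: set[str], rules: list[dict]) -> bool:
    # No rules configured -> filtering is skipped entirely.
    if not rules:
        return True
    for rule in rules:  # first matching prefix wins
        if model.startswith(rule["prefix"]):
            return required <= set(rule["capabilities"])
    return not required  # unknown model: only acceptable if nothing is required
```

If no candidate in the scored tier passes `supports`, kani escalates to a higher tier, as described above.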
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Main proxy (OpenAI-compatible) |
| `/v1/models` | GET | List available models |
| `/v1/route` | POST | Debug — returns routing decision without proxying |
| `/admin/reload-config` | POST | Admin-only safe config hot reload |
| `/health` | GET | Health + active config version metadata |
Routed responses include extra headers: `X-Kani-Tier`, `X-Kani-Model`, `X-Kani-Score`, `X-Kani-Signals`.
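To read the routing decision programmatically, the OpenAI Python client's standard `with_raw_response` wrapper exposes HTTP headers alongside the parsed body; shown here against the kani endpoint:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18420/v1", api_key="anything")

# with_raw_response returns the raw HTTP response plus a .parse() helper
raw = client.chat.completions.with_raw_response.create(
    model="kani/auto",
    messages=[{"role": "user", "content": "explain quicksort"}],
)
print(raw.headers.get("X-Kani-Tier"), raw.headers.get("X-Kani-Model"))
completion = raw.parse()  # the usual ChatCompletion object
```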
`config.yaml`:

```yaml
host: "0.0.0.0"
port: 18420
default_provider: openrouter
default_profile: auto

providers:
  openrouter:
    name: openrouter
    base_url: "https://openrouter.ai/api/v1"
    api_key: "${OPENROUTER_API_KEY}"
  cliproxy:
    name: cliproxy
    base_url: "http://127.0.0.1:8317/v1"
    api_key: "local-test-key"

profiles:
  auto:
    tiers:
      SIMPLE:
        # primary can be a single model or an ordered list for round-robin
        primary: ["google/gemini-2.5-flash", "google/gemini-2.5-flash-lite"]
        fallback: ["nvidia/gpt-oss-120b"]
      MEDIUM:
        primary: "moonshotai/kimi-k2.5"
        fallback: null  # allowed; normalized to []
      COMPLEX:
        primary: "google/gemini-3.1-pro"
        fallback: ["anthropic/claude-sonnet-4.6"]
      REASONING:
        primary: "x-ai/grok-4-1-fast-reasoning"
        fallback: ["anthropic/claude-sonnet-4.6"]
    # provider: per-tier override (optional)

smart_proxy:
  fallback_backoff:
    enabled: true
    initial_delay_seconds: 5
    multiplier: 2
    max_delay_seconds: 300
```
- `${VAR}` syntax resolves environment variables
- Each tier can specify its own `provider` or inherit `default_provider`
- `primary` accepts a string, a `{model, provider}` object, or a list of those; list entries are selected round-robin per `profile+tier` combination
- `fallback: null` is accepted only at `profiles.*.tiers.*.fallback` and normalized to `[]`
- When primary fails, fallback attempts skip the failed primary candidate and deduplicate repeated `model+provider` entries
- `smart_proxy.fallback_backoff` enables process-local exponential cooldowns for retryable non-streaming `429`/`5xx` failures, keyed by `model+provider`
- Cooled-down `model+provider` pairs are skipped during both primary selection and fallback execution; the same model on a different provider remains eligible
- Successful recovery resets the failure streak for that exact `model+provider` pair, and restarting kani clears the in-memory cooldown registry
- Config path: `--config` flag > `$KANI_CONFIG` env var > `./config.yaml` > `$XDG_CONFIG_HOME/kani/config.yaml` > `/etc/kani/config.yaml`
- Set `KANI_ADMIN_TOKEN` to enable `POST /admin/reload-config` (admin-only, separate from regular API keys)
- Hot reload validates with `strict=True` and rejects non-reloadable field changes (`host`, `port`) with `409`
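The cooldown mechanics can be pictured with a small sketch. The class and field names are illustrative, not kani's actual code, but the delay schedule follows the `fallback_backoff` settings above.

```python
import time

# Process-local cooldown registry keyed by (model, provider).
class CooldownRegistry:
    def __init__(self, initial=5.0, multiplier=2.0, max_delay=300.0):
        self.initial, self.multiplier, self.max_delay = initial, multiplier, max_delay
        # key -> (failure streak, cooldown expiry on the monotonic clock)
        self._state: dict[tuple[str, str], tuple[int, float]] = {}

    def is_cooling(self, model: str, provider: str) -> bool:
        _, until = self._state.get((model, provider), (0, 0.0))
        return time.monotonic() < until

    def record_failure(self, model: str, provider: str) -> None:
        streak, _ = self._state.get((model, provider), (0, 0.0))
        delay = min(self.initial * self.multiplier ** streak, self.max_delay)
        self._state[(model, provider)] = (streak + 1, time.monotonic() + delay)

    def record_success(self, model: str, provider: str) -> None:
        # Recovery resets the streak for this exact model+provider pair.
        self._state.pop((model, provider), None)
```

Candidates for which `is_cooling()` is true are skipped during both primary selection and fallback; since the registry lives in memory, restarting the process clears it.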
kani can optionally reduce context pressure for long-running conversations by compacting oversized message histories before proxying upstream (Phase A) and by pre-computing summaries in the background for reuse on later requests (Phase B).
All compaction behavior is opt-in and disabled by default. When disabled or when compaction fails, kani routes and proxies requests unchanged.
Add a `smart_proxy` section to your `config.yaml`:

```yaml
smart_proxy:
  context_compaction:
    enabled: true                 # master switch
    sync_compaction:
      enabled: true               # Phase A: compact inline before proxying
      threshold_percent: 80.0     # compact when prompt ≥ 80% of context window
      protect_first_n: 1          # turns to keep at head of conversation
      protect_last_n: 2           # turns to keep at tail
      summary_model: ""           # empty = use 'compress' profile primary model
    background_precompaction:
      enabled: true               # Phase B: pre-compute summaries async
      trigger_percent: 70.0       # start background job at 70% usage
      max_concurrency: 2          # max parallel background jobs
      summary_ttl_seconds: 3600
    session:
      header_name: X-Kani-Session-Id  # client header for explicit session binding
    context_window_tokens: 128000     # assumed context window for threshold math
```

A `compress` routing profile (see `config.example.yaml`) is used as the default summarisation model when `summary_model` is empty.
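The Phase A trigger math is simple. In this sketch, `estimate_tokens` is a stand-in for whatever tokenizer kani actually uses (an assumption), and the split helper assumes the history is longer than the protected window:

```python
# Sketch of the Phase A trigger and protected-turn split under the settings above.
def should_compact(messages, estimate_tokens, context_window_tokens=128_000,
                   threshold_percent=80.0) -> bool:
    prompt_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    return prompt_tokens >= context_window_tokens * threshold_percent / 100

def split_for_compaction(messages, protect_first_n=1, protect_last_n=2):
    # Protected head/tail turns pass through unchanged; only the middle
    # of the conversation is summarized.
    head = messages[:protect_first_n]
    middle = messages[protect_first_n:len(messages) - protect_last_n]
    tail = messages[len(messages) - protect_last_n:]
    return head, middle, tail
```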
kani resolves a stable session key in this order:

- Explicit header — value of `session.header_name` (preferred; required for Phase B cache hits)
- Derived — deterministic hash of model + first/last message content

The resolution mode is surfaced in the `X-Kani-Compaction-Session` response header.
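The derived fallback can be sketched as below; the exact hash recipe is an assumption, but it shows why explicit headers are preferred: the derived key changes whenever the last message changes, which defeats Phase B cache reuse.

```python
import hashlib

# Sketch: deterministic session key from model + first/last message content.
def derive_session_key(model: str, messages: list[dict]) -> str:
    first = str(messages[0].get("content", "")) if messages else ""
    last = str(messages[-1].get("content", "")) if messages else ""
    digest = hashlib.sha256(f"{model}\x00{first}\x00{last}".encode()).hexdigest()
    return digest[:16]
```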
Each routed response includes compaction headers:

| Header | Values | Meaning |
|---|---|---|
| `X-Kani-Compaction` | `off` \| `skipped` \| `inline` \| `cached` \| `failed` | What compaction did |
| `X-Kani-Compaction-Session` | `explicit` \| `derived` | How session was resolved |
| `X-Kani-Compaction-Saved-Tokens` | integer | Estimated tokens saved |
Structured log fields are emitted at INFO level on every compaction decision. Failures are logged at WARNING level and never propagate to the client.
Use admin-only config hot reload without restarting the proxy:

```bash
# 1) set an admin token (separate from normal API keys)
export KANI_ADMIN_TOKEN="your-admin-token"

# 2) trigger reload after editing config.yaml
curl -X POST http://localhost:18420/admin/reload-config \
  -H "Authorization: Bearer ${KANI_ADMIN_TOKEN}"
```

Behavior:

- Reload is applied only when strict config validation succeeds.
- In-flight requests keep the state snapshot captured at request start.
- Changes to `host`/`port` are rejected as non-reloadable with `409` and require a process restart.
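One way to picture the snapshot semantics, as a sketch rather than kani's actual code (strict validation would run before `reload` is called):

```python
import threading

class NonReloadableChange(Exception):
    """Mapped to HTTP 409 by the reload endpoint."""

# Requests capture the config reference once at start; reload swaps the
# reference atomically, so in-flight requests never see a mixed config.
class ConfigHolder:
    def __init__(self, config: dict):
        self._lock = threading.Lock()
        self._config = config

    def snapshot(self) -> dict:
        # Captured at request start; later reloads leave this request untouched.
        return self._config

    def reload(self, new_config: dict) -> None:
        if (new_config["host"], new_config["port"]) != (
            self._config["host"], self._config["port"]
        ):
            raise NonReloadableChange("host/port require a restart")
        with self._lock:
            self._config = new_config  # atomic reference swap
```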
No additional services are required. Compaction state is persisted in SQLite under `$XDG_DATA_HOME/kani/compaction.db` (default: `~/.local/share/kani/compaction.db`). Override with `KANI_DATA_DIR`.
```bash
# Verify compaction is active after startup:
curl -s http://localhost:18420/health | jq .

# Inspect a routed request's compaction outcome:
curl -v -X POST http://localhost:18420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Kani-Session-Id: my-session-1" \
  -d '{"model":"kani/auto","messages":[{"role":"user","content":"hello"}]}' \
  2>&1 | grep -i "x-kani-compaction"
```

Runtime routing does not call an LLM. LLM usage is limited to offline dataset generation when logs are missing semantic labels.
Optional annotator configuration (for `scripts/build_agentic_dataset.py --annotate-missing`) can be set in `config.yaml` under `feature_annotator`, or overridden with env vars:

```yaml
feature_annotator:
  model: "gemini-2.5-flash-lite"
  provider: "cliproxy"  # optional; defaults to default_provider
```

`feature_annotator` and `llm_classifier` connection details are provider-resolved. In `config.yaml`, set `model` + optional `provider`; do not set `base_url` or `api_key` directly in these sections.
| Env var | Default | Description |
|---|---|---|
| `KANI_LLM_ANNOTATOR_MODEL` | `google/gemini-2.5-flash-lite` | Annotation model |
| `KANI_LLM_ANNOTATOR_BASE_URL` | `https://openrouter.ai/api/v1` | API endpoint |
| `KANI_LLM_ANNOTATOR_API_KEY` | `$OPENROUTER_API_KEY` | API key |
Priority is: CLI flags > env vars > `config.yaml` `feature_annotator` > built-in defaults.
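That precedence chain amounts to "first non-empty source wins"; a sketch, with hypothetical function and argument names:

```python
import os

# Illustrative resolution of the annotator model; not kani's actual API.
def resolve_annotator_model(cli_flag: str | None, config: dict) -> str:
    for candidate in (
        cli_flag,                                              # 1. CLI flag
        os.environ.get("KANI_LLM_ANNOTATOR_MODEL"),            # 2. env var
        (config.get("feature_annotator") or {}).get("model"),  # 3. config.yaml
    ):
        if candidate:
            return candidate
    return "google/gemini-2.5-flash-lite"                      # 4. built-in default
```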
All decisions are logged to `$XDG_STATE_HOME/kani/log/routing-YYYY-MM-DD.jsonl` (default: `~/.local/state/kani/log/`):

```json
{"timestamp":"2025-03-21T19:50:00","prompt_preview":"prove the Riemann...","tier":"REASONING","score":0.82,"confidence":0.87,"method":"distilled-features","agentic_score":1.0,"signals":{"tokenCount":38,"semanticLabels":{"reasoningMarkers":"high","agenticTask":"high"},"featureVersion":"v1"}}
```

Use these logs to build distilled feature training data:
```bash
uv run python scripts/build_agentic_dataset.py \
  --output data/distilled_feature_dataset.json
```

When existing logs do not yet include semantic labels, you can annotate missing examples offline:

```bash
uv run python scripts/build_agentic_dataset.py \
  --annotate-missing \
  --output data/distilled_feature_dataset.json
```

Then train the multi-output feature classifier bundle:

```bash
uv run python scripts/train_classifier.py \
  --data data/distilled_feature_dataset.json \
  --output models
```

This writes `models/feature_classifier.pkl` with the sklearn multi-output classifier, per-dimension label encoders, weights, thresholds, and embedding metadata.
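For offline inspection, the bundle can be loaded like any pickle. The key names below are assumptions about the bundle layout, not a documented schema:

```python
import pickle

# Hypothetical sketch of peeking into the trained bundle.
with open("models/feature_classifier.pkl", "rb") as f:
    bundle = pickle.load(f)

clf = bundle["classifier"]           # sklearn multi-output classifier
encoders = bundle["label_encoders"]  # one label encoder per semantic dimension
weights = bundle["weights"]          # per-dimension synthesis weights
print(sorted(encoders))              # lists the 14 semantic dimension names
```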
kani supports API key authentication to restrict proxy access. Keys are managed via the CLI and stored in `$XDG_DATA_HOME/kani/api_keys.json`.

When no keys are configured, all requests pass through without authentication (backward-compatible). As soon as one key is added, every API request must include a valid `Authorization: Bearer <key>` header.
```bash
# Create a key (auto-generated, shown once)
kani keys add hermes
# kani-aBcDeFgH...  ← save this

# List keys (prefix only, secrets are not stored in plaintext)
kani keys list

# Remove a key by name or prefix
kani keys remove hermes
```

Using the key:

```bash
curl http://localhost:18420/v1/chat/completions \
  -H "Authorization: Bearer kani-aBcDeFgH..." \
  -H "Content-Type: application/json" \
  -d '{"model": "kani/auto", "messages": [{"role": "user", "content": "hello"}]}'
```

```python
client = OpenAI(
    base_url="http://localhost:18420/v1",
    api_key="kani-aBcDeFgH...",  # kani API key
)
```

`/health` and `/docs` are exempt from authentication. No server restart required — keys take effect immediately.
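Since secrets are not stored in plaintext, verification presumably compares hashes. A sketch of that idea; the JSON layout of `api_keys.json` and the hash choice are assumptions:

```python
import hashlib
import hmac
import json
from pathlib import Path

# Sketch of bearer-key checking against hashed storage.
def is_authorized(bearer: str, store: Path) -> bool:
    records = json.loads(store.read_text()) if store.exists() else []
    if not records:  # no keys configured -> open passthrough
        return True
    digest = hashlib.sha256(bearer.encode()).hexdigest()
    # Constant-time comparison against each stored hash.
    return any(hmac.compare_digest(digest, r["sha256"]) for r in records)
```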
```bash
kani serve [--config path] [--host 0.0.0.0] [--port 18420]
kani route "your prompt here" [--config path]
kani config [--config path]
kani keys add <name>
kani keys list
kani keys remove <name|prefix>
```

```
src/kani/
├── scorer.py   # distilled feature scoring (15-dimensional classifier)
├── router.py   # Tier → model+provider mapping
├── proxy.py    # FastAPI OpenAI-compatible server
├── config.py   # YAML config loading, env var resolution
├── dirs.py     # XDG-compliant directory paths (config, data, logs)
├── logger.py   # JSONL routing log
└── cli.py      # Click CLI
```
```bash
uv sync
uv run pytest tests/ -q   # 176 tests
uv run ruff check src/
uv run pyright src/
```

Scoring logic ported from ClawRouter (MIT license).
MIT
