Reduce AI API costs by 60-80% through intelligent request routing — sending simple requests to cheap models and reserving premium models for tasks that actually need them.
Most teams pay premium model prices for every request, even when:
- The environment is `dev` or `test`
- The request is a simple FAQ or summary
- The monthly budget has already been exceeded
- A 10x cheaper model would give equally good results
This lab builds a production-ready model router that sits in front of your LLM calls and automatically selects the right model:
```
Request → Router → gpt-4o-mini ($0.15/1M tokens)  ← most requests
                 → gpt-4o      ($2.50/1M tokens)  ← complex tasks only
```
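To make the price gap concrete, the following sketch computes the blended price per million tokens for a given share of traffic routed to the cheap model. The two prices are the ones quoted above; the function names and the 75% split are illustrative assumptions, and the calculation ignores any difference between input and output token pricing.

```python
# Prices per 1M tokens, as quoted in the routing diagram above.
CHEAP_PRICE = 0.15    # gpt-4o-mini
PREMIUM_PRICE = 2.50  # gpt-4o


def blended_price(cheap_share: float) -> float:
    """Average price per 1M tokens when `cheap_share` of tokens go to the cheap model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * CHEAP_PRICE + (1.0 - cheap_share) * PREMIUM_PRICE


def savings_pct(cheap_share: float) -> float:
    """Savings relative to sending every token to the premium model."""
    return (1.0 - blended_price(cheap_share) / PREMIUM_PRICE) * 100.0


if __name__ == "__main__":
    share = 0.75  # hypothetical: 75% of tokens routed to the cheap model
    print(f"blended: ${blended_price(share):.4f} per 1M tokens")
    print(f"savings: {savings_pct(share):.1f}%")
```

Under these assumptions, savings scale linearly with the share of tokens the router can safely send to the cheap model, which is why the routing strategies matter more than the exact prices.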
| # | Strategy | Logic | Saves |
|---|---|---|---|
| 1 | Environment | dev/test/ci → cheap model always | ~100% in non-prod |
| 2 | Budget cap | Over monthly limit → cheap model | Prevents overruns |
| 3 | Task type | security_audit, architecture, compliance → premium; chat, summarize, draft → cheap | High confidence |
| 4 | Complexity | Heuristic score from token count, keywords, code blocks | Broad coverage |
| 5 | Default | Fallback → cheap model | Safety net |
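Strategy 4 in the table scores a prompt from its token count, keywords, and code blocks. A minimal sketch of such a heuristic could look like the following. The keyword list, weights, threshold, and function names are all assumptions chosen for illustration; the lab's `classifier.py` may score prompts differently.

```python
# Hypothetical keyword list and weights; not taken from the lab's classifier.
COMPLEX_KEYWORDS = ("architecture", "security", "compliance", "refactor", "trade-off")
FENCE = "`" * 3  # markdown code fence marker


def estimate_tokens(prompt: str) -> int:
    """Rough token estimate: about 4 characters per token."""
    return max(1, len(prompt) // 4)


def complexity_score(prompt: str) -> float:
    """Return a score in [0, 1]; higher suggests the premium model."""
    text = prompt.lower()
    score = 0.0
    # Longer prompts contribute up to 0.4.
    score += min(estimate_tokens(prompt) / 2000, 1.0) * 0.4
    # Each matched keyword adds 0.15, capped at 0.45.
    hits = sum(1 for kw in COMPLEX_KEYWORDS if kw in text)
    score += min(hits * 0.15, 0.45)
    # A fenced code block adds a flat 0.15.
    if FENCE in prompt:
        score += 0.15
    return min(score, 1.0)


def pick_model(prompt: str, threshold: float = 0.4) -> str:
    """Map the score to a deployment name using an assumed threshold."""
    return "gpt-4o" if complexity_score(prompt) >= threshold else "gpt-4o-mini"


if __name__ == "__main__":
    print(pick_model("What is Kubernetes?"))
    print(pick_model("Review the security and compliance of this architecture"))
```

A heuristic like this is cheap to evaluate and needs no model call, which is presumably why it sits after the higher-confidence environment, budget, and task-type checks.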
See diagrams/architecture.md for full data flow and decision tree.
```
Client → FastAPI Router API → RouterEngine → Azure OpenAI (gpt-4o-mini or gpt-4o)
                                   ↓
                         CostTracker (SQLite) → /stats endpoint
```
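The CostTracker step can be pictured as a small SQLite log plus an aggregate query. The sketch below is an assumption about how such a tracker might work, not the lab's `cost_tracker.py`: the table layout, class shape, and single per-token price per model are simplifications, while the report keys mirror the field names returned by `/stats`.

```python
import sqlite3

# Prices per 1M tokens from the overview; table and column names are hypothetical.
PRICES = {"gpt-4o-mini": 0.15, "gpt-4o": 2.50}
PREMIUM = "gpt-4o"


class CostTracker:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS calls (model TEXT NOT NULL, tokens INTEGER NOT NULL)"
        )

    def log(self, model: str, tokens: int) -> None:
        """Record one completed call."""
        if model not in PRICES:
            raise ValueError(f"unknown model: {model}")
        self.conn.execute("INSERT INTO calls (model, tokens) VALUES (?, ?)", (model, tokens))
        self.conn.commit()

    def report(self) -> dict:
        """Compare actual spend with an all-premium baseline."""
        rows = self.conn.execute(
            "SELECT model, COUNT(*), COALESCE(SUM(tokens), 0) FROM calls GROUP BY model"
        ).fetchall()
        total_calls = sum(count for _, count, _ in rows)
        total_tokens = sum(tokens for _, _, tokens in rows)
        actual = sum(tokens / 1_000_000 * PRICES[model] for model, _, tokens in rows)
        baseline = total_tokens / 1_000_000 * PRICES[PREMIUM]
        savings = baseline - actual
        return {
            "total_calls": total_calls,
            "total_cost_usd": round(actual, 6),
            "baseline_cost_usd": round(baseline, 6),
            "savings_usd": round(savings, 6),
            "savings_pct": round(savings / baseline * 100, 1) if baseline else 0.0,
        }


if __name__ == "__main__":
    tracker = CostTracker()
    for _ in range(3):
        tracker.log("gpt-4o-mini", 1000)
    tracker.log("gpt-4o", 1000)
    print(tracker.report())
```

The baseline here is "every token billed at the premium price", so the savings figure only reflects routing decisions, not prompt or caching optimizations.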
```bash
az login
chmod +x setup/provision_azure.sh
./setup/provision_azure.sh
```

This creates:
- Resource group `rg-ai-router-lab`
- Azure OpenAI service with two model deployments (`gpt-4o-mini` + `gpt-4o`)
- Auto-generates your `.env` file
```bash
pip install -r requirements.txt
```

```bash
# Demo 1: See routing decisions with explanations
python demos/demo_basic.py

# Demo 2: Simulate 50 requests and see cost savings report
python demos/demo_cost_comparison.py

# Demo 3: Real calls to Azure OpenAI (requires .env)
python demos/demo_scenarios.py
```

```bash
uvicorn src.api.main:app --reload
```

Then test it:
```bash
# Route a request (dry run, no model call)
curl "http://localhost:8000/route?prompt=Summarize+this+article&environment=production"

# Full chat request
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Kubernetes?", "environment": "dev"}'

# Cost savings stats
curl http://localhost:8000/stats
```

```bash
pytest tests/ -v
```

```
ai-model-router-lab/
├── src/
│   ├── router/
│   │   ├── router.py           # RouterEngine + 5 routing strategies
│   │   ├── classifier.py       # Heuristic complexity & task classifier
│   │   ├── cost_tracker.py     # SQLite cost logging + savings reports
│   │   └── models.py           # Model configs, pricing, enums
│   ├── api/
│   │   ├── main.py             # FastAPI app (/chat, /route, /stats, /health)
│   │   └── schemas.py          # Pydantic request/response models
│   └── utils/
│       └── azure_client.py     # Azure OpenAI thin wrapper
├── demos/
│   ├── demo_basic.py           # Routing decisions (no API)
│   ├── demo_cost_comparison.py # Cost savings simulation (no API)
│   └── demo_scenarios.py       # Live Azure calls
├── setup/
│   ├── provision_azure.sh      # Create all Azure resources
│   └── teardown_azure.sh       # Clean up everything
├── tests/
│   ├── test_router.py
│   └── test_classifier.py
├── diagrams/
│   └── architecture.md         # Data flow & decision tree diagrams
├── .env.example
├── requirements.txt
└── README.md
```
`POST /chat`: route a prompt and get a completion.

```json
{
  "prompt": "Your question here",
  "environment": "production",
  "dry_run": false,
  "override_budget": false,
  "max_tokens": 1024,
  "temperature": 0.7
}
```

The response includes routing metadata showing which model was used, why, and what it cost.
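For callers that prefer Python over `curl`, a minimal standard-library client for the chat endpoint might look like this. It assumes the API is running at `http://localhost:8000` and only uses the request fields shown above; the helper names are hypothetical, and no response fields are assumed beyond the body being JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local uvicorn address


def build_chat_request(prompt: str, environment: str = "production", **options) -> urllib.request.Request:
    """Build a POST /chat request; extra options (e.g. dry_run) are passed through."""
    payload = {"prompt": prompt, "environment": environment, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the API to be running locally.
    result = chat("What is Kubernetes?", environment="dev", dry_run=True)
    print(json.dumps(result, indent=2))
```

Separating request construction from sending keeps the payload easy to inspect or unit test without a running server.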
`GET /route`: preview the routing decision without calling any model.

`GET /stats`: cost savings report:

```json
{
  "total_calls": 150,
  "cheap_calls": 112,
  "premium_calls": 38,
  "cheap_pct": 74.7,
  "total_cost_usd": 0.0031,
  "baseline_cost_usd": 0.0187,
  "savings_usd": 0.0156,
  "savings_pct": 83.4,
  "strategy_breakdown": [...]
}
```

Add a new strategy by subclassing `BaseStrategy` in `src/router/router.py`:
```python
# BaseStrategy lives in src/router/router.py; my_condition is a placeholder
# for your own predicate.
class MyCustomStrategy(BaseStrategy):
    def decide(self, prompt, cheap_model, premium_model, context):
        if my_condition(prompt, context):
            return ModelDecision(
                model=cheap_model,
                strategy=RoutingStrategy.DEFAULT,
                reason="My custom reason",
                confidence=0.8,
            )
        return None  # pass to next strategy
```

Then add it to the `self.strategies` list in `RouterEngine.__init__`.
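Because a strategy returns `None` to pass to the next one, the position of a strategy in the list determines when it gets a say. The sketch below illustrates that first-match-wins pattern with simplified stand-in classes. `Decision`, `Router`, and the three example strategies are assumptions for illustration and do not reproduce the lab's `RouterEngine` or its signatures.

```python
from dataclasses import dataclass
from typing import Optional


# Simplified stand-in for the lab's decision type; fields are assumptions.
@dataclass
class Decision:
    model: str
    reason: str


class EnvironmentStrategy:
    def decide(self, prompt: str, context: dict) -> Optional[Decision]:
        if context.get("environment") in {"dev", "test", "ci"}:
            return Decision("gpt-4o-mini", "non-production environment")
        return None  # pass to next strategy


class KeywordStrategy:
    def decide(self, prompt: str, context: dict) -> Optional[Decision]:
        if "security audit" in prompt.lower():
            return Decision("gpt-4o", "security audit requested")
        return None  # pass to next strategy


class DefaultStrategy:
    def decide(self, prompt: str, context: dict) -> Optional[Decision]:
        # Always decides, so it must come last.
        return Decision("gpt-4o-mini", "default fallback")


class Router:
    def __init__(self) -> None:
        # Order matters: the first strategy that returns a decision wins.
        self.strategies = [EnvironmentStrategy(), KeywordStrategy(), DefaultStrategy()]

    def route(self, prompt: str, context: dict) -> Decision:
        for strategy in self.strategies:
            decision = strategy.decide(prompt, context)
            if decision is not None:
                return decision
        raise RuntimeError("no strategy produced a decision")


if __name__ == "__main__":
    router = Router()
    print(router.route("Run a security audit", {"environment": "production"}))
    print(router.route("Run a security audit", {"environment": "dev"}))
```

In a chain like this, a custom strategy placed after an always-deciding fallback would never run, so it generally belongs before the default entry.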
```bash
./setup/teardown_azure.sh
```

License: MIT