AI Model Router Lab

Reduce AI API costs by an estimated 60-80% through intelligent request routing: send simple requests to cheap models and reserve premium models for tasks that actually need them.

The Problem

Most teams pay premium model prices for every request, even when:

The environment is dev or test
The request is a simple FAQ or summary
The monthly budget has already been exceeded
A 10x cheaper model would give equally good results

The Solution: Smart Routing

This lab builds a production-ready model router that sits in front of your LLM calls and automatically selects the right model:

Request → Router → gpt-4o-mini  ($0.15/1M input tokens)   ← most requests
                → gpt-4o        ($2.50/1M input tokens)   ← complex tasks only
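To see where savings in the 60-80% range could come from, here is a back-of-the-envelope calculation using the two prices in the diagram. The traffic splits are assumed figures for illustration, and the sketch applies a single per-token price to each model, ignoring any difference between input and output token pricing.

```python
# Prices per 1M tokens, taken from the routing diagram above.
CHEAP_PRICE = 0.15    # gpt-4o-mini
PREMIUM_PRICE = 2.50  # gpt-4o


def blended_price(cheap_share: float) -> float:
    """Average price per 1M tokens when `cheap_share` of tokens go to the cheap model."""
    return cheap_share * CHEAP_PRICE + (1.0 - cheap_share) * PREMIUM_PRICE


def savings_pct(cheap_share: float) -> float:
    """Percentage saved versus sending every token to the premium model."""
    return 100.0 * (1.0 - blended_price(cheap_share) / PREMIUM_PRICE)


if __name__ == "__main__":
    for share in (0.65, 0.75, 0.85):  # assumed traffic splits
        print(f"{share:.0%} cheap -> ${blended_price(share):.4f}/1M tokens, "
              f"{savings_pct(share):.1f}% saved")
```

Under these assumptions, routing 75% of tokens to the cheap model gives a blended price of $0.7375 per 1M tokens, about 70.5% below the all-premium baseline.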

Routing Strategies (applied in priority order)

| # | Strategy | Logic | Benefit |
|---|----------|-------|---------|
| 1 | Environment | `dev/test/ci` → cheap model always | ~100% in non-prod |
| 2 | Budget cap | Over monthly limit → cheap model | Prevents overruns |
| 3 | Task type | `security_audit`, `architecture`, `compliance` → premium; `chat`, `summarize`, `draft` → cheap | High confidence |
| 4 | Complexity | Heuristic score from token count, keywords, code blocks | Broad coverage |
| 5 | Default | Fallback → cheap model | Safety net |
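The priority order above can be sketched as a chain of small functions where the first one to return a model wins. This is a minimal illustration, not the lab's `RouterEngine`: `RouteContext`, the function names, the keyword list, the score weights, the threshold, and the default budget are all hypothetical, and the sketch assumes the complexity step only escalates to the premium model and otherwise defers to the default.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CHEAP, PREMIUM = "gpt-4o-mini", "gpt-4o"

# Task types from the table above.
PREMIUM_TASKS = {"security_audit", "architecture", "compliance"}
CHEAP_TASKS = {"chat", "summarize", "draft"}

# Hypothetical values for the complexity heuristic.
COMPLEX_KEYWORDS = ("prove", "refactor", "threat model", "trade-off")
COMPLEXITY_THRESHOLD = 3
FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class RouteContext:
    environment: str = "production"
    task_type: Optional[str] = None
    month_spend_usd: float = 0.0
    monthly_budget_usd: float = 100.0  # assumed cap


def by_environment(prompt: str, ctx: RouteContext) -> Optional[str]:
    return CHEAP if ctx.environment in {"dev", "test", "ci"} else None


def by_budget(prompt: str, ctx: RouteContext) -> Optional[str]:
    return CHEAP if ctx.month_spend_usd > ctx.monthly_budget_usd else None


def by_task_type(prompt: str, ctx: RouteContext) -> Optional[str]:
    if ctx.task_type in PREMIUM_TASKS:
        return PREMIUM
    if ctx.task_type in CHEAP_TASKS:
        return CHEAP
    return None


def complexity_score(prompt: str) -> int:
    score = 0
    if len(prompt.split()) > 200:  # word count as a rough token-count proxy
        score += 2
    score += sum(1 for kw in COMPLEX_KEYWORDS if kw in prompt.lower())
    if FENCE in prompt:  # prompt contains a code block
        score += 2
    return score


def by_complexity(prompt: str, ctx: RouteContext) -> Optional[str]:
    return PREMIUM if complexity_score(prompt) >= COMPLEXITY_THRESHOLD else None


# Order matters: earlier entries take priority over later ones.
STRATEGIES = [
    ("environment", by_environment),
    ("budget_cap", by_budget),
    ("task_type", by_task_type),
    ("complexity", by_complexity),
]


def route(prompt: str, ctx: RouteContext) -> Tuple[str, str]:
    """Return (model, strategy_name) from the first strategy that decides."""
    for name, strategy in STRATEGIES:
        model = strategy(prompt, ctx)
        if model is not None:
            return model, name
    return CHEAP, "default"  # safety net
```

Because the environment check runs first, a `security_audit` request sent from `dev` still goes to the cheap model in this sketch; the task-type rule only applies once the environment and budget checks have passed.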

Architecture

See diagrams/architecture.md for full data flow and decision tree.

Client → FastAPI Router API → RouterEngine → Azure OpenAI (gpt-4o-mini or gpt-4o)
                                    ↓
                              CostTracker (SQLite) → /stats endpoint
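As a rough picture of the CostTracker box in the diagram, the sketch below logs each call to SQLite and sums spend per month, which is the number a budget-cap check would need. The table layout, method names, and single per-token price per model are assumptions made for illustration; the lab's actual implementation is in src/router/cost_tracker.py and may look quite different.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Prices per 1M tokens from the routing diagram; one flat rate per model is assumed.
PRICE_PER_1M = {"gpt-4o-mini": 0.15, "gpt-4o": 2.50}


class CostTracker:
    """Hypothetical minimal tracker: one row per model call."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS calls ("
            "ts TEXT NOT NULL, model TEXT NOT NULL, "
            "tokens INTEGER NOT NULL, cost_usd REAL NOT NULL)"
        )

    def log_call(self, model: str, tokens: int, ts: Optional[datetime] = None) -> float:
        """Record a call and return its cost in USD."""
        ts = ts or datetime.now(timezone.utc)
        cost = tokens / 1_000_000 * PRICE_PER_1M[model]
        self.conn.execute(
            "INSERT INTO calls VALUES (?, ?, ?, ?)",
            (ts.isoformat(), model, tokens, cost),
        )
        self.conn.commit()
        return cost

    def month_spend(self, year: int, month: int) -> float:
        """Total spend for a calendar month, matched on the ISO timestamp prefix."""
        prefix = f"{year:04d}-{month:02d}"
        row = self.conn.execute(
            "SELECT COALESCE(SUM(cost_usd), 0.0) FROM calls WHERE substr(ts, 1, 7) = ?",
            (prefix,),
        ).fetchone()
        return row[0]

    def over_budget(self, year: int, month: int, limit_usd: float) -> bool:
        return self.month_spend(year, month) > limit_usd
```

Storing timestamps as ISO 8601 text keeps the monthly query to a simple prefix match, at the cost of assuming every timestamp is written in the same time zone.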

Quick Start

1. Provision Azure Resources

az login
chmod +x setup/provision_azure.sh
./setup/provision_azure.sh

This creates:

Resource group rg-ai-router-lab
Azure OpenAI service with two model deployments (gpt-4o-mini + gpt-4o)
An auto-generated .env file

2. Install Dependencies

pip install -r requirements.txt

3. Run Demos (no API key needed for demos 1 & 2)

# Demo 1: See routing decisions with explanations
python demos/demo_basic.py

# Demo 2: Simulate 50 requests and see cost savings report
python demos/demo_cost_comparison.py

# Demo 3: Real calls to Azure OpenAI (requires .env)
python demos/demo_scenarios.py

4. Start the API Server

uvicorn src.api.main:app --reload

Then test it:

# Route a request (dry run — no model call)
curl "http://localhost:8000/route?prompt=Summarize+this+article&environment=production"

# Full chat request
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Kubernetes?", "environment": "dev"}'

# Cost savings stats
curl http://localhost:8000/stats
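The same calls can be made from Python with only the standard library. This is a sketch that assumes the server is listening on the default address used in the curl commands; the helper names (`route_url`, `chat_request`, `send`) are hypothetical and not part of the lab.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # address used in the curl examples above


def route_url(prompt: str, environment: str) -> str:
    """Build the dry-run /route URL with URL-encoded query parameters."""
    query = urlencode({"prompt": prompt, "environment": environment})
    return f"{BASE_URL}/route?{query}"


def chat_request(prompt: str, environment: str) -> Request:
    """Build the POST /chat request with a JSON body."""
    body = json.dumps({"prompt": prompt, "environment": environment}).encode("utf-8")
    return Request(
        f"{BASE_URL}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(target) -> dict:
    """Send a URL string or Request and decode the JSON response.

    Requires the API server to be running.
    """
    with urlopen(target, timeout=30) as resp:
        return json.load(resp)
```

With the server running, `send(route_url("Summarize this article", "production"))` previews a routing decision, `send(chat_request("What is Kubernetes?", "dev"))` performs a full chat call, and `send(f"{BASE_URL}/stats")` fetches the savings report.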

5. Run Tests

pytest tests/ -v

Project Structure

ai-model-router-lab/
├── src/
│   ├── router/
│   │   ├── router.py        # RouterEngine + 5 routing strategies
│   │   ├── classifier.py    # Heuristic complexity & task classifier
│   │   ├── cost_tracker.py  # SQLite cost logging + savings reports
│   │   └── models.py        # Model configs, pricing, enums
│   ├── api/
│   │   ├── main.py          # FastAPI app (/chat, /route, /stats, /health)
│   │   └── schemas.py       # Pydantic request/response models
│   └── utils/
│       └── azure_client.py  # Azure OpenAI thin wrapper
├── demos/
│   ├── demo_basic.py        # Routing decisions (no API)
│   ├── demo_cost_comparison.py  # Cost savings simulation (no API)
│   └── demo_scenarios.py    # Live Azure calls
├── setup/
│   ├── provision_azure.sh   # Create all Azure resources
│   └── teardown_azure.sh    # Clean up everything
├── tests/
│   ├── test_router.py
│   └── test_classifier.py
├── diagrams/
│   └── architecture.md      # Data flow & decision tree diagrams
├── .env.example
├── requirements.txt
└── README.md

API Reference

`POST /chat`

Route a prompt and get a completion.

{
  "prompt": "Your question here",
  "environment": "production",
  "dry_run": false,
  "override_budget": false,
  "max_tokens": 1024,
  "temperature": 0.7
}

The response includes routing metadata: which model was used, why, and what it cost.

`GET /route?prompt=...&environment=...`

Preview the routing decision without calling any model.

`GET /stats`

Cost savings report:

{
  "total_calls": 150,
  "cheap_calls": 112,
  "premium_calls": 38,
  "cheap_pct": 74.7,
  "total_cost_usd": 0.0031,
  "baseline_cost_usd": 0.0187,
  "savings_usd": 0.0156,
  "savings_pct": 83.4,
  "strategy_breakdown": [...]
}
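The derived fields in this report follow from four raw numbers. The sketch below shows one way they could be computed, assuming `baseline_cost_usd` means what the same traffic would have cost if every call had gone to the premium model; that interpretation and the function name `savings_report` are assumptions, not confirmed by the lab's code.

```python
def savings_report(
    total_calls: int,
    cheap_calls: int,
    total_cost_usd: float,
    baseline_cost_usd: float,
) -> dict:
    """Compute the derived /stats fields from raw counts and costs."""
    savings = baseline_cost_usd - total_cost_usd
    return {
        "total_calls": total_calls,
        "cheap_calls": cheap_calls,
        "premium_calls": total_calls - cheap_calls,
        # Guard against division by zero before any calls are logged.
        "cheap_pct": round(100 * cheap_calls / total_calls, 1) if total_calls else 0.0,
        "total_cost_usd": total_cost_usd,
        "baseline_cost_usd": baseline_cost_usd,
        "savings_usd": round(savings, 4),
        "savings_pct": round(100 * savings / baseline_cost_usd, 1) if baseline_cost_usd else 0.0,
    }


if __name__ == "__main__":
    # Raw values taken from the example response above.
    print(savings_report(150, 112, 0.0031, 0.0187))
```

Feeding in the raw values from the example response reproduces its derived fields: 38 premium calls, 74.7% cheap, $0.0156 saved, 83.4% savings.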

Extending the Router

Add a new strategy by subclassing BaseStrategy in src/router/router.py:

class MyCustomStrategy(BaseStrategy):
    def decide(self, prompt, cheap_model, premium_model, context):
        if my_condition(prompt, context):
            return ModelDecision(
                model=cheap_model,
                strategy=RoutingStrategy.DEFAULT,
                reason="My custom reason",
                confidence=0.8,
            )
        return None  # pass to next strategy

Then add it to the self.strategies list in RouterEngine.__init__.
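For experimenting outside the repository, here is a self-contained version of the same pattern. `RoutingStrategy`, `ModelDecision`, and `BaseStrategy` below are stand-ins whose shapes are inferred from the snippet above, not copies of the lab's classes; `ShortPromptStrategy`, its 12-word limit, and `first_decision` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional


# --- Stand-ins for the lab's classes (assumed shapes, not the real code) ---
class RoutingStrategy(Enum):
    DEFAULT = "default"


@dataclass
class ModelDecision:
    model: str
    strategy: RoutingStrategy
    reason: str
    confidence: float


class BaseStrategy:
    def decide(self, prompt: str, cheap_model: str, premium_model: str,
               context: Any) -> Optional[ModelDecision]:
        raise NotImplementedError


# --- A custom strategy following the contract shown above ---
class ShortPromptStrategy(BaseStrategy):
    """Hypothetical rule: very short prompts go to the cheap model."""

    def __init__(self, max_words: int = 12):
        self.max_words = max_words

    def decide(self, prompt, cheap_model, premium_model, context):
        if len(prompt.split()) <= self.max_words:
            return ModelDecision(
                model=cheap_model,
                strategy=RoutingStrategy.DEFAULT,
                reason=f"Prompt has at most {self.max_words} words",
                confidence=0.8,
            )
        return None  # pass to next strategy


def first_decision(strategies: List[BaseStrategy], prompt: str, cheap_model: str,
                   premium_model: str, context: Any) -> Optional[ModelDecision]:
    """Mimic a priority chain: the first non-None decision wins."""
    for strategy in strategies:
        decision = strategy.decide(prompt, cheap_model, premium_model, context)
        if decision is not None:
            return decision
    return None
```

Returning `None` is what lets a strategy abstain; where the new strategy sits in the list determines which existing rules it can override.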

Cleanup

./setup/teardown_azure.sh

License

MIT
