net9876/ai-model-router-lab

AI Model Router Lab

Reduce AI API costs by 60-80% through intelligent request routing — sending simple requests to cheap models and reserving premium models for tasks that actually need them.

The Problem

Most teams pay premium model prices for every request, even when:

  • The environment is dev or test
  • The request is a simple FAQ or summary
  • The monthly budget has already been exceeded
  • A 10x cheaper model would give equally good results

The Solution: Smart Routing

This lab builds a production-ready model router that sits in front of your LLM calls and automatically selects the right model:

Request → Router → gpt-4o-mini   ($0.15/1M input tokens)   ← most requests
                 → gpt-4o        ($2.50/1M input tokens)   ← complex tasks only
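As a rough check on the savings claim, the sketch below computes the blended input-token price for a given share of traffic routed to gpt-4o-mini. It assumes every request has the same input size and ignores output-token pricing (which is also higher for gpt-4o); `blended_savings` is a hypothetical helper, not part of this repo.

```python
# Input-token list prices from the diagram above (USD per 1M tokens).
CHEAP_PRICE = 0.15    # gpt-4o-mini
PREMIUM_PRICE = 2.50  # gpt-4o


def blended_savings(cheap_fraction: float) -> float:
    """Fraction saved versus sending every request to the premium model.

    Assumes all requests carry the same number of input tokens and ignores
    output-token pricing (a simplification for illustration).
    """
    if not 0.0 <= cheap_fraction <= 1.0:
        raise ValueError("cheap_fraction must be between 0 and 1")
    blended = cheap_fraction * CHEAP_PRICE + (1 - cheap_fraction) * PREMIUM_PRICE
    return 1 - blended / PREMIUM_PRICE


for share in (0.6, 0.75, 0.85):
    print(f"{share:.0%} routed cheap -> {blended_savings(share):.1%} saved")
```

With these prices the cheap model costs 6% of the premium one, so savings come out to about 0.94 × the cheap-routed share: routing 60% of traffic cheaply saves about 56%, and routing 85% saves about 80%.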

Routing Strategies (applied in priority order)

| # | Strategy | Logic | Saves |
|---|----------|-------|-------|
| 1 | Environment | dev/test/ci → cheap model always | ~100% in non-prod |
| 2 | Budget cap | Over monthly limit → cheap model | Prevents overruns |
| 3 | Task type | security_audit, architecture, compliance → premium; chat, summarize, draft → cheap | High confidence |
| 4 | Complexity | Heuristic score from token count, keywords, code blocks | Broad coverage |
| 5 | Default | Fallback → cheap model | Safety net |
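Strategy 4 depends on a complexity score. The lab's own heuristic lives in src/router/classifier.py and is not reproduced here; the following is a minimal sketch of the same idea that combines the three signals named in the table. The keyword list, weights, 500-word cap, and 0.4 threshold are all illustrative assumptions.

```python
# Illustrative keyword list; tune for your own traffic.
COMPLEX_KEYWORDS = ("architecture", "security", "compliance", "refactor", "optimize", "trade-off")
FENCE = "`" * 3  # a Markdown code fence (three backticks)


def complexity_score(prompt: str) -> float:
    """Return a score in [0, 1]; higher means the prompt looks harder."""
    # Rough token proxy: whitespace-separated words (a real tokenizer would differ).
    length_signal = min(len(prompt.split()) / 500, 1.0)
    lowered = prompt.lower()
    hits = sum(1 for kw in COMPLEX_KEYWORDS if kw in lowered)
    keyword_signal = min(hits / 3, 1.0)
    # Two fences (open and close) indicate at least one code block.
    code_signal = 1.0 if prompt.count(FENCE) >= 2 else 0.0
    # Weights sum to 1.0, so the score stays in [0, 1].
    return 0.3 * length_signal + 0.5 * keyword_signal + 0.2 * code_signal


def pick_model(prompt: str, threshold: float = 0.4) -> str:
    return "gpt-4o" if complexity_score(prompt) >= threshold else "gpt-4o-mini"
```

A weighted sum of capped signals keeps the score bounded, which makes a single threshold easy to tune against logged routing decisions.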

Architecture

See diagrams/architecture.md for the full data flow and decision tree.

Client → FastAPI Router API → RouterEngine → Azure OpenAI (gpt-4o-mini or gpt-4o)
                                    ↓
                              CostTracker (SQLite) → /stats endpoint
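The CostTracker box records every routed call so /stats can compare actual spend against a baseline. A minimal sketch with Python's built-in sqlite3 module might look like this; the table layout, column names, and the assumption that the baseline is the same call priced on the premium model are guesses, not the schema used in cost_tracker.py.

```python
import sqlite3

# Hypothetical schema for illustration; cost_tracker.py may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model TEXT NOT NULL,
    tier TEXT NOT NULL CHECK (tier IN ('cheap', 'premium')),
    strategy TEXT NOT NULL,
    cost_usd REAL NOT NULL,
    baseline_cost_usd REAL NOT NULL  -- same call priced on the premium model
)
"""


class CostTracker:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def log(self, model: str, tier: str, strategy: str,
            cost_usd: float, baseline_cost_usd: float) -> None:
        with self.conn:  # commits the insert, or rolls back on error
            self.conn.execute(
                "INSERT INTO calls (model, tier, strategy, cost_usd, baseline_cost_usd) "
                "VALUES (?, ?, ?, ?, ?)",
                (model, tier, strategy, cost_usd, baseline_cost_usd),
            )

    def totals(self) -> dict:
        # SUM(tier = 'cheap') counts rows, since SQLite comparisons yield 0 or 1.
        total, cheap, cost, baseline = self.conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(tier = 'cheap'), 0), "
            "COALESCE(SUM(cost_usd), 0.0), COALESCE(SUM(baseline_cost_usd), 0.0) "
            "FROM calls"
        ).fetchone()
        return {
            "total_calls": total,
            "cheap_calls": cheap,
            "premium_calls": total - cheap,
            "total_cost_usd": cost,
            "baseline_cost_usd": baseline,
        }

    def strategy_breakdown(self) -> list:
        rows = self.conn.execute(
            "SELECT strategy, COUNT(*), SUM(cost_usd) FROM calls "
            "GROUP BY strategy ORDER BY strategy"
        ).fetchall()
        return [{"strategy": s, "calls": n, "cost_usd": c} for s, n, c in rows]
```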

Quick Start

1. Provision Azure Resources

az login
chmod +x setup/provision_azure.sh
./setup/provision_azure.sh

This creates:

  • Resource group rg-ai-router-lab
  • Azure OpenAI service with two model deployments (gpt-4o-mini + gpt-4o)
  • Your .env file, generated automatically

2. Install Dependencies

pip install -r requirements.txt

3. Run Demos (no API key needed for demos 1 & 2)

# Demo 1: See routing decisions with explanations
python demos/demo_basic.py

# Demo 2: Simulate 50 requests and see cost savings report
python demos/demo_cost_comparison.py

# Demo 3: Real calls to Azure OpenAI (requires .env)
python demos/demo_scenarios.py

4. Start the API Server

uvicorn src.api.main:app --reload

Then test it:

# Route a request (dry run — no model call)
curl "http://localhost:8000/route?prompt=Summarize+this+article&environment=production"

# Full chat request
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Kubernetes?", "environment": "dev"}'

# Cost savings stats
curl http://localhost:8000/stats

5. Run Tests

pytest tests/ -v

Project Structure

ai-model-router-lab/
├── src/
│   ├── router/
│   │   ├── router.py        # RouterEngine + 5 routing strategies
│   │   ├── classifier.py    # Heuristic complexity & task classifier
│   │   ├── cost_tracker.py  # SQLite cost logging + savings reports
│   │   └── models.py        # Model configs, pricing, enums
│   ├── api/
│   │   ├── main.py          # FastAPI app (/chat, /route, /stats, /health)
│   │   └── schemas.py       # Pydantic request/response models
│   └── utils/
│       └── azure_client.py  # Azure OpenAI thin wrapper
├── demos/
│   ├── demo_basic.py        # Routing decisions (no API)
│   ├── demo_cost_comparison.py  # Cost savings simulation (no API)
│   └── demo_scenarios.py    # Live Azure calls
├── setup/
│   ├── provision_azure.sh   # Create all Azure resources
│   └── teardown_azure.sh    # Clean up everything
├── tests/
│   ├── test_router.py
│   └── test_classifier.py
├── diagrams/
│   └── architecture.md      # Data flow & decision tree diagrams
├── .env.example
├── requirements.txt
└── README.md

API Reference

POST /chat

Route a prompt and get a completion.

{
  "prompt": "Your question here",
  "environment": "production",
  "dry_run": false,
  "override_budget": false,
  "max_tokens": 1024,
  "temperature": 0.7
}

Response includes routing metadata showing which model was used, why, and what it cost.
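For callers in Python, a stdlib-only client for this endpoint could look like the sketch below. It assumes the server is running locally on uvicorn's default port 8000 (as in Quick Start) and sends only the body fields shown above; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # uvicorn's default host/port, as in Quick Start


def build_chat_request(prompt: str, environment: str = "production",
                       **options) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request.

    `options` may carry the other body fields shown above, such as
    dry_run, override_budget, max_tokens, or temperature.
    """
    body = {"prompt": prompt, "environment": environment, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=60) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a live server.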

GET /route?prompt=...&environment=...

Preview the routing decision without calling any model.
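Because the prompt travels in the query string, it must be URL-encoded. A small helper using urllib.parse.urlencode, which encodes spaces as + just like the curl example in Quick Start (`route_url` is a hypothetical name):

```python
from urllib.parse import urlencode


def route_url(prompt: str, environment: str, base_url: str = "http://localhost:8000") -> str:
    """Build a GET /route URL with the prompt safely URL-encoded."""
    query = urlencode({"prompt": prompt, "environment": environment})
    return f"{base_url}/route?{query}"


print(route_url("Summarize this article", "production"))
# http://localhost:8000/route?prompt=Summarize+this+article&environment=production
```

Characters such as & or ? inside the prompt are percent-encoded, so they cannot be mistaken for query-string separators.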

GET /stats

Cost savings report:

{
  "total_calls": 150,
  "cheap_calls": 112,
  "premium_calls": 38,
  "cheap_pct": 74.7,
  "total_cost_usd": 0.0031,
  "baseline_cost_usd": 0.0187,
  "savings_usd": 0.0156,
  "savings_pct": 83.4,
  "strategy_breakdown": [...]
}
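The derived fields in this sample are consistent with cheap_pct = cheap_calls / total_calls, savings_usd = baseline_cost_usd - total_cost_usd, and savings_pct = savings_usd / baseline_cost_usd (112/150 ≈ 74.7%, 0.0156/0.0187 ≈ 83.4%). The sketch below reproduces them from raw totals; it assumes the baseline is the cost of sending every call to the premium model, and `savings_report` is a hypothetical helper.

```python
def savings_report(total_calls: int, cheap_calls: int,
                   total_cost_usd: float, baseline_cost_usd: float) -> dict:
    """Derive the percentage fields of a /stats-style report from raw totals.

    Returns 0.0 percentages instead of dividing by zero on an empty log.
    """
    savings = baseline_cost_usd - total_cost_usd
    return {
        "total_calls": total_calls,
        "cheap_calls": cheap_calls,
        "premium_calls": total_calls - cheap_calls,
        "cheap_pct": round(100 * cheap_calls / total_calls, 1) if total_calls else 0.0,
        "total_cost_usd": total_cost_usd,
        "baseline_cost_usd": baseline_cost_usd,
        "savings_usd": round(savings, 6),
        "savings_pct": round(100 * savings / baseline_cost_usd, 1) if baseline_cost_usd else 0.0,
    }


report = savings_report(150, 112, 0.0031, 0.0187)
print(report["cheap_pct"], report["savings_usd"], report["savings_pct"])  # 74.7 0.0156 83.4
```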

Extending the Router

Add a new strategy by subclassing BaseStrategy in src/router/router.py:

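class MyCustomStrategy(BaseStrategy):
    """Route to the cheap model whenever my_condition() holds."""

    def decide(self, prompt, cheap_model, premium_model, context):
        # my_condition is a placeholder for your own predicate on the prompt/context.
        if my_condition(prompt, context):
            return ModelDecision(
                model=cheap_model,
                strategy=RoutingStrategy.DEFAULT,  # reuse a member, or add one to the RoutingStrategy enum
                reason="My custom reason",
                confidence=0.8,
            )
        return None  # no decision: pass to the next strategy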

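Then add it to the self.strategies list in RouterEngine.__init__, at the position that matches its priority: strategies are applied in order, and the first one to return a decision wins.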
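The strategy list behaves like a chain of responsibility: each strategy either returns a decision or None, and the first decision wins. The self-contained sketch below mirrors the five strategies from the routing table with plain functions. The context keys (environment, task_type, month_spend_usd, monthly_budget_usd, override_budget), the 300-word complexity cutoff, and treating override_budget as "skip the budget cap" are assumptions for illustration, not RouterEngine's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Decision:
    model: str
    strategy: str
    reason: str


# A strategy returns a Decision, or None to defer to the next strategy.
Strategy = Callable[[str, str, str, dict], Optional[Decision]]

PREMIUM_TASKS = {"security_audit", "architecture", "compliance"}
CHEAP_TASKS = {"chat", "summarize", "draft"}


def environment(prompt, cheap, premium, ctx):
    if ctx.get("environment") in {"dev", "test", "ci"}:
        return Decision(cheap, "environment", "non-production environment")
    return None


def budget(prompt, cheap, premium, ctx):
    # Assumption: override_budget=True skips the budget cap entirely.
    if ctx.get("override_budget"):
        return None
    if ctx.get("month_spend_usd", 0.0) >= ctx.get("monthly_budget_usd", float("inf")):
        return Decision(cheap, "budget", "monthly budget exceeded")
    return None


def task_type(prompt, cheap, premium, ctx):
    task = ctx.get("task_type")
    if task in PREMIUM_TASKS:
        return Decision(premium, "task_type", f"{task} needs the premium model")
    if task in CHEAP_TASKS:
        return Decision(cheap, "task_type", f"{task} is fine on the cheap model")
    return None


def complexity(prompt, cheap, premium, ctx):
    # Placeholder heuristic: very long prompts go premium (cutoff is arbitrary).
    if len(prompt.split()) > 300:
        return Decision(premium, "complexity", "long prompt")
    return None


def default(prompt, cheap, premium, ctx):
    return Decision(cheap, "default", "fallback to the cheap model")


@dataclass
class Router:
    cheap: str = "gpt-4o-mini"
    premium: str = "gpt-4o"
    # List order is priority: earlier strategies get the first say.
    strategies: List[Strategy] = field(
        default_factory=lambda: [environment, budget, task_type, complexity, default]
    )

    def route(self, prompt: str, ctx: Optional[dict] = None) -> Decision:
        ctx = ctx or {}
        for strategy in self.strategies:
            decision = strategy(prompt, self.cheap, self.premium, ctx)
            if decision is not None:  # first non-None decision wins
                return decision
        raise RuntimeError("no strategy returned a decision")
```

To try a custom rule, pass a new list, e.g. Router(strategies=[environment, my_rule, budget, task_type, complexity, default]) where my_rule is your own function; its position in the list sets its priority.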

Cleanup

./setup/teardown_azure.sh

License

MIT

About

Intelligent LLM request router for Azure OpenAI — automatically routes prompts to GPT-4o-mini or GPT-4o based on complexity, task type, environment, and budget. Cuts AI API costs by 60-80% with zero impact on application quality.