TokenRouter

LLM API Gateway with Intelligent Cache Optimization

🎯 Why TokenRouter?

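Many LLM providers bill input tokens that miss the prompt cache at up to roughly 10x the price of tokens that hit it. TokenRouter sits between your clients and the providers and reshapes requests so that more of them hit the cache: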

┌──────────────┐     ┌─────────────────────────────────────────────────────────┐     ┌─────────────┐
│   Client A   │────▶│                                                         │────▶│   DeepSeek  │
├──────────────┤     │   TokenRouter Gateway                                    │     ├─────────────┤
│   Client B   │────▶│   Cache Optimization • Deduplication • Cost Tracking    │────▶│   OpenAI    │
├──────────────┤     │                                                         │     ├─────────────┤
│   Client C   │────▶│                                                         │────▶│  Anthropic  │
└──────────────┘     └─────────────────────────────────────────────────────────┘     └─────────────┘

Problem	TokenRouter Solution	Impact
Low cache hit rate (<30%)	Structural convergence via Chunker + Arranger + Canonicalizer	Cache hits >70%
Inconsistent tool ordering	Alphabetical sorting of tool definitions	Cross-user cache sharing
Duplicate concurrent requests	In-memory deduplication (duplicates trigger no extra upstream call)	Eliminate redundant calls
No cost visibility	Real-time Prometheus metrics (cache savings, dedup savings)	Track every dollar saved

Result: Cache hit rates >70%, cost reduction up to 90%
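The deduplication row in the table above can be illustrated with a minimal in-memory sketch. It is not TokenRouter's implementation: the class name `InFlightDedup`, the method `do`, and the key `"hash-abc"` are hypothetical, and the sketch only covers requests that overlap in time (it does not model `DEDUP_TTL` or any shared store).

```python
import asyncio


class InFlightDedup:
    """Share one upstream call among concurrent requests with the same key.

    Hypothetical sketch; names are illustrative, not TokenRouter's API.
    """

    def __init__(self):
        self._inflight = {}  # request key -> asyncio.Task running the upstream call

    async def do(self, key, call):
        task = self._inflight.get(key)
        if task is None:
            # First request with this key: start the upstream call once.
            task = asyncio.ensure_future(call())
            self._inflight[key] = task
            # Forget the key as soon as the call finishes (success or failure).
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared call.
        return await asyncio.shield(task)


async def demo():
    calls = 0

    async def upstream():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for the provider round trip
        return "response"

    dedup = InFlightDedup()
    results = await asyncio.gather(
        *(dedup.do("hash-abc", upstream) for _ in range(5))
    )
    return calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Five concurrent callers with the same key produce one upstream call, and every caller receives the same result.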

📊 Performance Metrics

┌──────────────────────────────────────────────────────────────────────────┐
│                    TokenRouter Performance Dashboard                      │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Throughput        P99 Latency      Cache Hit Rate      Cost Savings    │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐       ┌──────────┐     │
│  │ 10,000   │     │  <50ms   │     │   >70%   │       │  Up to   │     │
│  │  req/s   │     │          │     │          │       │   90%    │     │
│  └──────────┘     └──────────┘     └──────────┘       └──────────┘     │
│                                                                          │
│  ████████████████████████████████████████████████████████████████ 95%   │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Based on load testing with 10,000 concurrent requests:

Metric	Value	Baseline	Improvement
Throughput	10,000 req/s	1,000 req/s	10x
P99 Latency	<50ms	200ms	75%↓
Cache Hit Rate	>70%	<30%	2.3x
Cost Savings	Up to 90%	0%	90%↓
Dedup Rate	>5%	0%	New

🏗 Architecture

Every incoming request flows through this pipeline:

┌─────────┐   ┌─────────┐   ┌──────────┐   ┌───────────────┐   ┌─────────────┐   ┌───────┐   ┌──────┐   ┌─────────┐   ┌───────┐
│Inbound  │──▶│Chunker  │──▶│Arranger  │──▶│Canonicalizer  │──▶│CacheInjector│──▶│Hasher │──▶│Dedup │──▶│Outbound │──▶│Proxy  │
│Adapter  │   │         │   │          │   │               │   │             │   │       │   │      │   │Adapter  │   │       │
└─────────┘   └─────────┘   └──────────┘   └───────────────┘   └─────────────┘   └───────┘   └──────┘   └─────────┘   └───────┘

Stage	What it does
Inbound Adapter	Parse the request into an Envelope
Chunker	Split into block types
Arranger	Order blocks: System → Tool → History → Query
Canonicalizer	Deterministic JSON serialization
CacheInjector	Inject vendor-specific cache directives
Hasher	Compute hashes
Dedup	Check for duplicates
Outbound Adapter	Build the vendor-specific format
Proxy	Forward to upstream
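The Chunker and Arranger stages of the pipeline can be sketched as follows. This is an illustration under assumptions, not the project's code: the `Block` type, the `chunk` and `arrange` functions, and the rule that the last user message is the query are all hypothetical. The request shape follows the chat-completions format used elsewhere in this README.

```python
from dataclasses import dataclass

# Target order from the pipeline: System -> Tool -> History -> Query.
BLOCK_ORDER = {"system": 0, "tool": 1, "history": 2, "query": 3}


@dataclass
class Block:
    kind: str      # "system", "tool", "history", or "query"
    payload: dict  # the original message or tool definition


def chunk(request: dict) -> list[Block]:
    """Split a chat request into typed blocks (hypothetical Chunker)."""
    messages = request.get("messages", [])
    # Assumption: the last user message is the query; earlier turns are history.
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=None
    )
    blocks = []
    for i, message in enumerate(messages):
        if message["role"] == "system":
            kind = "system"
        elif i == last_user:
            kind = "query"
        else:
            kind = "history"
        blocks.append(Block(kind, message))
    for tool in request.get("tools", []):
        blocks.append(Block("tool", tool))
    return blocks


def arrange(blocks: list[Block]) -> list[Block]:
    """Order blocks by kind and sort tools by name (hypothetical Arranger)."""

    def sort_key(block: Block):
        name = block.payload["function"]["name"] if block.kind == "tool" else ""
        return (BLOCK_ORDER[block.kind], name)

    # sorted() is stable, so messages keep their relative order within a kind.
    return sorted(blocks, key=sort_key)
```

Because tools are sorted by name and the stable parts come first, two users who send the same system prompt and tools in a different order end up with the same leading blocks.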

Core Components

Component	Function	Impact	Performance
Chunker	Splits messages into System/Tool/History/Query blocks	Structured processing	<1ms
Arranger	Orders blocks: System → Tool (sorted) → History → Query	Cache prefix alignment	<1ms
Canonicalizer	Deterministic JSON serialization	Byte-perfect hash stability	<2ms
CacheInjector	Vendor-specific cache directives	Maximize vendor KV cache	<1ms
Hasher	PrefixHash (cache) + FullHash (dedup)	Intelligent routing	<1ms
Dedup	In-flight request deduplication	Zero redundant calls	<1ms

Total Pipeline Overhead: <10ms (P99)
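The Canonicalizer and Hasher rows can be illustrated with a short sketch. The README does not say which hash function is used or exactly which blocks feed the PrefixHash, so SHA-256 and the "system prompt plus sorted tools" prefix are assumptions here, as are the function names.

```python
import hashlib
import json


def canonicalize(value) -> bytes:
    """Serialize to JSON with sorted keys and no optional whitespace."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def prefix_hash(system: str, tools: list) -> str:
    """Hash the cacheable prefix (assumed: system prompt + tools sorted by name)."""
    sorted_tools = sorted(tools, key=lambda t: t["function"]["name"])
    prefix = {"system": system, "tools": sorted_tools}
    return hashlib.sha256(canonicalize(prefix)).hexdigest()


def full_hash(request: dict) -> str:
    """Hash the whole request; equal hashes mark candidates for deduplication."""
    return hashlib.sha256(canonicalize(request)).hexdigest()


if __name__ == "__main__":
    tool_a = {"type": "function", "function": {"name": "get_weather"}}
    tool_b = {"function": {"name": "search"}, "type": "function"}
    # Same prefix despite different tool order and key order.
    print(prefix_hash("Be brief.", [tool_a, tool_b])
          == prefix_hash("Be brief.", [tool_b, tool_a]))
```

Sorting keys and fixing separators makes the serialized bytes independent of how the client happened to order its JSON, which is what keeps the hashes stable across clients.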

📈 Comparison

Feature Comparison

Feature	TokenRouter	Cloudflare AI Gateway	LiteLLM
KV Cache Optimization	✅ Structural convergence	❌ Passthrough only	❌ Passthrough only
Request Deduplication	✅ In-memory	❌ No	❌ No
Tool Normalization	✅ Alphabetical sort	❌ No	❌ No
Cost Tracking	✅ Real-time Prometheus	⚠️ Paid feature	⚠️ Basic
Open Source	✅ Full	❌ Proprietary	✅ Full
Self-Hosted	✅ Yes	❌ Cloud only	✅ Yes
Streaming Support	✅ Full	⚠️ Limited	✅ Full
Multi-Provider	✅ DeepSeek/OpenAI/Anthropic	✅ Multiple	✅ Multiple
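The "KV Cache Optimization" row depends in part on the CacheInjector stage, which adds vendor-specific cache directives. The sketch below shows one plausible form for an Anthropic-style request body, where a `cache_control` marker of type `ephemeral` on a system content block marks the end of a cacheable prefix. The function name and the `vendor` strings are hypothetical, and treating other vendors as needing no directive (because they cache prefixes automatically) is an assumption of this sketch.

```python
import copy


def inject_cache_directive(body: dict, vendor: str) -> dict:
    """Return a copy of `body` with a cache marker added where the vendor needs one.

    Hypothetical sketch of a CacheInjector; not TokenRouter's implementation.
    """
    out = copy.deepcopy(body)  # never mutate the caller's request
    if vendor != "anthropic":
        # Assumption: other vendors need no explicit marker.
        return out

    system = out.get("system")
    if isinstance(system, str):
        # Anthropic accepts the system prompt as a list of text blocks.
        system = [{"type": "text", "text": system}]
    if system:
        # Mark the last system block as the end of the cacheable prefix.
        system[-1]["cache_control"] = {"type": "ephemeral"}
        out["system"] = system
    return out
```

Placing the marker after the stable system prompt means everything before it can be reused across requests, while the per-request query stays outside the cached prefix.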

Cost Comparison (1M tokens)

┌─────────────────────────────────────────────────────────────────┐
│                    Cost per 1M Tokens (USD)                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Direct API Call    │████████████████████████████████│ $1.00   │
│  (no optimization)  │                                │         │
│                     │                                │         │
│  With TokenRouter   │█████                           │ $0.10   │
│  (best case)        │                                │         │
│                     │                                │         │
│  Savings            │████████████████████████████    │ 90% ↓   │
│                     │                                │         │
└─────────────────────────────────────────────────────────────────┘
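The relationship between cache hit rate and cost can be worked through with a small calculation. The prices below ($1.00 per 1M tokens on a miss, $0.10 on a hit) are the illustrative figures from the chart and the 10x ratio stated earlier, not any provider's actual price list, and the function names are hypothetical. The sketch covers cached input tokens only; output tokens and deduplication savings are not modelled.

```python
def blended_cost(hit_rate: float, miss_price: float = 1.00, hit_price: float = 0.10) -> float:
    """Average price per 1M input tokens for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


def savings_fraction(hit_rate: float, miss_price: float = 1.00, hit_price: float = 0.10) -> float:
    """Fraction saved relative to paying the miss price for every token."""
    return 1.0 - blended_cost(hit_rate, miss_price, hit_price) / miss_price


if __name__ == "__main__":
    for rate in (0.3, 0.7, 1.0):
        print(f"hit rate {rate:.0%}: ${blended_cost(rate):.2f} per 1M tokens, "
              f"{savings_fraction(rate):.0%} saved")
```

Under these assumptions a 70% hit rate gives a blended price of about $0.37 (63% saved), and the $0.10 figure (90% saved) is the ceiling reached when every input token is served from cache, which is why the savings are stated as "up to 90%".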

🚀 Quick Start

Docker (Recommended)

# Clone repository
git clone https://github.com/GouBuliya/TokenRouter.git
cd TokenRouter/deployments

# Start all services
docker compose up -d

# View logs
docker compose logs -f

Access:

TokenRouter API: http://localhost:8080
Grafana Dashboard: http://localhost:3000 (default login: admin/admin)
Prometheus: http://localhost:9090

Source Build

# Clone repository
git clone https://github.com/GouBuliya/TokenRouter.git
cd TokenRouter

# Build
make build

# Run tests
make test

# Run locally (requires Postgres and Redis)
cp .env.example .env
# Edit .env with your API keys
make dev

💡 Usage Examples

1. Create API Key

curl -X POST http://localhost:8080/admin/api-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-key",
    "quota_usd": 100
  }'

Response:

{
  "id": "uuid-here",
  "key": "sk-tr-abc123...",
  "quota_usd": 100
}

⚠️ Save the key immediately; it is shown only once.

2. Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-tr-abc123..." \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
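The same call can be made from Python using only the standard library. This is a minimal sketch: `build_chat_request` and `send` are hypothetical helper names, the URL, key, and model are the placeholder values from the curl example, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, user_text: str) -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_chat_request(
        "http://localhost:8080", "sk-tr-abc123...", "deepseek-v4-flash",
        "Hello, how are you?",
    )
    print(send(req))
```

Separating request construction from sending makes it easy to inspect or log the exact payload before it reaches the gateway.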

3. With Tools

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-tr-abc123..." \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "What is the weather in Beijing?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"}
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'

🔧 Configuration

Environment Variables

Variable	Description	Default	Required
`PORT`	HTTP server port	`8080`	❌
`DATABASE_URL`	Postgres connection string	-	✅
`REDIS_URL`	Redis connection string	-	✅
`DEEPSEEK_API_KEY`	DeepSeek API key	-	✅
`CACHE_INJECT_ENABLED`	Enable cache injection	`true`	❌
`DEDUP_ENABLED`	Enable request deduplication	`true`	❌
`TOOL_SORT_ENABLED`	Enable tool alphabetical sorting	`true`	❌
`DEDUP_TTL`	Deduplication TTL	`2m`	❌
`LOG_LEVEL`	Log level (debug/info/warn/error)	`info`	❌

See .env.example for full list.
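To make the defaults and required flags in the table concrete, here is a sketch of how such settings could be read and validated. It mirrors the table only; the `Config` class and helper names are hypothetical, and the duration parser handles just single-unit values such as `90s`, `2m`, or `1h` (the gateway's own parser may accept more).

```python
import re
from dataclasses import dataclass

REQUIRED = ("DATABASE_URL", "REDIS_URL", "DEEPSEEK_API_KEY")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a single-unit duration such as '2m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def parse_bool(text: str) -> bool:
    return text.strip().lower() in ("1", "true", "yes", "on")


@dataclass
class Config:
    port: int
    database_url: str
    redis_url: str
    deepseek_api_key: str
    cache_inject_enabled: bool
    dedup_enabled: bool
    tool_sort_enabled: bool
    dedup_ttl_seconds: int
    log_level: str


def load_config(env: dict) -> Config:
    """Build a Config from a mapping such as os.environ, applying table defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return Config(
        port=int(env.get("PORT", "8080")),
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        deepseek_api_key=env["DEEPSEEK_API_KEY"],
        cache_inject_enabled=parse_bool(env.get("CACHE_INJECT_ENABLED", "true")),
        dedup_enabled=parse_bool(env.get("DEDUP_ENABLED", "true")),
        tool_sort_enabled=parse_bool(env.get("TOOL_SORT_ENABLED", "true")),
        dedup_ttl_seconds=parse_duration(env.get("DEDUP_TTL", "2m")),
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Calling `load_config(os.environ)` at startup fails fast with a clear message when a required variable is absent, instead of failing later on the first database or provider call.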

Configuration Templates

Development Environment

PORT=8080
LOG_LEVEL=debug
DATABASE_URL=postgres://tokenrouter:tokenrouter@localhost:5432/tokenrouter?sslmode=disable
REDIS_URL=redis://localhost:6379/0
DEEPSEEK_API_KEY=sk-xxx
DEDUP_ENABLED=true
CACHE_INJECT_ENABLED=true
# Disable rate limiting for development
RATE_LIMIT_ENABLED=false

Production Environment (Small Scale)

PORT=8080
LOG_LEVEL=warn
DATABASE_URL=postgres://user:pass@db.example.com:5432/tokenrouter?sslmode=require
REDIS_URL=redis://redis.example.com:6379/0
DEEPSEEK_API_KEY=sk-xxx
DB_MAX_OPEN_CONNS=50
DB_MAX_IDLE_CONNS=10
DB_CONN_MAX_LIFETIME=30m
AUTH_CACHE_TTL=5m

Production Environment (High Concurrency)

PORT=8080
LOG_LEVEL=error
DATABASE_URL=postgres://user:pass@db.example.com:5432/tokenrouter?sslmode=require
REDIS_URL=redis://redis-cluster.example.com:6379/0
DEEPSEEK_API_KEY=sk-xxx

# High concurrency settings
GLOBAL_CONCURRENT_LIMIT=10000
STREAM_CONCURRENT_LIMIT=6000
NON_STREAM_CONCURRENT_LIMIT=4000
PROVIDER_CONCURRENT_LIMIT=1000

DB_MAX_OPEN_CONNS=100
DB_MAX_IDLE_CONNS=25
DB_CONN_MAX_LIFETIME=1h

# Connection pool optimization
PROXY_MAX_IDLE_CONNS=10000
PROXY_MAX_IDLE_CONNS_PER_HOST=1000
PROXY_MAX_CONNS_PER_HOST=10000
PROXY_IDLE_CONN_TIMEOUT=90s

📚 Documentation

Getting Started

📖 Installation Guide - Complete setup instructions
⚙️ Configuration Guide - Environment variables and tuning
🚀 Quick Start - Development environment setup
💡 Usage Examples - API call examples

Architecture

🏗 System Architecture - Core architecture and module design
🔌 Adapter Design - Inbound/Outbound adapter patterns
💾 Cache Intelligence - Cache optimization strategies

API Reference

📡 Chat Completions API - API endpoint specifications
🔧 Admin API - Management endpoints

Development

🤝 Contributing Guide - How to contribute
🧪 Testing Guide - End-to-end testing
🔌 Adapter Development - Building new provider adapters

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

# Fork and clone
git clone https://github.com/YOUR_USERNAME/TokenRouter.git
cd TokenRouter

# Create branch
git checkout -b feature/your-feature

# Make changes and test
make test
make lint

# Commit and push
git commit -am "feat: add your feature"
git push origin feature/your-feature

# Open Pull Request

Good First Issues

Look for issues labeled `good first issue` to get started.

📄 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

Inspired by Cloudflare AI Gateway
Cache optimization concepts from Anthropic
Built with Gin and GORM

📬 Contact

GitHub Issues: Report bugs or request features
Discussions: Join the conversation
Email: Contact maintainers
Twitter: @TokenRouter (coming soon)
Discord: Join our community (coming soon)

Made with ❤️ for the AI community
