Skip to content

K1vin1906/mcp-multi-model

Repository files navigation

mcp-multi-model

Give Claude Code superpowers — image gen, video gen, web search, and smart multi-model routing.

One MCP server. All the models you need. Zero tab-switching.

demo

npx mcp-multi-model

If you find this useful, please give it a ⭐ — it helps others discover the project!


What can it do?

🎨 Generate images and videos — right in the terminal

"Generate a macOS app icon with a glowing indigo orb"

Claude calls Imagen 4 / GPT Image / Nano Banana, saves the PNG, and opens it. No browser, no Figma, no context switch.

Video too — Veo 3.1 generates short clips from a text prompt.

🧠 Smart routing — the right model for the job

Need reasoning / agentic coding → it routes to OpenAI GPT-5 / o-series (auto-handles max_completion_tokens, skips temperature where unsupported). Tell Claude to research something → it routes to Gemini (Google Search grounding). Ask it to write code cheaply → it routes to DeepSeek (fast, cheap, great at code). Need real-time info in Chinese → it routes to Kimi (web search).

You don't pick the model. The routing does it for you.

⚖️ Compare models side by side

"Ask both DeepSeek and Gemini how to implement a B-tree"

Two answers, one terminal. See which model gives you a better solution.

🌐 Web search built in

Gemini uses Google Search grounding. Kimi searches the Chinese web. No separate browser-use MCP needed.

🔧 One-line install

{
  "mcpServers": {
    "multi-model": {
      "command": "npx",
      "args": ["-y", "mcp-multi-model"],
      "env": {
        "DEEPSEEK_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}

That's it. No git clone, no build step.


Supported Models

12+ providers preconfigured in config.example.yaml. Models without an API key are skipped automatically.

Provider Adapter Why use it
OpenAI openai GPT-5 / GPT-5.5 reasoning, o1 / o3 / o4 series, GPT Image. Reasoning param handling is automatic (max_completion_tokens, temperature skipped where unsupported).
Gemini gemini Long context, Google Search grounding. Image (Imagen 4 Fast / Ultra, Nano Banana 2) and video (Veo 3.1) generation built in.
DeepSeek openai Code, math, logic — extremely low cost
Kimi (Moonshot) openai Chinese web search, real-time info, tool-calling loop
Grok (xAI) openai Real-time X/Twitter context, reasoning
Perplexity openai Sonar models with built-in web search and citations
Anthropic (via OpenRouter) openai Claude models routed through OpenRouter
Mistral / Groq / Qwen / GLM / Together openai EU AI, ultra-fast inference, Chinese-native, open-source aggregators
Ollama / LM Studio / llama.cpp / vLLM openai Local — no API key, no cost, full privacy

Adding a new model is one block in config.yaml — see Configuration.

MCP Tools

Tools are dynamically generated from your config. With the default setup:

Tool What it does
ask_ai Query any model — unified entry with temperature / top_p control
ask_deepseek Query DeepSeek directly
ask_gemini Query Gemini directly
ask_kimi Query Kimi directly
ask_all Query all models in parallel, compare results
ask_both Query any two models in parallel
delegate Smart routing — auto-picks the best model for the task
generate_image Text → image via Gemini Imagen
generate_video Text → video via Gemini Veo
translate CN ↔ EN translation
research Deep research with web search
check_health Ping all models, report status and latency

Installation

Option 1: npx (recommended)

Add to your Claude Code MCP config (~/.mcp.json):

{
  "mcpServers": {
    "multi-model": {
      "command": "npx",
      "args": ["-y", "mcp-multi-model"],
      "env": {
        "DEEPSEEK_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}

Option 2: Clone and run locally

git clone https://github.com/K1vin1906/mcp-multi-model.git
cd mcp-multi-model
npm install
npm run setup   # Interactive setup wizard — validates your API keys

Then add to your MCP config:

{
  "mcpServers": {
    "multi-model": {
      "command": "node",
      "args": ["/path/to/mcp-multi-model/index.js"]
    }
  }
}

API keys can be set via env in the config above, or in a .env file in the project directory.

Configuration

cp config.example.yaml config.yaml
defaults:
  max_tokens: 4000
  temperature: 0.7
  timeout_ms: 60000
  max_retries: 2
  # cache_ttl_ms: 300000   # Cache identical prompts for 5 min
  # daily_budget_usd: 5.0  # Daily spending limit in USD

models:
  deepseek:
    name: DeepSeek
    adapter: openai
    endpoint: https://api.deepseek.com/chat/completions
    api_key_env: DEEPSEEK_API_KEY
    model: deepseek-chat
    description: "Code, math, logic. Low cost."
    fallback_to: gemini
    pricing:
      input: 0.14    # $/M tokens
      output: 0.28

  gemini:
    name: Gemini
    adapter: gemini
    endpoint: https://generativelanguage.googleapis.com/v1beta
    api_key_env: GEMINI_API_KEY
    model: gemini-2.5-flash-preview-04-17
    description: "Long context, broad knowledge, Google Search."
    features:
      - google_search
    pricing:
      input: 0.10
      output: 0.40

  # Local models — no API key needed:
  # ollama:
  #   name: Ollama
  #   adapter: openai
  #   endpoint: http://localhost:11434/v1/chat/completions
  #   model: llama3.2

Image Generation

Two endpoint families are routed automatically based on the model ID:

Gemini family (uses GEMINI_API_KEY)

Model ID Endpoint Notes
imagen-4-fast :predict Default, ~$0.02/image
imagen-4-ultra :predict 2K quality, ~$0.06/image
gemini-2.5-flash-image (Nano Banana) :generateContent Fast (~3s), 2,000 RPM free tier
gemini-3-pro-image-preview (Nano Banana 2) :generateContent High quality, 500 RPM

OpenAI family (uses OPENAI_API_KEY)

Model ID Endpoint Notes
gpt-image-2 /v1/images/generations Best text rendering. Requires OpenAI org verification.

Supports aspect_ratio: 1:1, 3:2, 4:3, 16:9, 9:16. quality and size forwarded to OpenAI image endpoints.

Video Generation

Generate short video clips using Gemini Veo 3.1 (uses GEMINI_API_KEY).

Parameter Type Notes
prompt string Text description of the desired video
aspect_ratio 16:9 / 9:16 / 1:1
duration 4 / 6 / 8 (seconds) Must be even — Veo only accepts even durations
save_path string? Defaults to /tmp/mcp-media/videos/

Local Models

Any OpenAI-compatible local runner works — Ollama, LM Studio, llama.cpp, vLLM:

models:
  ollama:
    name: Ollama
    adapter: openai
    endpoint: http://localhost:11434/v1/chat/completions
    model: llama3.2

Mix local and cloud models freely — use ask_all to compare Ollama vs DeepSeek vs Gemini in one call.

Built-in Features

  • Auto-retry & fallback — Exponential backoff on 429/5xx, automatic fallback to backup model
  • Conversation history — Multi-turn context with conversation_id (30min expiry, up to 10 turns)
  • Cost tracking — Per-call token usage and cost estimation
  • Response caching — Cache identical prompts with configurable TTL
  • Daily budget limit — Set a spending cap; calls are blocked when exceeded
  • Streaming — Real-time SSE streaming for all adapters

Privacy

This is a local relay. No telemetry, no analytics, no data sent to the extension author. Prompts go directly from your machine to the LLM provider you configured.

Full policy: k1vin1906.github.io/mcp-multi-model/privacy.html

License

MIT

About

MCP server for Claude Code — parallel multi-model queries, smart routing, image/video gen across OpenAI (GPT-5, GPT Image), Gemini (Imagen 4, Veo), DeepSeek, Kimi & 12+ providers

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors