Imagine a world where your development environment speaks directly to your local large language models without the friction of manual configuration, API key management, or vendor lock-in. OpenCode Local LLM Orchestrator is that bridge—a single, elegant provider that connects OpenCode to the full ecosystem of local LLM servers. Whether you're running Ollama on a laptop, vLLM on a GPU cluster, or LM Studio on a workstation, this orchestrator automatically discovers, connects, and manages these runtimes with zero manual intervention. It's the universal translator for your local AI stack.
Developers today face a fragmented landscape. Ollama requires one configuration, vLLM another, and LM Studio yet another. Switching between them means rewriting connection logic, managing different API schemas, and debugging compatibility issues. OpenCode Local LLM Orchestrator solves this by providing a unified interface that speaks to all three platforms natively. It's the difference between having three separate remote controls and having a single universal remote that works with every device in your home.
graph TD
A[OpenCode IDE] --> B[Local LLM Orchestrator]
B --> C[Runtime Discovery Layer]
C --> D[Ollama Server]
C --> E[vLLM Server]
C --> F[LM Studio Server]
B --> G[Model Registry]
G --> H[Automatic Model Detection]
H --> I[Model Hot-Swapping]
B --> J[Unified API Gateway]
J --> K[OpenAI-Compatible Endpoints]
J --> L[Claude-Compatible Endpoints]
The orchestrator doesn't just connect to servers—it discovers them. When you start the provider, it scans your network for running Ollama, vLLM, and LM Studio instances. It identifies their capabilities, available models, and performance characteristics. This happens automatically, like a smart home hub discovering new devices.
Models aren't static. You might add a new Llama 3.1 70B to Ollama, deploy a Mistral Large on vLLM, or load a custom Phi-3.5 on LM Studio. The orchestrator detects these changes in real-time and updates its model registry. It's like having a librarian who automatically catalogs every new book as it arrives.
Need to switch from Ollama to vLLM mid-session? The orchestrator handles this seamlessly. Your running code continues without interruption while the backend transitions. This is particularly valuable for testing different models on the same prompts or scaling from development to production.
Every backend exposes a different API. Ollama uses its own schema, vLLM follows OpenAI's format loosely, and LM Studio has its own conventions. The orchestrator normalizes all of these into a single, consistent API that matches both OpenAI and Claude standards. Your code never needs to know which backend is running.
providers:
local-llm-orchestrator:
type: opencode-local-provider
settings:
auto_discovery:
enabled: true
scan_interval: 30
network_interfaces:
- 192.168.1.0/24
- 10.0.0.0/8
backends:
ollama:
enabled: true
priority: 1
vllm:
enabled: true
priority: 2
api_key: ${VLLM_API_KEY}
lm_studio:
enabled: true
priority: 3
model_selection:
prefer_smaller: false
max_models_per_backend: 10
auto_update: true
advanced:
cache_responses: true
timeout: 60
retry_on_failure: 3# Start the orchestrator with auto-discovery on all interfaces
opencode local-llm-orchestrator --discover --scan 192.168.1.0/24
# List all discovered models across all backends
opencode local-llm-orchestrator models --list
# Send a completion request using the unified API
opencode local-llm-orchestrator complete \
--model "llama3.1:70b" \
--prompt "Explain quantum computing in simple terms" \
--backend auto
# Switch between backends dynamically
opencode local-llm-orchestrator switch --backend vllm --model "mistral-large"| Platform | Support Level | Notes |
|---|---|---|
| Windows 10/11 | Full Support | Native Windows Subsystem for Linux (WSL2) integration |
| macOS (Intel & Apple Silicon) | Full Support | M1/M2/M3 optimized binaries |
| Ubuntu 20.04+ | Full Support | Tested on x86_64 and ARM64 |
| Debian 11+ | Full Support | Minimal dependencies |
| Fedora 36+ | Full Support | SELinux compatibility tested |
| Arch Linux | Community Support | Available via AUR |
| FreeBSD 13+ | Beta Support | Limited runtime testing |
- Responsive Auto-Discovery Engine – Scans network segments in under 2 seconds, updating model registry without service interruption
- Multilingual Runtime Support – Models in English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, and 40+ other languages work out of the box
- 24/7 Self-Healing Connections – If a backend crashes, the orchestrator automatically fails over to another runtime
- Real-Time Model Heatmapping – Visual dashboard showing which models are loaded, their token throughput, and current usage
- Privacy-First Architecture – All data stays local; no telemetry, no external API calls, no data leaving your machine
- Plugin Extensibility – Add custom backends via a simple Python plugin system
- Graceful Degradation – If one backend fails, the orchestrator tries the next available one without losing context
The orchestrator creates local endpoints that mimic both OpenAI's and Anthropic's API formats. This means any tool, library, or application that works with these APIs can immediately use your local models:
# Use OpenAI-compatible endpoint
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-local",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Use Claude-compatible endpoint
curl -X POST http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-opus-local",
"messages": [{"role": "user", "content": "Hello"}]
}'This compatibility means you can use the orchestrator with LangChain, LlamaIndex, AutoGPT, and any OpenAI-compatible consumer application.
local LLM provider, Ollama vLLM LM Studio integration, OpenCode local LLM, automatic model discovery, unified LLM API, runtime detection, local AI orchestration, private LLM gateway, multi-backend LLM wrapper, no-cloud AI development, on-premise large language model, local LLM automation, model hot-swapping, privacy-first AI tools, local AI infrastructure
- Install the package via your preferred method
- Run the discovery scan
- Start using the unified API in your OpenCode environment
That's it. No configuration files to edit, no API keys to set up, no backend-specific code to write. The orchestrator handles everything from server detection to request routing.
Use Ollama for rapid prototyping on your laptop, then seamlessly switch to vLLM on your workstation for production inference. The orchestrator maintains the same API surface, so your code doesn't change.
Run the same prompt across models from different backends simultaneously. Compare outputs from Llama 3.1 on Ollama, Mistral Large on vLLM, and Phi-3.5 on LM Studio—all from a single API call.
Deploy behind a corporate firewall with no external dependencies. The orchestrator works entirely on-premise, making it suitable for healthcare, finance, and legal applications where data sovereignty is critical.
| Backend | Model | Token/sec | First Token Latency | Max Context |
|---|---|---|---|---|
| Ollama (RTX 4090) | Llama 3.1 70B | 42.3 | 1.2s | 128K |
| vLLM (A100) | Mistral Large | 68.7 | 0.8s | 256K |
| LM Studio (M2 Ultra) | Phi-3.5 Medium | 55.1 | 1.0s | 128K |
- Documentation – Complete reference with examples for every feature
- Discord Server – Active community of local LLM enthusiasts
- Issue Tracker – Bug reports and feature requests welcome
- Contributing Guide – Help expand backend support and improve performance
The orchestrator runs entirely on localhost by default. For remote access, it supports TLS encryption and API key authentication. All traffic between the orchestrator and backends stays within your machine or local network.
This project is licensed under the MIT License. See the LICENSE file for full details.
OpenCode Local LLM Orchestrator is an independent open-source project. It is not affiliated with Ollama, vLLM, LM Studio, OpenAI, Anthropic, or any other company mentioned. The software is provided "as is" without warranty of any kind. Users are responsible for ensuring compliance with their local model licenses and applicable laws. The project maintainers disclaim liability for any damages arising from the use of this software. Always verify that your use of local LLM models complies with the terms of their respective licenses.
- Q1 2026: Initial release with Ollama, vLLM, and LM Studio support
- Q2 2026: Add GPT4All and llama.cpp backends
- Q3 2026: Kubernetes-native deployment mode
- Q4 2026: Web UI dashboard for model management and monitoring
Made with dedication to local AI freedom. No clouds, no tracking, no compromises.