Like Claude Code, but runs entirely on your hardware.
100% Local · Free Forever · Real-Time Streaming · Multi-Node Ready
Hive is a fully local, privacy-first AI coding assistant that runs in your terminal. It connects to your local LLM (via Ollama or llama.cpp) and gives you an agentic coding experience (reading files, writing code, running commands, and searching your codebase), all without sending a single byte to the cloud.
It's also designed from the ground up for distributed inference: pool the VRAM and RAM of multiple laptops on your LAN to run models that no single machine could handle alone.
Hive doesn't just chat; it acts. The LLM autonomously calls tools to complete multi-step tasks:

| Tool | Description |
|---|---|
| `read_file` | Read any file in your project |
| `write_file` | Create new files or overwrite existing ones |
| `edit_file` | Targeted find-and-replace edits |
| `run_command` | Execute shell commands (build, test, install) |
| `list_directory` | Explore project structure with sizes |
| `search_files` | Regex search across your codebase |

The agent runs up to 20 autonomous tool-calling rounds per request, self-correcting on errors and chaining actions to complete complex tasks.
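
As a rough illustration of how a tool such as `edit_file` can support self-correction, here is a minimal sketch. The function signature, the result-dict shape, and the `TOOLS` registry are assumptions made for this example, not Hive's actual code. The idea is that a tool reports failures as data instead of raising, so the model can read the error and retry.

```python
from pathlib import Path


def edit_file(path: str, find: str, replace: str) -> dict:
    """Hypothetical find-and-replace tool: returns a result dict instead of raising."""
    target = Path(path)
    if not target.is_file():
        return {"ok": False, "error": f"file not found: {path}"}
    if not find:
        return {"ok": False, "error": "find string must not be empty"}
    text = target.read_text(encoding="utf-8")
    count = text.count(find)
    if count != 1:
        # A missing or ambiguous match is reported back so the model can retry.
        return {"ok": False, "error": f"expected exactly 1 match, found {count}"}
    target.write_text(text.replace(find, replace, 1), encoding="utf-8")
    return {"ok": True, "replaced": 1}


# Hypothetical registry mapping tool names to executors.
TOOLS = {"edit_file": edit_file}


def dispatch(name: str, arguments: dict) -> dict:
    """Look up a tool by name and run it; unknown names become error results."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return tool(**arguments)
    except TypeError as exc:  # wrong or missing arguments supplied by the model
        return {"ok": False, "error": str(exc)}
```

Requiring exactly one match is a design choice for this sketch: it keeps an edit from silently touching the wrong occurrence.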
Responses stream token-by-token directly to your terminal with live Markdown rendering via Rich. There is no waiting for the full response; you see text appear in real time.
- ASCII art boot screen with model info, GPU/VRAM stats, git branch
- Live Markdown rendering with syntax-highlighted code blocks
- Purple neon theme with rich formatting throughout
- Alternate screen buffer: clean start, terminal restored on exit
- Bottom toolbar showing CWD and git status
- Tool execution visualization with call/result indicators
Conversations are saved to a local SQLite database (~/.hive/sessions.db). Resume where you left off:
- Automatic session creation and history tracking
- Resume previous sessions on startup
- Browse and switch between past sessions with `/sessions` and `/resume`
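
The session features listed above could be backed by a very small SQLite layer. The sketch below uses only the standard-library `sqlite3` module. The table names, columns, and helper functions are hypothetical and chosen for illustration; the actual schema of `~/.hive/sessions.db` may differ.

```python
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a session database with an assumed two-table schema."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sessions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at REAL NOT NULL
        );
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id INTEGER NOT NULL REFERENCES sessions(id),
            role TEXT NOT NULL,
            content TEXT NOT NULL
        );
        """
    )
    return conn


def new_session(conn: sqlite3.Connection) -> int:
    """Create a session row and return its id."""
    cur = conn.execute("INSERT INTO sessions (created_at) VALUES (?)", (time.time(),))
    conn.commit()
    return cur.lastrowid


def add_message(conn: sqlite3.Connection, session_id: int, role: str, content: str) -> None:
    """Append one message to a session."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, session_id: int) -> list[dict]:
    """Return a session's messages in insertion order, ready to resume a chat."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]
```

Resuming a session then amounts to calling `load_history` and passing the result to the model as the conversation so far.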
- Model registry with 10 popular models pre-configured (Qwen, Llama, DeepSeek, Phi, Gemma, Mistral)
- Direct downloads from Hugging Face with resume support
- Ollama integration: automatically detects and uses installed Ollama models
- Hot-swap models mid-conversation with `/model`
A React-based dashboard (Vite + React Router) for monitoring and interaction:
- Dashboard: cluster health, node hardware, VRAM/RAM bars, running models
- Chat: browser-based chat interface
- Models: view all available, downloadable, and locked models
- Settings: cluster configuration
Pool GPU resources across multiple machines on your LAN:
- mDNS auto-discovery: nodes find each other automatically via Zeroconf
- HMAC-SHA256 authentication: shared secret for cluster trust
- Automatic layer sharding: model layers distributed proportional to VRAM
- llama.cpp RPC protocol: workers expose GPUs as remote compute devices
- Auto-download of binaries: llama.cpp `rpc-server` and `llama-server` are fetched from GitHub releases
- Per-response stats: tool count, tokens, tok/s, latency
- Prometheus metrics endpoint (`/metrics`): tok/s, VRAM/RAM usage, GPU temperature, inference latency histograms
- Periodic hardware refresh with real-time GPU monitoring

| Command | Description |
|---|---|
| `/help` | Show all commands |
| `/clear` | Clear conversation history |
| `/compact` | Summarize history to save context window |
| `/model <name>` | Hot-swap to a different model |
| `/cwd [path]` | Print or change working directory |
| `/tokens` | Session statistics (messages, tool calls, tokens) |
| `/undo` | Revert the last file edit the agent made |
| `/git <cmd>` | Run git commands (status, diff, branch) |
| `/nodes` | Show live cluster node status |
| `/sessions` | List past sessions |
| `/resume <id>` | Resume a specific session |
| `/paste` | Toggle multiline input mode |
| `/exit` | Quit Hive |

- Python 3.10+
- Ollama installed and running (for single-node mode)
- A model pulled: `ollama pull qwen3.5` (or any model you prefer)

Install from source in a virtual environment:

    git clone https://github.com/Dhruv546Narang/Hive.git
    cd Hive
    python -m venv .venv
    # Windows
    .\.venv\Scripts\activate
    # macOS/Linux
    source .venv/bin/activate
    pip install -e .

Common commands:

    # Start the interactive coding assistant (default)
    hive chat
    # Use a specific model
    hive chat -m llama3.1
    # Start the coordinator daemon (API + Web Dashboard)
    hive start
    # Check cluster status
    hive status
    # List all available models (registry + Ollama)
    hive models
    # Download a model from the registry
    hive pull qwen2.5-7b
    # Start a worker node (on another machine)
    hive worker

Tip: Running `hive` with no arguments launches the chat assistant directly.

Project layout:

    hive/
    ├── cli/                        # Terminal UI & Chat
    │   ├── entry.py                # CLI entrypoint & argument parser
    │   ├── chat.py                 # Main chat loop with streaming
    │   ├── commands.py             # Slash command handlers
    │   ├── display.py              # Rich console, boot screen, tables
    │   └── utils.py                # GPU/RAM/Git detection, model loading
    │
    ├── coordinator/                # Coordinator Daemon
    │   ├── main.py                 # FastAPI app with lifespan management
    │   ├── router.py               # API routes (OpenAI-compat, cluster, metrics)
    │   ├── agent.py                # Agentic loop (prompt → tools → response)
    │   ├── tools.py                # Tool definitions & executors
    │   ├── rpc_client.py           # Ollama / llama-server API client
    │   ├── inference.py            # Distributed llama-server manager
    │   ├── discovery.py            # mDNS broadcast & listener (AsyncZeroconf)
    │   ├── capacity.py             # Hardware detection (GPU, RAM, classification)
    │   ├── shard_planner.py        # Layer allocation for multi-node
    │   ├── model_downloader.py     # HuggingFace GGUF downloader with resume
    │   ├── model_watcher.py        # Filesystem watcher for ~/hive/models/
    │   ├── binary_manager.py       # Auto-download llama.cpp binaries
    │   ├── sessions.py             # SQLite session persistence
    │   ├── metrics.py              # Prometheus gauges, counters, histograms
    │   ├── config.py               # TOML configuration (Pydantic)
    │   └── auth.py                 # HMAC-SHA256 cluster authentication
    │
    ├── worker/                     # Worker Daemon
    │   ├── main.py                 # Worker entrypoint (mDNS + rpc-server)
    │   └── rpc_server.py           # llama.cpp rpc-server process manager
    │
    ├── ui/                         # Web Dashboard (React + Vite)
    │   ├── src/
    │   │   ├── pages/              # Dashboard, Chat, Models, Settings
    │   │   ├── components/         # Sidebar
    │   │   ├── api.js              # API client
    │   │   ├── App.jsx             # Router setup
    │   │   └── index.css           # Global styles
    │   └── package.json
    │
    ├── models/
    │   └── registry.json           # Pre-configured model registry (10 models)
    │
    ├── config/
    │   └── default.toml            # Default configuration
    │
    └── pyproject.toml              # Python package configuration

Agent loop:

         User Input
             │
             ▼
    ┌─────────────────┐
    │ Build Messages  │◄── System prompt + conversation history + tool definitions
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │ LLM Inference   │◄── Ollama or llama-server (OpenAI-compatible API)
    │ (streaming)     │
    └────────┬────────┘
             │
             ▼
        Tool calls?
          ╱     ╲
        Yes      No
         │        │
         ▼        ▼
      Execute   Stream text
      tools     to terminal
         │
         ▼
      Append results
      to history
         │
         ▼
      Loop back ──────► (up to 20 rounds)

Distributed inference across the LAN:

    ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
    │    Machine A     │    │    Machine B     │    │    Machine C     │
    │  RTX 4050 6GB    │◄──►│  GTX 1660 6GB    │◄──►│  RTX 3060 8GB    │
    │  Layers 0–12     │    │  Layers 13–22    │    │  Layers 23–32    │
    │                  │    │                  │    │                  │
    │  coordinator     │    │  worker          │    │  worker          │
    │  + llama-server  │    │  + rpc-server    │    │  + rpc-server    │
    └────────┬─────────┘    └──────────────────┘    └────────┬─────────┘
             │                                               │
             └────────────────── LAN (mDNS) ─────────────────┘

- Workers start `rpc-server` to expose their GPU(s) as remote compute
- Workers broadcast their presence via mDNS
- Coordinator discovers workers, builds `--rpc` flag with all endpoints
- Coordinator starts `llama-server`, which auto-distributes layers proportional to VRAM
- CLI/API talks to the coordinator's OpenAI-compatible endpoint
- Tokens stream back to the client in real-time
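
Two of the steps above lend themselves to a short sketch: splitting layers in proportion to VRAM, and joining worker endpoints into a single `--rpc` argument. The function names and the largest-remainder rounding are assumptions for illustration. In practice `llama-server` performs the distribution itself, and `shard_planner.py` may use a different rule.

```python
def plan_layers(total_layers: int, vram_gb: dict[str, float]) -> dict[str, int]:
    """Split layers proportionally to VRAM using largest-remainder rounding."""
    total_vram = sum(vram_gb.values())
    if total_vram <= 0:
        raise ValueError("no VRAM available")
    # Ideal (fractional) share of layers for each node.
    exact = {node: total_layers * v / total_vram for node, v in vram_gb.items()}
    plan = {node: int(share) for node, share in exact.items()}
    leftover = total_layers - sum(plan.values())
    # Hand the remaining layers to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - plan[n], reverse=True)
    for node in by_remainder[:leftover]:
        plan[node] += 1
    return plan


def rpc_flag(endpoints: list[str]) -> list[str]:
    """Build the --rpc argument as a comma-separated list of host:port endpoints."""
    return ["--rpc", ",".join(endpoints)]


plan = plan_layers(33, {"machine_a": 6, "machine_b": 6, "machine_c": 8})
args = rpc_flag(["192.168.1.11:50052", "192.168.1.12:50052"])  # example addresses
```

With 33 layers and 6, 6, and 8 GB of VRAM, this rule yields 10, 10, and 13 layers. The split in the diagram above is illustrative and need not match it.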
Hive loads configuration from ~/.hive/config.toml (falls back to config/default.toml):

    cluster_secret = "your-secret-here"   # Shared secret for cluster auth
    model_dir = "~/hive/models"           # GGUF model storage directory
    coordinator_port = 8000               # Coordinator API port
    worker_port = 50052                   # llama.cpp rpc-server port
    inference_port = 8081                 # Distributed llama-server API port
    offload_factor = 0.6                  # RAM usage factor (60% conservative)

Environment variables override config:

    HIVE_COORDINATOR_PORT=9000 hive start

Hive ships with a curated registry of popular GGUF models. Use `hive models` to view their status:

| Model | Params | VRAM (Q4) | Notes |
|---|---|---|---|
| Qwen 2.5 7B Instruct | 7B | ~5 GB | Great for single GPU |
| Qwen 2.5 14B Instruct | 14B | ~9 GB | Fits 2× 6GB GPUs |
| Qwen 2.5 32B Instruct | 32B | ~20 GB | Needs 3+ nodes |
| Llama 3.1 8B Instruct | 8B | ~5 GB | Single GPU |
| Llama 3.1 70B Instruct | 70B | ~42 GB | Needs large cluster |
| DeepSeek R1 7B | 7B | ~5 GB | Reasoning model |
| DeepSeek R1 32B | 32B | ~20 GB | Reasoning model |
| Phi 4 14B | 14B | ~9 GB | Microsoft |
| Gemma 3 12B Instruct | 12B | ~8 GB | |
| Mistral 7B Instruct v0.3 | 7B | ~5 GB | Fast & capable |

Status legend: each model is shown as Available (downloaded) · Downloadable (fits cluster) · Locked (needs more VRAM)
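
The three statuses in the legend suggest a simple decision rule, sketched below. The function name, its parameters, and the comparison against total cluster VRAM are assumptions for illustration. The real check may also account for RAM offload (`offload_factor`) or per-node limits.

```python
def model_status(name: str, vram_needed_gb: float,
                 downloaded: set[str], cluster_vram_gb: float) -> str:
    """Classify a registry model as available, downloadable, or locked."""
    if name in downloaded:
        return "available"      # already on disk
    if vram_needed_gb <= cluster_vram_gb:
        return "downloadable"   # fits the pooled VRAM of the cluster
    return "locked"             # needs more VRAM than the cluster has


# Example: two 6 GB GPUs pooled into 12 GB, one model already downloaded.
downloaded = {"qwen2.5-7b"}
statuses = {
    "qwen2.5-7b": model_status("qwen2.5-7b", 5, downloaded, 12),
    "qwen2.5-14b": model_status("qwen2.5-14b", 9, downloaded, 12),
    "llama3.1-70b": model_status("llama3.1-70b", 42, downloaded, 12),
}
```

The VRAM figures are the approximate Q4 values from the table above; the registry keys other than `qwen2.5-7b` are made up for the example.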
Download models with:

    hive pull qwen2.5-7b                       # From registry
    hive pull Qwen/Qwen2.5-7B-Instruct-GGUF    # Direct Hugging Face repo

Hive also auto-detects all models installed via Ollama; no additional setup is needed.

| Layer | Technology | Purpose |
|---|---|---|
| CLI & Agent | Python, asyncio, Rich, prompt_toolkit | Terminal UI & agentic loop |
| LLM Backend | Ollama / llama.cpp | Local inference (single & distributed) |
| API Server | FastAPI, Uvicorn | OpenAI-compatible API + cluster management |
| Node Discovery | AsyncZeroconf (mDNS) | Zero-config LAN peer discovery |
| Authentication | HMAC-SHA256 | Shared-secret cluster trust |
| Web Dashboard | React 19, Vite 6, React Router 7 | Browser-based monitoring & chat |
| Session Store | SQLite | Conversation persistence |
| Model Watcher | Watchdog | Live GGUF file detection |
| Metrics | prometheus-client | Observability (tok/s, VRAM, latency) |
| Configuration | TOML + Pydantic | Type-safe settings |
- Interactive CLI with premium TUI
- Agentic tool calling (read, write, edit, run, search)
- Real-time token streaming with Markdown rendering
- Session persistence & resume (SQLite)
- Slash commands (`/help`, `/undo`, `/compact`, `/git`, etc.)
- Model registry with Hugging Face downloads
- Ollama auto-detection
- Web dashboard (React + Vite)
- FastAPI coordinator with OpenAI-compatible API
- mDNS node discovery
- HMAC cluster authentication
- Shard planner (layer allocation)
- Auto-download llama.cpp binaries
- Worker daemon with rpc-server management
- Prometheus metrics endpoint
- Context compaction (`/compact`)
- Multi-node layer sharding (end-to-end testing)
- Tab completion for commands and file paths
- Image understanding (multimodal models)
- Auto-commit with AI-generated messages
- Plugin system for custom tools
- Conversation export (Markdown/JSON)
Contributions are welcome! This project is in active development.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License. See LICENSE for details.
Built by Dhruv Narang
Your code stays on your machine. Always.