GitHub - Dhruv546Narang/Hive

 ██╗  ██╗██╗██╗   ██╗███████╗
 ██║  ██║██║██║   ██║██╔════╝
 ███████║██║██║   ██║█████╗  
 ██╔══██║██║╚██╗ ██╔╝██╔══╝  
 ██║  ██║██║ ╚████╔╝ ███████╗
 ╚═╝  ╚═╝╚═╝  ╚═══╝  ╚══════╝

Local AI Coding Assistant & Distributed Inference Engine

Like Claude Code, but runs entirely on your hardware.

🔒 100% Local · 🆓 Free Forever · ⚡ Real-Time Streaming · 🌐 Multi-Node Ready

🐝 What is Hive?

Hive is a fully local, privacy-first AI coding assistant that runs in your terminal. It connects to your local LLM (via Ollama or llama.cpp) and gives you an agentic coding experience — reading files, writing code, running commands, and searching your codebase — all without sending a single byte to the cloud.

It's also designed from the ground up for distributed inference: pool the VRAM and RAM of multiple laptops on your LAN to run models that no single machine could handle alone.

✨ Features

🤖 Agentic Tool Calling

Hive doesn't just chat — it acts. The LLM autonomously calls tools to complete multi-step tasks:

Tool	Description
`read_file`	Read any file in your project
`write_file`	Create new files or overwrite existing ones
`edit_file`	Targeted find-and-replace edits
`run_command`	Execute shell commands (build, test, install)
`list_directory`	Explore project structure with sizes
`search_files`	Regex search across your codebase

The agent runs up to 20 autonomous tool-calling rounds per request, self-correcting on errors and chaining actions to complete complex tasks.

⚡ Real-Time Streaming

Responses stream token-by-token directly to your terminal with live Markdown rendering via Rich. No waiting for the full response — you see text appear in real-time.

🎨 Premium Terminal UI

ASCII art boot screen with model info, GPU/VRAM stats, git branch
Live Markdown rendering with syntax-highlighted code blocks
Purple neon theme with rich formatting throughout
Alternate screen buffer — clean start, terminal restored on exit
Bottom toolbar showing CWD and git status
Tool execution visualization with call/result indicators

💾 Session Persistence

Conversations are saved to a local SQLite database (~/.hive/sessions.db). Resume where you left off:

Automatic session creation and history tracking
Resume previous sessions on startup
Browse and switch between past sessions with /sessions and /resume

📦 Built-in Model Management

Model registry with 10 popular models pre-configured (Qwen, Llama, DeepSeek, Phi, Gemma, Mistral)
Direct downloads from Hugging Face with resume support
Ollama integration — automatically detects and uses installed Ollama models
Hot-swap models mid-conversation with /model

🌐 Web Dashboard

A React-based dashboard (Vite + React Router) for monitoring and interaction:

Dashboard — cluster health, node hardware, VRAM/RAM bars, running models
Chat — browser-based chat interface
Models — view all available, downloadable, and locked models
Settings — cluster configuration

🔗 Distributed Inference (Multi-Node)

Pool GPU resources across multiple machines on your LAN:

mDNS auto-discovery — nodes find each other automatically via Zeroconf
HMAC-SHA256 authentication — shared secret for cluster trust
Automatic layer sharding — model layers distributed proportional to VRAM
llama.cpp RPC protocol — workers expose GPUs as remote compute devices
Auto-download of binaries — llama.cpp rpc-server and llama-server are fetched from GitHub releases

📊 Observability

Per-response stats: tool count, tokens, tok/s, latency
Prometheus metrics endpoint (/metrics): tok/s, VRAM/RAM usage, GPU temperature, inference latency histograms
Periodic hardware refresh with real-time GPU monitoring

🛠️ Built-in Commands

Command	Description
`/help`	Show all commands
`/clear`	Clear conversation history
`/compact`	Summarize history to save context window
`/model <name>`	Hot-swap to a different model
`/cwd [path]`	Print or change working directory
`/tokens`	Session statistics (messages, tool calls, tokens)
`/undo`	Revert the last file edit the agent made
`/git <cmd>`	Run git commands (`status`, `diff`, `branch`)
`/nodes`	Show live cluster node status
`/sessions`	List past sessions
`/resume <id>`	Resume a specific session
`/paste`	Toggle multiline input mode
`/exit`	Quit Hive

🚀 Quick Start

Prerequisites

Python 3.10+
Ollama installed and running (for single-node mode)
A model pulled: ollama pull qwen3.5 (or any model you prefer)

Install

git clone https://github.com/Dhruv546Narang/Hive.git
cd Hive
python -m venv .venv

# Windows
.\.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate

pip install -e .

Run

# Start the interactive coding assistant (default)
hive chat

# Use a specific model
hive chat -m llama3.1

# Start the coordinator daemon (API + Web Dashboard)
hive start

# Check cluster status
hive status

# List all available models (registry + Ollama)
hive models

# Download a model from the registry
hive pull qwen2.5-7b

# Start a worker node (on another machine)
hive worker

Tip: Running hive with no arguments launches the chat assistant directly.

🏗️ Architecture

hive/
├── cli/                          # Terminal UI & Chat
│   ├── entry.py                  # CLI entrypoint & argument parser
│   ├── chat.py                   # Main chat loop with streaming
│   ├── commands.py               # Slash command handlers
│   ├── display.py                # Rich console, boot screen, tables
│   └── utils.py                  # GPU/RAM/Git detection, model loading
│
├── coordinator/                  # Coordinator Daemon
│   ├── main.py                   # FastAPI app with lifespan management
│   ├── router.py                 # API routes (OpenAI-compat, cluster, metrics)
│   ├── agent.py                  # Agentic loop (prompt → tools → response)
│   ├── tools.py                  # Tool definitions & executors
│   ├── rpc_client.py             # Ollama / llama-server API client
│   ├── inference.py              # Distributed llama-server manager
│   ├── discovery.py              # mDNS broadcast & listener (AsyncZeroconf)
│   ├── capacity.py               # Hardware detection (GPU, RAM, classification)
│   ├── shard_planner.py          # Layer allocation for multi-node
│   ├── model_downloader.py       # HuggingFace GGUF downloader with resume
│   ├── model_watcher.py          # Filesystem watcher for ~/hive/models/
│   ├── binary_manager.py         # Auto-download llama.cpp binaries
│   ├── sessions.py               # SQLite session persistence
│   ├── metrics.py                # Prometheus gauges, counters, histograms
│   ├── config.py                 # TOML configuration (Pydantic)
│   └── auth.py                   # HMAC-SHA256 cluster authentication
│
├── worker/                       # Worker Daemon
│   ├── main.py                   # Worker entrypoint (mDNS + rpc-server)
│   └── rpc_server.py             # llama.cpp rpc-server process manager
│
├── ui/                           # Web Dashboard (React + Vite)
│   ├── src/
│   │   ├── pages/                # Dashboard, Chat, Models, Settings
│   │   ├── components/           # Sidebar
│   │   ├── api.js                # API client
│   │   ├── App.jsx               # Router setup
│   │   └── index.css             # Global styles
│   └── package.json
│
├── models/
│   └── registry.json             # Pre-configured model registry (10 models)
│
├── config/
│   └── default.toml              # Default configuration
│
└── pyproject.toml                # Python package configuration

How the Agent Loop Works

User Input
    │
    ▼
┌─────────────────┐
│  Build Messages  │◄── System prompt + conversation history + tool definitions
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  LLM Inference   │◄── Ollama or llama-server (OpenAI-compatible API)
│  (streaming)     │
└────────┬────────┘
         │
         ▼
    Tool calls?
     ╱        ╲
   Yes          No
    │            │
    ▼            ▼
Execute        Stream text
tools          to terminal
    │
    ▼
Append results
to history
    │
    ▼
Loop back ──────► (up to 20 rounds)

How Distributed Inference Works

┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   Machine A       │    │   Machine B       │    │   Machine C       │
│   RTX 4050 6GB    │◄──►│   GTX 1660 6GB    │◄──►│   RTX 3060 8GB    │
│   Layers 0–12     │    │   Layers 13–22    │    │   Layers 23–32    │
│                   │    │                   │    │                   │
│   coordinator     │    │   worker          │    │   worker          │
│   + llama-server  │    │   + rpc-server    │    │   + rpc-server    │
└──────────────────┘    └──────────────────┘    └──────────────────┘
        │                                                │
        └────────────────── LAN (mDNS) ─────────────────┘

Workers start rpc-server to expose their GPU(s) as remote compute
Workers broadcast their presence via mDNS
Coordinator discovers workers, builds --rpc flag with all endpoints
Coordinator starts llama-server which auto-distributes layers proportional to VRAM
CLI/API talks to the coordinator's OpenAI-compatible endpoint
Tokens stream back to the client in real-time

⚙️ Configuration

Hive loads configuration from ~/.hive/config.toml (falls back to config/default.toml):

cluster_secret = "your-secret-here"   # Shared secret for cluster auth
model_dir = "~/hive/models"           # GGUF model storage directory
coordinator_port = 8000               # Coordinator API port
worker_port = 50052                   # llama.cpp rpc-server port
inference_port = 8081                 # Distributed llama-server API port
offload_factor = 0.6                  # RAM usage factor (60% conservative)

Environment variables override config:

HIVE_COORDINATOR_PORT=9000 hive start

📦 Model Registry

Hive ships with a curated registry of popular GGUF models. Use hive models to view their status:

Status	Model	Params	VRAM (Q4)	Notes
✓/◈/🔒	Qwen 2.5 7B Instruct	7B	~5 GB	Great for single GPU
✓/◈/🔒	Qwen 2.5 14B Instruct	14B	~9 GB	Fits 2× 6GB GPUs
✓/◈/🔒	Qwen 2.5 32B Instruct	32B	~20 GB	Needs 3+ nodes
✓/◈/🔒	Llama 3.1 8B Instruct	8B	~5 GB	Single GPU
✓/◈/🔒	Llama 3.1 70B Instruct	70B	~42 GB	Needs large cluster
✓/◈/🔒	DeepSeek R1 7B	7B	~5 GB	Reasoning model
✓/◈/🔒	DeepSeek R1 32B	32B	~20 GB	Reasoning model
✓/◈/🔒	Phi 4 14B	14B	~9 GB	Microsoft
✓/◈/🔒	Gemma 3 12B Instruct	12B	~8 GB	Google
✓/◈/🔒	Mistral 7B Instruct v0.3	7B	~5 GB	Fast & capable

Status legend: ✓ Available (downloaded) · ◈ Downloadable (fits cluster) · 🔒 Locked (needs more VRAM)

Download models with:

hive pull qwen2.5-7b                              # From registry
hive pull Qwen/Qwen2.5-7B-Instruct-GGUF           # Direct HuggingFace repo

Hive also auto-detects all models installed via Ollama — no additional setup needed.

🧪 Tech Stack

Layer	Technology	Purpose
CLI & Agent	Python, asyncio, Rich, prompt_toolkit	Terminal UI & agentic loop
LLM Backend	Ollama / llama.cpp	Local inference (single & distributed)
API Server	FastAPI, Uvicorn	OpenAI-compatible API + cluster management
Node Discovery	AsyncZeroconf (mDNS)	Zero-config LAN peer discovery
Authentication	HMAC-SHA256	Shared-secret cluster trust
Web Dashboard	React 19, Vite 6, React Router 7	Browser-based monitoring & chat
Session Store	SQLite	Conversation persistence
Model Watcher	Watchdog	Live GGUF file detection
Metrics	prometheus-client	Observability (tok/s, VRAM, latency)
Configuration	TOML + Pydantic	Type-safe settings

🗺️ Roadmap

🤝 Contributing

Contributions are welcome! This project is in active development.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

MIT License — see LICENSE for details.

Built with 🐝 by Dhruv Narang

Your code stays on your machine. Always.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Local AI Coding Assistant & Distributed Inference Engine

🐝 What is Hive?

✨ Features

🤖 Agentic Tool Calling

⚡ Real-Time Streaming

🎨 Premium Terminal UI

💾 Session Persistence

📦 Built-in Model Management

🌐 Web Dashboard

🔗 Distributed Inference (Multi-Node)

📊 Observability

🛠️ Built-in Commands

🚀 Quick Start

Prerequisites

Install

Run

🏗️ Architecture

How the Agent Loop Works

How Distributed Inference Works

⚙️ Configuration

📦 Model Registry

🧪 Tech Stack

🗺️ Roadmap

🤝 Contributing

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
cli		cli
config		config
coordinator		coordinator
models		models
ui		ui
worker		worker
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
hive_plan.docx		hive_plan.docx
plan.txt		plan.txt
pyproject.toml		pyproject.toml
test_ui.py		test_ui.py

Folders and files

Latest commit

History

Repository files navigation

Local AI Coding Assistant & Distributed Inference Engine

🐝 What is Hive?

✨ Features

🤖 Agentic Tool Calling

⚡ Real-Time Streaming

🎨 Premium Terminal UI

💾 Session Persistence

📦 Built-in Model Management

🌐 Web Dashboard

🔗 Distributed Inference (Multi-Node)

📊 Observability

🛠️ Built-in Commands

🚀 Quick Start

Prerequisites

Install

Run

🏗️ Architecture

How the Agent Loop Works

How Distributed Inference Works

⚙️ Configuration

📦 Model Registry

🧪 Tech Stack

🗺️ Roadmap

🤝 Contributing

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages