Skip to content

Dhruv546Narang/Hive

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

16 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

 β–ˆβ–ˆβ•—  β–ˆβ–ˆβ•—β–ˆβ–ˆβ•—β–ˆβ–ˆβ•—   β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—
 β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β•β•β•
 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  
 β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•”β•β•β•  
 β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•”β• β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—
 β•šβ•β•  β•šβ•β•β•šβ•β•  β•šβ•β•β•β•  β•šβ•β•β•β•β•β•β•

Local AI Coding Assistant & Distributed Inference Engine

Like Claude Code, but runs entirely on your hardware.

Python 3.10+ Ollama llama.cpp License: MIT


πŸ”’ 100% Local Β· πŸ†“ Free Forever Β· ⚑ Real-Time Streaming Β· 🌐 Multi-Node Ready


🐝 What is Hive?

Hive is a fully local, privacy-first AI coding assistant that runs in your terminal. It connects to your local LLM (via Ollama or llama.cpp) and gives you an agentic coding experience β€” reading files, writing code, running commands, and searching your codebase β€” all without sending a single byte to the cloud.

It's also designed from the ground up for distributed inference: pool the VRAM and RAM of multiple laptops on your LAN to run models that no single machine could handle alone.


✨ Features

πŸ€– Agentic Tool Calling

Hive doesn't just chat β€” it acts. The LLM autonomously calls tools to complete multi-step tasks:

Tool Description
read_file Read any file in your project
write_file Create new files or overwrite existing ones
edit_file Targeted find-and-replace edits
run_command Execute shell commands (build, test, install)
list_directory Explore project structure with sizes
search_files Regex search across your codebase

The agent runs up to 20 autonomous tool-calling rounds per request, self-correcting on errors and chaining actions to complete complex tasks.

⚑ Real-Time Streaming

Responses stream token-by-token directly to your terminal with live Markdown rendering via Rich. No waiting for the full response β€” you see text appear in real-time.

🎨 Premium Terminal UI

  • ASCII art boot screen with model info, GPU/VRAM stats, git branch
  • Live Markdown rendering with syntax-highlighted code blocks
  • Purple neon theme with rich formatting throughout
  • Alternate screen buffer β€” clean start, terminal restored on exit
  • Bottom toolbar showing CWD and git status
  • Tool execution visualization with call/result indicators

πŸ’Ύ Session Persistence

Conversations are saved to a local SQLite database (~/.hive/sessions.db). Resume where you left off:

  • Automatic session creation and history tracking
  • Resume previous sessions on startup
  • Browse and switch between past sessions with /sessions and /resume

πŸ“¦ Built-in Model Management

  • Model registry with 10 popular models pre-configured (Qwen, Llama, DeepSeek, Phi, Gemma, Mistral)
  • Direct downloads from Hugging Face with resume support
  • Ollama integration β€” automatically detects and uses installed Ollama models
  • Hot-swap models mid-conversation with /model

🌐 Web Dashboard

A React-based dashboard (Vite + React Router) for monitoring and interaction:

  • Dashboard β€” cluster health, node hardware, VRAM/RAM bars, running models
  • Chat β€” browser-based chat interface
  • Models β€” view all available, downloadable, and locked models
  • Settings β€” cluster configuration

πŸ”— Distributed Inference (Multi-Node)

Pool GPU resources across multiple machines on your LAN:

  • mDNS auto-discovery β€” nodes find each other automatically via Zeroconf
  • HMAC-SHA256 authentication β€” shared secret for cluster trust
  • Automatic layer sharding β€” model layers distributed proportional to VRAM
  • llama.cpp RPC protocol β€” workers expose GPUs as remote compute devices
  • Auto-download of binaries β€” llama.cpp rpc-server and llama-server are fetched from GitHub releases

πŸ“Š Observability

  • Per-response stats: tool count, tokens, tok/s, latency
  • Prometheus metrics endpoint (/metrics): tok/s, VRAM/RAM usage, GPU temperature, inference latency histograms
  • Periodic hardware refresh with real-time GPU monitoring

πŸ› οΈ Built-in Commands

Command Description
/help Show all commands
/clear Clear conversation history
/compact Summarize history to save context window
/model <name> Hot-swap to a different model
/cwd [path] Print or change working directory
/tokens Session statistics (messages, tool calls, tokens)
/undo Revert the last file edit the agent made
/git <cmd> Run git commands (status, diff, branch)
/nodes Show live cluster node status
/sessions List past sessions
/resume <id> Resume a specific session
/paste Toggle multiline input mode
/exit Quit Hive

πŸš€ Quick Start

Prerequisites

  • Python 3.10+
  • Ollama installed and running (for single-node mode)
  • A model pulled: ollama pull qwen3.5 (or any model you prefer)

Install

git clone https://github.com/Dhruv546Narang/Hive.git
cd Hive
python -m venv .venv

# Windows
.\.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate

pip install -e .

Run

# Start the interactive coding assistant (default)
hive chat

# Use a specific model
hive chat -m llama3.1

# Start the coordinator daemon (API + Web Dashboard)
hive start

# Check cluster status
hive status

# List all available models (registry + Ollama)
hive models

# Download a model from the registry
hive pull qwen2.5-7b

# Start a worker node (on another machine)
hive worker

Tip: Running hive with no arguments launches the chat assistant directly.


πŸ—οΈ Architecture

hive/
β”œβ”€β”€ cli/                          # Terminal UI & Chat
β”‚   β”œβ”€β”€ entry.py                  # CLI entrypoint & argument parser
β”‚   β”œβ”€β”€ chat.py                   # Main chat loop with streaming
β”‚   β”œβ”€β”€ commands.py               # Slash command handlers
β”‚   β”œβ”€β”€ display.py                # Rich console, boot screen, tables
β”‚   └── utils.py                  # GPU/RAM/Git detection, model loading
β”‚
β”œβ”€β”€ coordinator/                  # Coordinator Daemon
β”‚   β”œβ”€β”€ main.py                   # FastAPI app with lifespan management
β”‚   β”œβ”€β”€ router.py                 # API routes (OpenAI-compat, cluster, metrics)
β”‚   β”œβ”€β”€ agent.py                  # Agentic loop (prompt β†’ tools β†’ response)
β”‚   β”œβ”€β”€ tools.py                  # Tool definitions & executors
β”‚   β”œβ”€β”€ rpc_client.py             # Ollama / llama-server API client
β”‚   β”œβ”€β”€ inference.py              # Distributed llama-server manager
β”‚   β”œβ”€β”€ discovery.py              # mDNS broadcast & listener (AsyncZeroconf)
β”‚   β”œβ”€β”€ capacity.py               # Hardware detection (GPU, RAM, classification)
β”‚   β”œβ”€β”€ shard_planner.py          # Layer allocation for multi-node
β”‚   β”œβ”€β”€ model_downloader.py       # HuggingFace GGUF downloader with resume
β”‚   β”œβ”€β”€ model_watcher.py          # Filesystem watcher for ~/hive/models/
β”‚   β”œβ”€β”€ binary_manager.py         # Auto-download llama.cpp binaries
β”‚   β”œβ”€β”€ sessions.py               # SQLite session persistence
β”‚   β”œβ”€β”€ metrics.py                # Prometheus gauges, counters, histograms
β”‚   β”œβ”€β”€ config.py                 # TOML configuration (Pydantic)
β”‚   └── auth.py                   # HMAC-SHA256 cluster authentication
β”‚
β”œβ”€β”€ worker/                       # Worker Daemon
β”‚   β”œβ”€β”€ main.py                   # Worker entrypoint (mDNS + rpc-server)
β”‚   └── rpc_server.py             # llama.cpp rpc-server process manager
β”‚
β”œβ”€β”€ ui/                           # Web Dashboard (React + Vite)
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ pages/                # Dashboard, Chat, Models, Settings
β”‚   β”‚   β”œβ”€β”€ components/           # Sidebar
β”‚   β”‚   β”œβ”€β”€ api.js                # API client
β”‚   β”‚   β”œβ”€β”€ App.jsx               # Router setup
β”‚   β”‚   └── index.css             # Global styles
β”‚   └── package.json
β”‚
β”œβ”€β”€ models/
β”‚   └── registry.json             # Pre-configured model registry (10 models)
β”‚
β”œβ”€β”€ config/
β”‚   └── default.toml              # Default configuration
β”‚
└── pyproject.toml                # Python package configuration

How the Agent Loop Works

User Input
    β”‚
    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Build Messages  │◄── System prompt + conversation history + tool definitions
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LLM Inference   │◄── Ollama or llama-server (OpenAI-compatible API)
β”‚  (streaming)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
    Tool calls?
     β•±        β•²
   Yes          No
    β”‚            β”‚
    β–Ό            β–Ό
Execute        Stream text
tools          to terminal
    β”‚
    β–Ό
Append results
to history
    β”‚
    β–Ό
Loop back ──────► (up to 20 rounds)

How Distributed Inference Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Machine A       β”‚    β”‚   Machine B       β”‚    β”‚   Machine C       β”‚
β”‚   RTX 4050 6GB    │◄──►│   GTX 1660 6GB    │◄──►│   RTX 3060 8GB    β”‚
β”‚   Layers 0–12     β”‚    β”‚   Layers 13–22    β”‚    β”‚   Layers 23–32    β”‚
β”‚                   β”‚    β”‚                   β”‚    β”‚                   β”‚
β”‚   coordinator     β”‚    β”‚   worker          β”‚    β”‚   worker          β”‚
β”‚   + llama-server  β”‚    β”‚   + rpc-server    β”‚    β”‚   + rpc-server    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                                                β”‚
        └────────────────── LAN (mDNS) β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  1. Workers start rpc-server to expose their GPU(s) as remote compute
  2. Workers broadcast their presence via mDNS
  3. Coordinator discovers workers, builds --rpc flag with all endpoints
  4. Coordinator starts llama-server which auto-distributes layers proportional to VRAM
  5. CLI/API talks to the coordinator's OpenAI-compatible endpoint
  6. Tokens stream back to the client in real-time

βš™οΈ Configuration

Hive loads configuration from ~/.hive/config.toml (falls back to config/default.toml):

cluster_secret = "your-secret-here"   # Shared secret for cluster auth
model_dir = "~/hive/models"           # GGUF model storage directory
coordinator_port = 8000               # Coordinator API port
worker_port = 50052                   # llama.cpp rpc-server port
inference_port = 8081                 # Distributed llama-server API port
offload_factor = 0.6                  # RAM usage factor (60% conservative)

Environment variables override config:

HIVE_COORDINATOR_PORT=9000 hive start

πŸ“¦ Model Registry

Hive ships with a curated registry of popular GGUF models. Use hive models to view their status:

Status Model Params VRAM (Q4) Notes
βœ“/β—ˆ/πŸ”’ Qwen 2.5 7B Instruct 7B ~5 GB Great for single GPU
βœ“/β—ˆ/πŸ”’ Qwen 2.5 14B Instruct 14B ~9 GB Fits 2Γ— 6GB GPUs
βœ“/β—ˆ/πŸ”’ Qwen 2.5 32B Instruct 32B ~20 GB Needs 3+ nodes
βœ“/β—ˆ/πŸ”’ Llama 3.1 8B Instruct 8B ~5 GB Single GPU
βœ“/β—ˆ/πŸ”’ Llama 3.1 70B Instruct 70B ~42 GB Needs large cluster
βœ“/β—ˆ/πŸ”’ DeepSeek R1 7B 7B ~5 GB Reasoning model
βœ“/β—ˆ/πŸ”’ DeepSeek R1 32B 32B ~20 GB Reasoning model
βœ“/β—ˆ/πŸ”’ Phi 4 14B 14B ~9 GB Microsoft
βœ“/β—ˆ/πŸ”’ Gemma 3 12B Instruct 12B ~8 GB Google
βœ“/β—ˆ/πŸ”’ Mistral 7B Instruct v0.3 7B ~5 GB Fast & capable

Status legend: βœ“ Available (downloaded) Β· β—ˆ Downloadable (fits cluster) Β· πŸ”’ Locked (needs more VRAM)

Download models with:

hive pull qwen2.5-7b                              # From registry
hive pull Qwen/Qwen2.5-7B-Instruct-GGUF           # Direct HuggingFace repo

Hive also auto-detects all models installed via Ollama β€” no additional setup needed.


πŸ§ͺ Tech Stack

Layer Technology Purpose
CLI & Agent Python, asyncio, Rich, prompt_toolkit Terminal UI & agentic loop
LLM Backend Ollama / llama.cpp Local inference (single & distributed)
API Server FastAPI, Uvicorn OpenAI-compatible API + cluster management
Node Discovery AsyncZeroconf (mDNS) Zero-config LAN peer discovery
Authentication HMAC-SHA256 Shared-secret cluster trust
Web Dashboard React 19, Vite 6, React Router 7 Browser-based monitoring & chat
Session Store SQLite Conversation persistence
Model Watcher Watchdog Live GGUF file detection
Metrics prometheus-client Observability (tok/s, VRAM, latency)
Configuration TOML + Pydantic Type-safe settings

πŸ—ΊοΈ Roadmap

  • Interactive CLI with premium TUI
  • Agentic tool calling (read, write, edit, run, search)
  • Real-time token streaming with Markdown rendering
  • Session persistence & resume (SQLite)
  • Slash commands (/help, /undo, /compact, /git, etc.)
  • Model registry with HuggingFace downloads
  • Ollama auto-detection
  • Web dashboard (React + Vite)
  • FastAPI coordinator with OpenAI-compatible API
  • mDNS node discovery
  • HMAC cluster authentication
  • Shard planner (layer allocation)
  • Auto-download llama.cpp binaries
  • Worker daemon with rpc-server management
  • Prometheus metrics endpoint
  • Context compaction (/compact)
  • Multi-node layer sharding (end-to-end testing)
  • Tab completion for commands and file paths
  • Image understanding (multimodal models)
  • Auto-commit with AI-generated messages
  • Plugin system for custom tools
  • Conversation export (Markdown/JSON)

🀝 Contributing

Contributions are welcome! This project is in active development.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

MIT License β€” see LICENSE for details.


Built with 🐝 by Dhruv Narang

Your code stays on your machine. Always.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors