Self-hosted LLM infrastructure with OpenAI-compatible API. CPU-optimized. Source-available.
Product page · Wiki · Latest release · Changelog
Enclave runs LLMs on your hardware. OpenAI-compatible API, Ollama backend, zero cloud dependencies.
What's new: architecture-aware orchestration (Phases 1–6). This adds per-host detection of memory and deployment topology, a four-tier keep_alive resolver with arch-detected defaults, a scheduler facade with feasibility validation, and tick-based parallel DAG dispatch that uses the detected architecture to decide what to run concurrently. Also new: an installable Python wheel + sdist, a mirrored Docker image on GHCR, a Linux source tarball with SHA256/SHA512 checksums, an n8n release-update workflow, and a curated Wiki seed. See the CHANGELOG for the full PR-by-PR detail.
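As a rough illustration of what tick-based parallel DAG dispatch means, the sketch below groups a small dependency graph into ticks: on each tick it collects every step whose dependencies have finished and dispatches at most `max_parallel` of them. This is a generic, hypothetical sketch, not Enclave's scheduler. The step names, the `plan_ticks` helper, and the idea that `max_parallel` would be derived from the detected architecture are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def plan_ticks(deps: Dict[str, Set[str]], max_parallel: int) -> List[List[str]]:
    """Group DAG steps into ticks; steps in the same tick may run concurrently.

    deps maps each step name to the set of steps it depends on.
    max_parallel is a hypothetical cap (for example, derived from host memory).
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    done: Set[str] = set()
    ticks: List[List[str]] = []
    while len(done) < len(deps):
        # A step is ready once all of its dependencies have completed.
        ready = sorted(s for s, d in deps.items() if s not in done and d <= done)
        if not ready:
            raise ValueError("cycle or missing dependency in workflow graph")
        batch = ready[:max_parallel]  # dispatch at most max_parallel steps per tick
        ticks.append(batch)
        done.update(batch)
    return ticks


# Hypothetical pipeline: two independent steps feed a draft, which feeds a review.
graph = {
    "research": set(),
    "outline": set(),
    "draft": {"research", "outline"},
    "review": {"draft"},
}
print(plan_ticks(graph, max_parallel=2))
print(plan_ticks(graph, max_parallel=1))
```

With a cap of 2 the two independent steps share the first tick; with a cap of 1 the same graph takes four ticks, which is the trade-off a memory-constrained host would make.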
- OpenAI-compatible API: drop-in replacement. Point your existing code at localhost:8000
- CPU-optimized inference: GGUF quantized models via Ollama. 7B at 40-50 tok/s, 13B at 25-30 tok/s
- Model management: download, configure, and switch between 18+ models from the registry
- Multi-agent workflows: YAML-defined step pipelines with role-based model selection
- Web dashboard: monitor models, system health, and API status
- macOS app: native desktop wrapper with setup wizard
- No telemetry by default: no data leaves your machine unless you opt in. Error reporting is optional and operator-owned (your own sink, redaction mandatory; see docs/deployment/error-reporting.md). No internet required for inference
Three paths — pick one:
- Download Enclave.dmg from the latest release. (Or grab the rolling nightly build for the freshest master.)
- Open the DMG and drag Enclave.app to /Applications.
- First launch: macOS Gatekeeper will warn because the app is currently not signed/notarized. Bypass once with:
  xattr -dr com.apple.quarantine /Applications/Enclave.app
  Then double-click Enclave in Launchpad.
- The native window opens the first-run setup wizard (/setup), which installs Ollama if needed and pulls a starter model. After that you land on the dashboard.
Requirements: macOS 12.0 (Monterey) or later. ~6 GB free disk for the bundled runtime + a small starter model. Ollama is installed automatically by the wizard if missing.
For non-developers on Linux / Windows, or anyone who wants Enclave fully isolated in containers. No Python, no virtualenv, no manual Ollama install.
- Install Docker Desktop (or Docker Engine on Linux) and make sure the whale icon is running.
- Clone or download this repo, open a terminal in the project folder, and run:
./run.sh
- The script verifies Docker, brings up the stack (ollama + api), pulls a small starter model on first run (llama3.2:3b, ~2 GB), and opens the dashboard in your browser.
| Surface | URL |
|---|---|
| Enclave SPA (the application) | http://localhost:8000 |
| API docs | http://localhost:8000/docs |
| Open WebUI (opt-in) | http://localhost:8081 — docker compose -f docker-compose.yml -f docker-compose.webui.yml up -d |
To stop: ./stop.sh (data preserved) — or ./stop.sh --reset to wipe models and chat history.
Requirements: ~4 GB free RAM and ~3 GB free disk for the starter model. Pick a different starter with ENCLAVE_DEFAULT_MODEL=qwen2.5:3b ./run.sh.
Prefer to pull the published image directly? (Substitute <version> with the latest tag.)
# Docker Hub — canonical
docker pull hankthebldrr/local-ai-platform:<version>
# GHCR mirror — same digest, no Hub account required
docker pull ghcr.io/hankthebldr/enclave:<version>
For developers who want to use the Enclave engine inside another Python service. Bundles the FastAPI app, workflow engine, RAG pipeline, and CLI dispatcher.
# From a GitHub Release asset (no PyPI required)
pip install https://github.com/hankthebldr/local-ai-platform/releases/download/v<version>/enclave-<version>-py3-none-any.whl
# Then run the API server with the same uvicorn settings the DMG uses:
enclave-api # starts FastAPI on 127.0.0.1:8000
enclave --help # CLI dispatcher (chat, workflow, query, api)
You still need an Ollama runtime reachable at OLLAMA_URL (defaults to http://localhost:11434). The Python package does not install Ollama for you; see the Wiki › Deployment page for production setups.
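Because the pip install depends on an external Ollama runtime, it can help to check that runtime before starting the API. The sketch below resolves OLLAMA_URL with the default stated above and asks Ollama for its model list at `/api/tags`. The helper names `resolve_ollama_url` and `ollama_reachable` are hypothetical and are not part of the enclave package.

```python
import os
import urllib.error
import urllib.request
from typing import Mapping, Optional

DEFAULT_OLLAMA_URL = "http://localhost:11434"  # default stated in this README


def resolve_ollama_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OLLAMA_URL from the environment, or the default if unset or empty."""
    env = os.environ if env is None else env
    return (env.get("OLLAMA_URL") or DEFAULT_OLLAMA_URL).rstrip("/")


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/api/tags answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


if __name__ == "__main__":
    url = resolve_ollama_url()
    print(url, "reachable" if ollama_reachable(url) else "not reachable")
```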
# Install (creates ./venv, installs core+dev deps, sets up systemd unit on Linux)
./setup/install.sh
# Boot Ollama + API + auto-open the dashboard in your browser
./scripts/start.sh
# Or, on macOS, exercise the same native pywebview window the DMG ships
./scripts/start_desktop.sh
# Verify everything boots and every UX route renders
./scripts/verify_local.sh
API at http://localhost:8000 · Dashboard at http://localhost:8000/ · Docs at http://localhost:8000/docs · First-run wizard at http://localhost:8000/setup.
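If you want a quick manual check of those routes from Python, something like the following could work. It is a minimal stand-in written for this README, not a description of what verify_local.sh does; the `route_urls` and `probe` helpers are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional

# Routes listed in this README, relative to the API base URL.
ROUTES = {"dashboard": "/", "docs": "/docs", "setup": "/setup"}


def route_urls(base_url: str = "http://localhost:8000") -> Dict[str, str]:
    """Build absolute URLs for each documented route."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in ROUTES.items()}


def probe(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    for name, url in route_urls().items():
        print(f"{name:10s} {url} -> {probe(url)}")
```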
# List available models
python models/download.py --list
# Download a model
python models/download.py dolphin-mixtral
# List installed
ollama list
Default quantization: Q4_K_M (best quality/speed balance). See MODELS.md for the full registry.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Compatible with any OpenAI SDK client.
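The same request can be made from Python with only the standard library. This sketch mirrors the curl call above; the `build_chat_request` and `chat` helpers are hypothetical, and the response parsing assumes the server returns the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "mistral",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-style response body; adjust if your server differs.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running and the model installed, `print(chat("Hello"))` should print the model's reply.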
What ships in this repo, and where to find it:
| Surface | Path | Notes |
|---|---|---|
| FastAPI server (OpenAI-compatible) | api/main.py | 16 routers under api/routers/, services under api/services/ |
| Web dashboard + setup wizard | api/static/ | Served at / and /setup by the FastAPI app |
| CLI chat / query / workflow | cli/ | Rich-formatted; python -m cli.chat, cli/workflow.py |
| Multi-agent workflow engine | api/services/workflow_engine.py | YAML pipelines under workflows/ |
| Custom agents (Gems) | agents/ + api/routers/agents.py | YAML-defined personas with pinned context |
| Model registry | models/download.py | 18+ models — see MODELS.md |
| macOS desktop wrapper | desktop/app.py | pywebview window around the FastAPI server |
| DMG builder | scripts/build_mac.sh | Bundles a self-contained .app + dmg |
| Local dev scripts | scripts/ | start.sh, start_desktop.sh, verify_local.sh, status.sh, test.sh |
The same script CI uses on tag pushes:
brew install librsvg create-dmg # one-time
./scripts/generate-icons.sh # regenerate icns from SVG
./scripts/build_mac.sh # produces dist/Enclave.app + dist/Enclave.dmg
open dist/Enclave.app # smoke-test the bundle
The build script reads ENCLAVE_VERSION (or falls back to git describe) and stamps it into Info.plist. Override for a one-off custom build:
ENCLAVE_VERSION=v1.2.3-local ./scripts/build_mac.sh
| Trigger | Workflow | Artifact |
|---|---|---|
| Tag push v*.*.* | release.yml | Stable GitHub Release with signed DMG |
| Push to master | release.yml | Rolling nightly pre-release (replaced each merge) |
| PR / push to master | ci.yml | pytest + lint + macOS .app smoke build (boots and probes UX routes) |
| Tag push or release publish | pages.yml | Updates hankthebldr.github.io/local-ai-platform with the latest release version |
Every master merge re-publishes a freshly smoke-tested DMG to the nightly release. Stable releases are cut by pushing a vX.Y.Z tag.
| Machine | RAM | Role | Throughput |
|---|---|---|---|
| Mac M4 Pro | 48GB | Development | 7B @ 50 tok/s |
| MS-01 (Ryzen 9 7945HX) | 64GB | API serving | 34B @ 12 tok/s |
| BD790i (Ryzen 9 7945HX) | 96GB | Research / 70B-class workflows | 70B @ 5 tok/s |
The BD790i is the only host in the fleet that can exercise the full 1.3.0 MCP & Skills co-scheduler against 70B-class models + multi-GB MCP RSS simultaneously. Bring-up + benchmark recipes: docs/deployment/bd790i-testing.md.
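To turn the throughput column into expected wait times, divide the number of generated tokens by the tokens-per-second rate. The sketch below does this for the three hosts in the table; the 500-token response length is an illustrative assumption, and the estimate ignores prompt processing and model load time.

```python
# Decode throughput from the hardware table above, in tokens per second.
THROUGHPUT_TOK_S = {
    "Mac M4 Pro, 7B": 50,
    "MS-01, 34B": 12,
    "BD790i, 70B": 5,
}


def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Estimate wall-clock seconds to generate `tokens` at a steady rate."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return tokens / tok_per_s


for host, rate in THROUGHPUT_TOK_S.items():
    print(f"{host}: {generation_seconds(500, rate):.1f} s for 500 tokens")
```

For a 500-token answer this works out to about 10 s on the M4 Pro at 7B, about 42 s on the MS-01 at 34B, and 100 s on the BD790i at 70B.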
The canonical operator-facing docs live on the GitHub Wiki (sourced from docs/wiki/ on every tag). Highlights:
- Quickstart — first 60 seconds
- Architecture — request flow, services, workflow engine, arch-aware dispatch
- Workflows — authoring YAML pipelines + composite step kinds
- Agents — Gems-style YAML personas
- Models — registry, quantization, throughput
- Deployment — DMG · Docker · pip · source · systemd
- Configuration — env vars, auth, CORS, perf knobs
- Troubleshooting — common failure modes
- Release notes
Source-of-truth references inside the repo:
- MODELS.md — model registry and selection
- CLAUDE.md — developer guide
- CHANGELOG.md — every release, every PR
- docs/ — design docs, plans, deployment guides
- Product page: hankthebldr.github.io/local-ai-platform