Private, on-device AI. Your models, your data — no cloud, no accounts, no API keys.
A local-first AI runtime + studio — run open models (text, vision, image, voice, speech) entirely on your machine, behind one OpenAI-compatible gateway. Plus an always-on layer that sees, remembers, reflects, and acts — all on-device.
Download (macOS · Windows) · Features · getoffgridai.co · Pro early access
Off Grid AI is a local-first AI runtime for your desktop. Download open models from the
built-in catalog (or any GGUF from Hugging Face) and use them across every modality — all
inference runs on your hardware via bundled llama.cpp, stable-diffusion.cpp,
whisper.cpp, and Kokoro. Nothing routes through a server we own; your conversations,
files, and models never leave your device.
Three things in one app:
- A studio — chat (text + vision + reasoning), on-device image generation, voice in/out, live artifacts/canvas, projects with RAG, and in-chat tools — a local Claude/LM-Studio/Ollama with everything on-device.
- A gateway — one local OpenAI-compatible API (http://127.0.0.1:7878/v1, no key) for chat, vision, image, audio, and embeddings. Run it headless as just the gateway.
- Off Grid Pro — an always-on private layer that sees your work (screen → OCR), remembers it, helps you reflect, and acts with your approval. On-device, opt-in.
The free, open app is a complete on-device AI studio:
- Chat — text + vision, streaming, with a reasoning ("thinking") mode and per-chat model settings (temperature, context window).
- Image generation — text→image and image→image via stable-diffusion.cpp (Metal). Ships SDXL-Lightning (few-step, fast), SDXL, SD 1.5/2.1, and Z-Image-Turbo (2026 flagship, ~8-step). Live per-step preview, progress + ETA, cancel, lightbox, and an artifacts gallery of everything you've generated.
- Voice — speech-to-text (whisper) and text-to-speech (Kokoro-82M, multilingual), plus a hands-free voice mode.
- Artifacts / canvas — the model's HTML, React/JSX, SVG, Mermaid, and Markdown render live in a sandboxed iframe (no network/file access); Code/Preview toggle, download, saved per chat & project.
- Projects — group chats, upload documents (txt/md/PDF/DOCX, image, audio, video) and chat grounded in them (RAG with cited sources); per-project instructions.
- Tools in chat — an agentic loop calls local tools mid-conversation: built-ins (calculator, datetime) plus any MCP connector you've added.
- Connectors (MCP) — add Model Context Protocol servers (none / token / OAuth) and use them right inside chat. Preset catalog included.
- Model catalog — curated, size-bucketed recommendations + direct Hugging Face search; download, manage, and set the active model per modality.
- The Gateway — one OpenAI-compatible endpoint for everything; see below.
- Auto-update — signed releases update themselves.
A full breakdown is in docs/FEATURES.md.
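
The "Tools in chat" item above describes an agentic loop that calls local tools mid-conversation. The following is a minimal sketch of that idea, not the app's actual implementation: it assumes the model replies with OpenAI-style `tool_calls`, and the `TOOLS` table, `calculator`, `current_datetime`, and `run_tool_loop` are hypothetical names chosen for illustration. The model is passed in as a plain callable, so the loop can be exercised without a running gateway.

```python
import ast
import json
import operator
from datetime import datetime, timezone

# Hypothetical built-in tools. The names mirror the README (calculator,
# datetime), but the signatures are illustrative assumptions.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return str(walk(ast.parse(expression, mode="eval").body))


def current_datetime() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


TOOLS = {"calculator": calculator, "datetime": current_datetime}


def run_tool_loop(chat, messages, max_steps=5):
    """Call `chat` until it answers without requesting a tool.

    `chat(messages)` is any callable returning an OpenAI-style assistant
    message dict; tool requests are read from its "tool_calls" list.
    """
    messages = list(messages)
    for _ in range(max_steps):
        reply = chat(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"], messages
        for call in calls:
            fn = call["function"]
            args = json.loads(fn.get("arguments") or "{}")
            try:
                result = TOOLS[fn["name"]](**args)
            except Exception as exc:  # report the failure back to the model
                result = f"error: {exc}"
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})
    raise RuntimeError("tool loop did not finish within max_steps")
```

To try it against the gateway, `chat` could wrap a call to `POST /v1/chat/completions` and return the first choice's message as a dict. Whether the gateway accepts a `tools` parameter on that endpoint is not stated in this README, so treat that part as something to verify.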
One local server (http://127.0.0.1:7878) speaks the OpenAI API:
| Capability | Endpoint |
|---|---|
| Chat (text + vision) | POST /v1/chat/completions |
| Text → Image | POST /v1/images/generations, POST /v1/images/edits |
| Speech → Text | POST /v1/audio/transcriptions |
| Text → Speech | POST /v1/audio/speech |
| Embeddings | POST /v1/embeddings |
| Models | GET /v1/models |

```bash
curl http://127.0.0.1:7878/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"local","messages":[{"role":"user","content":"Hello!"}]}'
```

```python
from openai import OpenAI

# The gateway needs no key; the client just requires a non-empty placeholder.
client = OpenAI(base_url="http://127.0.0.1:7878/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Interactive API reference + an OpenAPI spec are served at /docs and /openapi.json.
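
The embeddings endpoint in the table above can be called with the standard library alone. This is a hedged sketch: it assumes the request and response bodies follow the OpenAI shapes (`{"model", "input"}` in, `{"data": [{"embedding": ...}]}` out) and that `"local"` is accepted as a model name for embeddings as it is for chat. The helper names `embeddings_request`, `embed`, and `cosine_similarity` are hypothetical.

```python
import json
import math
import urllib.request

GATEWAY = "http://127.0.0.1:7878/v1"  # from the README; no API key required


def embeddings_request(texts, model="local"):
    """Build (but do not send) a POST /v1/embeddings request.

    Assumption: the body follows the OpenAI shape {"model": ..., "input": [...]}.
    """
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, model="local"):
    """Send the request and return one vector per input text."""
    with urllib.request.urlopen(embeddings_request(texts, model)) as resp:
        payload = json.load(resp)
    # Assumption: OpenAI-style response, {"data": [{"embedding": [...]}, ...]}.
    return [item["embedding"] for item in payload["data"]]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the gateway running and an embedding model active, `embed(["first text", "second text"])` would return two vectors that `cosine_similarity` can compare, which is the basic building block of document retrieval.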
You don't need the desktop UI to serve models — run only the gateway (no UI, no capture) and point any OpenAI client at it. Ideal for a server, a homelab box, or wiring local models into your own apps:

```bash
# from a built app
/Applications/Off\ Grid\ AI.app/Contents/MacOS/Off\ Grid\ AI --server-only
# or from source
OFFGRID_SERVER_ONLY=1 npm run gateway
```

It's self-sufficient — manage models over HTTP, no UI required:
| Action | Endpoint |
|---|---|
| List the catalog | GET /v1/models/catalog |
| List installed | GET /v1/models/installed |
| Active model per modality | GET /v1/models/active |
| Pull a model | POST /v1/models/pull { "id": "…" } → poll GET /v1/models/pull/status?id=… |
| Activate a model | POST /v1/models/activate { "id": "…", "kind"?: "image\|speech\|transcription" } |
| Delete a model | POST /v1/models/delete { "id": "…" } |
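
The management endpoints in the table can be chained from a script: pull, poll the status endpoint, then activate. The sketch below is one way to do that under stated assumptions. The README does not document the status payload, so the caller supplies an `is_done` predicate instead of this code guessing at field names. `http_json` and `pull_and_activate` are hypothetical helpers, and the transport is injectable so the flow can be tested without a gateway.

```python
import json
import time
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1:7878"


def http_json(method, path, body=None):
    """Default transport: send JSON to the gateway and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def pull_and_activate(model_id, is_done, kind=None, send=http_json,
                      interval=2.0, timeout=3600.0):
    """Pull a model, poll until `is_done(status)` is true, then activate it.

    The status payload is not documented in the README, so the caller
    supplies `is_done` rather than this sketch assuming field names.
    """
    send("POST", "/v1/models/pull", {"id": model_id})
    status_path = ("/v1/models/pull/status?"
                   + urllib.parse.urlencode({"id": model_id}))
    deadline = time.monotonic() + timeout
    while not is_done(send("GET", status_path)):
        if time.monotonic() > deadline:
            raise TimeoutError(f"pull of {model_id} did not finish in time")
        time.sleep(interval)
    body = {"id": model_id}
    if kind is not None:  # optional: "image", "speech", or "transcription"
        body["kind"] = kind
    return send("POST", "/v1/models/activate", body)
```

Inspect one real response from `GET /v1/models/pull/status?id=…` (or the spec at /openapi.json) before writing the `is_done` predicate for your setup.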

```bash
# pull a model into a headless gateway, then chat
curl -X POST http://127.0.0.1:7878/v1/models/pull \
  -H 'Content-Type: application/json' -d '{"id":"unsloth/gemma-4-E4B-it-GGUF"}'
curl -X POST http://127.0.0.1:7878/v1/models/activate \
  -H 'Content-Type: application/json' -d '{"id":"unsloth/gemma-4-E4B-it-GGUF"}'
```

The free app runs models. Pro adds the always-on layer that turns your own work into private, on-device memory — and an assistant that helps you act on it. Everything is explicit opt-in, with a visible recording indicator, and nothing leaves the device.
- Sees — screen capture → OCR → on-device LLM distill into observations + entities. Multi-monitor aware, consumption-vs-work classification, blank/locked frames skipped.
- Remembers — Day (a persisted journal with time blocks), Entities (a private CRM-for-everything: people, projects, companies, auto-built with synthesis summaries), and Replay (a "movie of your day" you can scrub and play back).
- Reflects — mind-share, balance, context-switching, and Day/Week trends.
- Meetings — records Google Meet + Zoom (screen + system audio + mic), transcribes locally with whisper, and folds an LLM title/summary/attendees into your timeline.
- Acts (with approval) — action items detected from your communication, an approval queue + audit log (nothing executes without a logged approval), MCP connectors as authoritative sources, and a skills framework (trigger → action) — on the roadmap toward a proactive secretary and a prospective "Ahead" view of your day.
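
The rule in the last item, that nothing executes without a logged approval, can be illustrated with a small queue. Pro is a private package, so this is purely a conceptual sketch and not its implementation; `ApprovalQueue` and all of its methods are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ApprovalQueue:
    """Hold proposed actions until a person approves them; log every step."""
    pending: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)
    _next_id: int = 1

    def _log(self, event, action_id, detail=""):
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "action_id": action_id,
            "detail": detail,
        })

    def propose(self, description: str, run: Callable[[], object]) -> int:
        """Queue an action without running it and return its id."""
        action_id = self._next_id
        self._next_id += 1
        self.pending[action_id] = (description, run)
        self._log("proposed", action_id, description)
        return action_id

    def reject(self, action_id, who):
        """Drop a pending action; it can no longer be run."""
        self.pending.pop(action_id)
        self._log("rejected", action_id, who)

    def approve_and_run(self, action_id, who):
        """Run a pending action only after its approval has been logged."""
        description, run = self.pending.pop(action_id)
        # The approval is logged before the action runs, so every execution
        # has a matching audit entry.
        self._log("approved", action_id, who)
        result = run()
        self._log("executed", action_id, description)
        return result
```

The only path to `run()` goes through `approve_and_run`, which is what makes the audit log a complete record: a proposed action that is never approved stays in `pending` and has no side effects.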
Pro launches July 2026 — already paid? You're first in line when it ships. Pro features
live in a separate private package (a pro/ submodule); the open core never imports it —
see Architecture.
Pro screens shown with synthetic demo data.
→ Join early access (free) — or pay now for lifetime free + first access.
Grab the latest build from Releases:
- macOS (Apple Silicon) — signed + notarized .dmg
- Windows (x64) — .exe installer
Linux (AppImage/deb) is in progress.

```bash
git clone https://github.com/off-grid-ai/desktop.git
cd desktop
npm install
npm run dev          # full app
npm run gateway      # headless gateway only (:7878)
npm run build:mac    # package a macOS app
```

Stack: Electron 39 + React 19 + Tailwind v4 (electron-vite),
better-sqlite3-multiple-ciphers (encrypted local DB), @lancedb/lancedb (vectors),
bundled llama.cpp / whisper.cpp / stable-diffusion.cpp / ffmpeg in resources/bin.
Shared @offgrid/* packages (design, models, rag) come from the workspace. Verify changes
with npm run typecheck before declaring done.
This repository is the open, AGPL core: the model runner, gateway, studio (chat,
image, voice, artifacts), projects, connectors, and the model catalog. Pro features live in
a separate private package loaded as a git submodule (pro/). The core never imports
pro — pro registers itself through small registries (an activate() pattern) and is
simply absent in this build, so the open app compiles and runs entirely on its own.
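
The registration pattern described above, where the core exposes registries and an optional package fills them through `activate()`, can be sketched as follows. The real code lives in an Electron/TypeScript codebase; this is a Python illustration of the same idea, kept in the language of the README's other example. `Registry`, `REGISTRIES`, `load_optional`, and the registry names are hypothetical.

```python
import importlib


class Registry:
    """A small named registry the core owns and optional packages fill in."""

    def __init__(self):
        self._items = {}

    def register(self, name, factory):
        self._items[name] = factory

    def names(self):
        return sorted(self._items)

    def get(self, name):
        return self._items.get(name)


# Registries exposed by the core. The names are hypothetical.
REGISTRIES = {"panels": Registry(), "capture_sources": Registry()}


def load_optional(module_name, registries=REGISTRIES):
    """Import an optional top-level package and let it register itself.

    The core never references the package's contents: it only looks for an
    `activate(registries)` entry point and carries on if the package is absent.
    """
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            raise  # the package exists but one of its own imports is missing
        return False
    module.activate(registries)
    return True
```

Because the dependency points one way only (the optional package knows the registries, the core knows nothing about the package), a build without the package behaves the same as one where `load_optional` simply returns `False`.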
All model inference is local. Your conversations, documents, and models stay on your device — there's no cloud inference, no account, and no API key. You can run it fully offline.
AGPL-3.0-only. © Off Grid AI / Wednesday Solutions, Inc.














