Four deployment methods: one-shot bootstrap (recommended for new installs), Docker Compose (manual), bare-metal, or K3s.
The install steps depend on your hardware + OS. Find the row that matches your setup, then jump to the linked section.
| Your hardware | OS | Recommended path |
|---|---|---|
| NVIDIA GPU (RTX 20/30/40/50, datacenter) | Linux | Method 0: bootstrap (auto-detects) or Method 1: Docker |
| NVIDIA GPU | Windows (WSL2) | Method 1: Docker — NVIDIA section |
| AMD GPU (RX 6000/7000, MI200+) | Linux | Method 1: Docker — AMD ROCm |
| Apple Silicon (M1/M2/M3/M4) | macOS | SETUP_MACOS.md (dedicated guide — hybrid native Metal + Docker) |
| Intel Arc / Iris Xe | Linux/Windows | Method 1: Docker — Vulkan |
| Snapdragon X Elite (laptops) | Windows on ARM / Linux | Method 1: Docker — Vulkan + arm64 section |
| Older AMD GPU (Vega, RDNA1, no ROCm 6.x support) | Linux | Method 1: Docker — Vulkan |
| NVIDIA on ARM64 (DGX Spark, Jetson) | Linux | Method 1: Docker — arm64 section (CUDA via sbsa/l4t base swap) |
| Raspberry Pi 5 | Linux | Method 1: Docker — Vulkan + arm64 (expect CPU-tier perf) |
| Intel Mac (pre-2020) | macOS | Method 1: Docker — Vulkan (Metal is Apple-Silicon-only) |
| CPU only, no GPU | any | Method 1: Docker — Vulkan + lavapipe (slow, smoke-test only) |
| Kubernetes cluster | Linux | Method 3: K3s |
| Bare-metal / development | Linux | Method 2: Bare Metal |
Don't see your setup? File an issue with uname -a output and lspci | grep -i vga (Linux) / system_profiler SPDisplaysDataType (Mac) and we'll add a row.
Single curl command that detects your distro, installs Docker + nvidia-container-toolkit, sets sysctl knobs, downloads model weights, and brings the stack up. Idempotent — safe to re-run.
curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh | bashOr, from a checkout:
./scripts/atlas-bootstrap.shSupported distributions:
| Family | Distros |
|---|---|
| Debian (apt-get) | Ubuntu 20.04+, Debian 11+ |
| RHEL (dnf) | RHEL 9+, Rocky 9+, AlmaLinux 9+, CentOS Stream 9+, Oracle Linux 9+ |
| Fedora (dnf) | Fedora 38+ |
Other distros with ID_LIKE matching one of the above (e.g. Linux Mint, Pop!_OS) are accepted with a warning. Distros not in this list — Arch, openSUSE, Alpine, NixOS — aren't tested and the script will refuse to run on them.
The bootstrap works around EPEL, firewalld, vm.overcommit_memory (PC-011), nouveau driver conflicts, the missing-libnvidia-ml.so.1 case (RHEL minimal installs), and the "user added to docker group but current shell doesn't see it yet" race.
Run modes — both work:
# Run as your normal user; sudo elevates as needed (Docker install, sysctl, etc).
# Install ends up owned by you.
curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh | bash
# Run via sudo. SUDO_USER is detected, install still ends up owned by you.
curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh | sudo bash
# Real root login (no sudo) — install owned by root. Only do this if there's
# no human user on the box (CI runner, container, etc).Configuration env vars:
| Flag | Effect |
|---|---|
ATLAS_BOOTSTRAP_SKIP_DOCKER=1 |
Don't install Docker (already managed) |
ATLAS_BOOTSTRAP_SKIP_NVIDIA=1 |
CPU-only install (no GPU steps) |
ATLAS_BOOTSTRAP_SKIP_MODELS=1 |
Don't download model weights |
ATLAS_BOOTSTRAP_SKIP_COMPOSE=1 |
Don't run docker compose up |
ATLAS_BOOTSTRAP_SKIP_SYSCTL=1 |
Don't write vm.overcommit_memory=1 (CI / unprivileged containers) |
ATLAS_BOOTSTRAP_SKIP_ASA=1 |
Skip the ASA steering-vector build (default: built ~5 min after services come up) |
ATLAS_BOOTSTRAP_NO_SUDO=1 |
Fail instead of attempting sudo |
ATLAS_INSTALL_DIR=/path |
Where to clone (default /opt/atlas — see below) |
ATLAS_REPO_URL=https://... |
Alternate repo URL |
Why /opt/atlas? It's the standard FHS prefix for system-wide third-party software, survives $HOME cleanup, and lets multiple users on the same box share one install. If you'd rather it land in your home dir:
curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh \
| ATLAS_INSTALL_DIR=$HOME/atlas bashWhen complete, prints a green "ATLAS ready" banner with quick-start commands. Total time on a fresh VM with a fast connection: ~10-30 minutes (model download dominates).
If you'd rather do each step manually, use Method 1 below.
| Requirement | Details |
|---|---|
| GPU | 16 GB+ VRAM. NVIDIA (CUDA) is the canonical path; AMD (ROCm) and Apple Silicon (Metal, macOS hybrid — see SETUP_MACOS.md) are both supported; Vulkan is the universal fallback; Intel Arc (SYCL) is roadmap. See § Supported GPUs. |
| GPU drivers | NVIDIA: proprietary drivers (nvidia-smi should show your GPU). AMD: amdgpu-dkms kernel driver (/dev/kfd must exist; rocm-smi should show your GPU). |
| Python 3.9+ | With pip |
| wget | For downloading model weights |
| Model weights | Qwen3.5-9B-Q6_K.gguf (~7 GB) from HuggingFace. Apple Silicon ≤16 GB: use Q4_K_M (~5 GB) instead. |
NVIDIA:
nvidia-smi
# Should show your GPU with driver version and VRAM
# If this fails, install NVIDIA proprietary drivers firstAMD:
rocm-smi --showproductname --showmeminfo vram
# Should show your GPU model and total VRAM
# If rocm-smi is missing or /dev/kfd doesn't exist, install ROCm:
# RHEL 9: sudo dnf install -y https://repo.radeon.com/amdgpu-install/6.2/rhel/9.4/amdgpu-install-6.2.60200-1.el9.noarch.rpm
# sudo amdgpu-install --usecase=dkms,rocm
# Ubuntu: Follow https://rocm.docs.amd.com/projects/install-on-linux/
# Then REBOOT.Easy mode — let atlas tier autodetect across vendors and tell you what it found:
pip install -e .
atlas tier # prints detected GPU, tier classification, recommended settings
atlas tier --json # machine-readable (used by atlas init wizard)This is the tested deployment method for V3.1.0+.
NVIDIA (CUDA):
- Docker with nvidia-container-toolkit, or Podman with the same toolkit
- ~20 GB disk space (model weights + container images)
AMD (ROCm, V3.1.1):
- Docker alone — ROCm doesn't need a separate container runtime; passthrough via
--device=/dev/kfd --device=/dev/driis enough - Your user must be in the
videoandrendergroups:sudo usermod -aG video,render $USER(then re-login) - ~22 GB disk space (ROCm image is ~2 GB larger than the CUDA equivalent)
# 1. Clone
git clone https://github.com/itigges22/ATLAS.git
cd ATLAS
# 2. Download model weights (~7GB)
mkdir -p models
wget https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q6_K.gguf \
-O models/Qwen3.5-9B-Q6_K.gguf
# 3. Install the ATLAS CLI (puts `atlas` in ~/.local/bin)
pip install --user -e .
# Make sure ~/.local/bin is on your PATH so `atlas` resolves:
case ":$PATH:" in *":$HOME/.local/bin:"*) ;; *)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
;; esac
# 4. Install Go 1.24+ — required for the TUI client (atlas tui) and
# optional for the proxy (proxy builds automatically on first run if Go
# is present; otherwise it runs in Docker with file access limited to
# ATLAS_PROJECT_DIR). Quickest path:
mkdir -p /tmp/go-install && cd /tmp/go-install
curl -LO https://go.dev/dl/go1.24.0.linux-amd64.tar.gz
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.24.0.linux-amd64.tar.gz
echo 'export PATH="/usr/local/go/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
cd -
# Then build the TUI:
cd tui && go build -o ~/.local/bin/atlas-tui . && cd ..
# 5. Configure environment
cp .env.example .env
# Defaults work if your model is in ./models/ — edit .env only if you changed the path
# 6. Start all services (first run builds container images — this takes several minutes)
# NVIDIA hosts (default):
docker compose up -d # or: podman-compose up -d
# AMD ROCm hosts (V3.1.1):
docker compose -f docker-compose.yml -f docker-compose.rocm.yml up -d
# `atlas init` writes a marker comment into .env telling you which to use.
# 7. Verify everything is healthy (wait for all services to show "healthy")
docker compose ps
# 8. Start coding (from your project directory)
cd /path/to/your/project
atlasThe ROCm path is identical to NVIDIA except for these three points:
-
Bring up with both compose files (or let
atlas initdo it for you):docker compose -f docker-compose.yml -f docker-compose.rocm.yml up -d
The override switches the llama-server image to the ROCm build, swaps the NVIDIA driver request for
/dev/kfd+/dev/dripassthrough, and forcesATLAS_BACKEND=rocmso the entrypoint takes the HIP-tuning branch. -
No
nvidia-container-toolkit— ROCm doesn't need a separate container runtime, just kernel-level device access. Confirm your user is in the right groups:id -nG | tr ' ' '\n' | grep -E '^(render|video)$' # Should print both. If not: sudo usermod -aG video,render $USER # Then log out + back in (or: newgrp render)
-
GPU compute target. The default
Dockerfile.rocmbuild is a "fat" image covering RDNA3 (7000 series), RDNA2 (6000 series), and CDNA2 (MI200) —gfx1100;gfx1101;gfx1102;gfx1030;gfx90a. For a smaller image targeted at your specific GPU, setATLAS_GFX_TARGETbefore building:# Example: only build for RX 7900 XT/XTX ATLAS_GFX_TARGET=gfx1100 docker compose -f docker-compose.yml -f docker-compose.rocm.yml build llama-serverSee LLVM AMDGPU processor table for the gfx target of your card.
For "I have an unsupported GPU but ROCm sort-of works on it" cases (older Vega, RDNA1), see TROUBLESHOOTING.md § AMD GPU not detected for the ATLAS_HSA_OVERRIDE_GFX_VERSION workaround.
When the native vendor backend isn't packaged for your hardware (Intel Arc, Snapdragon Adreno, older AMD without ROCm 6.x, or some weird combo), Vulkan is the safety-net path. One Dockerfile, runs on basically everything — Mesa RADV (AMD), Mesa ANV (Intel), nvidia-container-toolkit (NVIDIA), MoltenVK (Apple via macOS Docker), Adreno (Snapdragon), and Mesa lavapipe (pure CPU fallback).
Tradeoff: typically 20–40% slower than tuned native backends. Use it when CUDA/ROCm isn't an option, or for "does ATLAS even boot on my weird laptop" smoke testing.
# Option A — let atlas init pick it for you
# (the wizard offers Vulkan when your GPU vendor's native backend isn't packaged,
# or run with --backend vulkan to force it regardless of vendor):
atlas init --backend vulkan
docker compose -f docker-compose.yml -f docker-compose.vulkan.yml up -d
# Option B — already-installed deployment, just switch the override file:
docker compose -f docker-compose.yml -f docker-compose.vulkan.yml up -d
# (the entrypoint dispatches on ATLAS_BACKEND, which the compose overlay
# sets to vulkan; .env's value is ignored when the overlay is in play)What's different from CUDA/ROCm:
- No vendor-specific kernel driver requirement. Vulkan ICDs live inside the image (
mesa-vulkan-driverscovers AMD/Intel/CPU; NVIDIA's ICD comes from the nvidia-container-toolkit mount). /dev/dripassthrough only — no/dev/kfd, no--gpus all(unless you're routing through the NVIDIA toolkit, in which case both still work).- Per-GPU selection via
ATLAS_VK_DEVICE_SELECTinstead ofCUDA_VISIBLE_DEVICES/HIP_VISIBLE_DEVICES. Format is Mesa-standard:"vendorID:deviceID"(hex) or a substring of the device name.GGML_VK_VISIBLE_DEVICES(numeric index) also works. atlas doctorruns a_check_vulkan_via_dockerprobe — but only whenATLAS_BACKEND=vulkanis set (otherwise it skips to keep CUDA/ROCm runs fast).
If you hit vulkaninfo showing only the llvmpipe CPU device when you expected a GPU, the kernel-side device passthrough failed — verify /dev/dri/renderD* exists on the host and your user is in the video + render groups (same as the ROCm requirement above).
ATLAS targets two CPU architectures: x86_64 (default, all backends available) and aarch64 (a subset of backends). Verify with atlas doctor — the arch check surfaces your architecture and which backends are available before the GPU check fires.
Backend availability by architecture:
| Backend | x86_64 | aarch64 | Notes |
|---|---|---|---|
| CUDA | yes (rockylinux9 base) | yes (sbsa or l4t base, build-arg swap) | DGX Spark = sbsa, Jetson = l4t |
| ROCm | yes | no | AMD has no arm64 ROCm release. Use Vulkan instead. |
| Vulkan | yes | yes (Mesa is multi-arch) | Universal fallback for all arm64 GPUs |
| CPU (lavapipe) | yes | yes | Slow but always works |
Targeted arm64 devices:
- NVIDIA DGX Spark (Grace-Blackwell GB10) — CUDA via sbsa base image, compute cap 12.0/12.1
- NVIDIA Jetson Orin / AGX / Nano — CUDA via l4t base image, compute cap 8.7
- Apple Silicon (M1/M2/M3/M4) — Vulkan via MoltenVK in Docker Desktop (slow path); native Metal install tracked at #32 for the fast path
- Snapdragon X Elite (Windows on ARM laptops) — Vulkan via the Adreno driver
- Raspberry Pi 5 — Vulkan via Mesa V3D driver, expect CPU-tier performance
- Ampere Altra / AWS Graviton workstations — Vulkan via lavapipe (CPU fallback, since no consumer arm64 dGPU yet)
Building the Vulkan image for arm64:
# Multi-arch build that produces a single image manifest covering both archs:
docker buildx build --platform linux/amd64,linux/arm64 \
-t atlas-llama-server:vulkan \
-f inference/Dockerfile.vulkan inference/Building the CUDA image for arm64 (DGX Spark example):
# Swap to the sbsa-capable ubuntu base, build with --platform linux/arm64:
docker buildx build --platform linux/arm64 \
--build-arg BUILDER_IMAGE=nvidia/cuda:12.9.0-devel-ubuntu22.04 \
--build-arg RUNTIME_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu22.04 \
-t atlas-llama-server:cuda-arm64 \
-f inference/Dockerfile.v31 inference/For Jetson, swap to nvcr.io/nvidia/l4t-jetpack:r36.3.0 in both build args (l4t ships JetPack + CUDA + cuDNN as one image).
Known gaps (#115 tracks these):
- No prebuilt arm64 images on GHCR yet — arm64 users must build locally with the recipes above. Prebuilt multi-arch images will land once at least one arm64 device has been validated end-to-end.
- Bootstrap installer (
scripts/atlas-install.sh) hasn't been audited for arm64 paths. - Hardware testing matrix is empty for all five target devices — early adopters with any of these please drop your
atlas doctoroutput andvulkaninfo --summaryon #115.
- Docker pulls 5 prebuilt container images from
ghcr.io/itigges22/atlas-{proxy,v3,lens,llama,sandbox}(PC-052, ~3 min on a fast connection — replaces the prior ~75 min from-source CUDA build). To build from source instead (dev workflow), rundocker compose buildbefore theupstep — see "Image source" below. - llama-server loads the 7GB model into GPU VRAM (~1-2 min)
- All services start health checks
- Once all 6 services (redis, llama-server, geometric-lens, v3-service, sandbox, atlas-proxy) report healthy,
atlasconnects and launches the Bubbletea TUI
Subsequent docker compose up -d starts are fast (seconds) since images are cached.
docker-compose.yml declares both image: (GHCR) and build: (local
Dockerfile) for every service. Compose's default behavior:
| Command | What it does |
|---|---|
docker compose up -d |
Pull image: if not in local cache, else reuse local |
docker compose pull |
Force pull latest tag from GHCR (overwrite local cache) |
docker compose build |
Build from Dockerfile (overrides GHCR image) |
docker compose up -d --build |
Always rebuild from source then start |
Tag pinning. The tag defaults to latest. To pin to a specific
version (recommended for production), set ATLAS_IMAGE_TAG in .env:
ATLAS_IMAGE_TAG=v1.0.0 # semver tag from a git release
ATLAS_IMAGE_TAG=sha-abc1234 # exact commit
ATLAS_IMAGE_TAG=dev # bleeding edge from dev branchAvailable tags are listed at https://github.com/itigges22/ATLAS/pkgs/container/atlas-proxy
(swap atlas-proxy for the other service names: atlas-v3,
atlas-lens, atlas-llama, atlas-sandbox).
Hitting
unauthorizedoncompose pull? GHCR packages are private by default. If a maintainer hasn't yet flipped a package to public visibility,compose pullfails withunauthorized. Two escapes: (a) authenticate to GHCR with a personal access token that hasread:packagesscope (echo $TOKEN | docker login ghcr.io -u $USERNAME --password-stdin), or (b) build from source withdocker compose buildand skip the pull entirely.
Dev workflow gotcha —
compose pulloverwrites local builds. Both the local-built image and the GHCR-pulled image share the same tag (ghcr.io/<owner>/atlas-<svc>:<tag>), sodocker compose pullwill REPLACE your locally-built image with the registry version and wipe your local changes. While iterating on a service, either skipcompose pullentirely (Docker won't auto-pull if a local image is present), or setATLAS_IMAGE_TAG=dev-local(any unpublished tag name) in.envso your local builds and the registry images live under different tags.
Forks: pointing compose at your own GHCR. If you've forked the repo and your build-images.yml workflow has published images to
ghcr.io/<your-username>/atlas-*, setATLAS_GHCR_OWNER=<your-username>in.envto pull your fork's images instead of upstream's.
The fastest way is atlas doctor — runs 22 checks across the host
environment, the docker stack, and a live model inference, and returns
exit 0 (healthy) / 1 (failures). This is also what atlas-bootstrap.sh
runs at the end of install.
atlas doctor # full check (~5–10s)
atlas doctor --quick # skip the e2e model inference (~2s)
atlas doctor --json # machine output, for scripts/CI
atlas doctor -v # verbose: show detail for each checkThe 22 checks (PC-053 base + later additions):
| Group | Check | What it confirms |
|---|---|---|
| Host | docker | daemon reachable |
| Host | compose | docker compose v2 installed |
| Host | nvidia | nvidia-container-toolkit can run nvidia-smi inside Docker |
| Host | vm.overcommit_memory | set to 1 (PC-011 — Redis AOF) |
| Host | model_file | Qwen3.5-9B-Q6_K.gguf exists and is > 100 MB |
| Host | lens_weights | cost_field.pt + metric_tensor.pt present |
| Host | asa_steering | ast_edit_steering.gguf present (BiasBusters #4 — warn-not-fail; ATLAS works without it, just unsteered ast_edit-vs-edit_file bias) |
| Host | tier_match | .env model selection matches host hardware (PC-055; warn on overshoot — OOM risk — pass on match or undershoot) |
| Host | tier_constraints | host CPU/RAM/disk meets the recommended tier minimums (PC-055.1 — catches "16 GB GPU but 8 GB RAM" mismatches) |
| Stack | container/redis, llama-server, geometric-lens, v3-service, sandbox, atlas-proxy | all 6 running and healthy |
| Stack | health/llama, lens, v3, sandbox, proxy | all 5 /health endpoints return ok |
| Stack | image_skew | all 5 atlas-* images on the same tag (PC-052) |
| End-to-end | e2e_smoke | live /v1/chat/completions round-trip to llama-server (--quick to skip) |
If you'd rather check by hand:
# Hit each health endpoint
curl -s http://localhost:8080/health | python3 -m json.tool # llama-server
curl -s http://localhost:8099/health | python3 -m json.tool # geometric-lens
curl -s http://localhost:8070/health | python3 -m json.tool # v3-service
curl -s http://localhost:30820/health | python3 -m json.tool # sandbox
curl -s http://localhost:8090/health | python3 -m json.tool # atlas-proxy
# Functional test
echo "Create hello.py that prints hello world" | atlasAll health endpoints should return {"status": "ok"} or {"status": "healthy"}.
Note: Plain
atlasin an interactive terminal launches the Bubbletea TUI for the full agent loop (tool calls, V3 pipeline, file read/write). Pipe mode (e.g. theecho | atlasform above) routes through the built-in/solveflow for scripted/one-shot use.
docker compose down # Stop all services (preserves images)
docker compose down --rmi all # Stop and remove images (next start rebuilds)docker compose logs -f llama-server # Follow llama-server logs
docker compose logs -f geometric-lens # Follow Lens logs
docker compose logs -f v3-service # Follow V3 pipeline logs
docker compose logs -f atlas-proxy # Follow proxy logs
docker compose logs -f sandbox # Follow sandbox logs
docker compose logs --tail 50 # Last 50 lines from all servicesgit pull
docker compose down
docker compose pull # grab fresh :latest images from GHCR
docker compose up -dRun all services as local processes without containers. Useful for development or systems where Docker isn't available.
| Requirement | Details |
|---|---|
| Go 1.24+ | For building atlas-proxy |
| llama.cpp | Built from source with CUDA (see llama.cpp build instructions) |
| Node.js 20+ | Required by sandbox for JavaScript/TypeScript execution |
| Rust | Required by sandbox for Rust execution |
# 1. Clone and install Python CLI
git clone https://github.com/itigges22/ATLAS.git
cd ATLAS
pip install -e .
# 2. Download model weights
mkdir -p models
wget https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q6_K.gguf \
-O models/Qwen3.5-9B-Q6_K.gguf
# 3. Build the proxy
cd proxy
go build -o ~/.local/bin/atlas-proxy-v2 .
cd ..
# 4. Install geometric-lens Python dependencies
pip install -r geometric-lens/requirements.txt
# 5. Install V3 service PyTorch (CPU only)
pip install torch --index-url https://download.pytorch.org/whl/cpu
# 6. Install sandbox dependencies
pip install fastapi uvicorn pylint pytest pydanticStart each service in a separate terminal (or use & and redirect to log files):
# Terminal 1: llama-server (GPU)
llama-server \
--model models/Qwen3.5-9B-Q6_K.gguf \
--host 0.0.0.0 --port 8080 \
--ctx-size 32768 --n-gpu-layers 99 --no-mmap
# Terminal 2: Geometric Lens
cd geometric-lens
LLAMA_URL=http://localhost:8080 \
LLAMA_EMBED_URL=http://localhost:8080 \
GEOMETRIC_LENS_ENABLED=true \
PROJECT_DATA_DIR=/tmp/atlas-projects \
python -m uvicorn main:app --host 0.0.0.0 --port 8099
# Terminal 3: V3 Pipeline
cd v3-service
ATLAS_INFERENCE_URL=http://localhost:8080 \
ATLAS_LENS_URL=http://localhost:8099 \
ATLAS_SANDBOX_URL=http://localhost:8020 \
python main.py
# Terminal 4: Sandbox
cd sandbox
python executor_server.py
# Terminal 5: atlas-proxy
ATLAS_PROXY_PORT=8090 \
ATLAS_INFERENCE_URL=http://localhost:8080 \
ATLAS_LLAMA_URL=http://localhost:8080 \
ATLAS_LENS_URL=http://localhost:8099 \
ATLAS_SANDBOX_URL=http://localhost:8020 \
ATLAS_V3_URL=http://localhost:8070 \
ATLAS_MODEL_NAME=Qwen3.5-9B-Q6_K \
atlas-proxy-v2Note: The sandbox listens on port 8020 in bare-metal mode (no Docker port remapping). The proxy's
ATLAS_SANDBOX_URLmust use port 8020, not 30820.
Alternatively, copy the launcher script to your PATH:
cp /path/to/atlas-launcher ~/.local/bin/atlas
chmod +x ~/.local/bin/atlas
atlas # Starts all missing services and launches the TUIThe launcher auto-detects which services are already running and starts only what's missing. If it detects a Docker Compose stack, it connects to that instead.
For production Kubernetes deployment with GPU scheduling, health probes, and resource limits.
| Requirement | Details |
|---|---|
| K3s | Single-node or multi-node cluster |
| NVIDIA GPU Operator or device plugin | GPU must be visible as nvidia.com/gpu resource |
| Helm | For GPU Operator installation |
| Podman or Docker | For building container images |
The install script handles the complete setup — K3s installation, GPU Operator, container builds, and deployment:
# 1. Configure
cp atlas.conf.example atlas.conf
# Edit atlas.conf: model paths, GPU layers, context size, NodePorts
# 2. Run the installer (requires root)
sudo scripts/install.shThe installer will:
- Check prerequisites (NVIDIA drivers, GPU VRAM, system RAM)
- Install K3s if not already running
- Install NVIDIA GPU Operator via Helm (if GPU not visible to cluster)
- Build container images and import to K3s containerd
- Generate manifests from
atlas.confvia envsubst - Deploy to the
atlasnamespace - Wait for all services to be healthy
If K3s is already running with GPU support:
# 1. Configure
cp atlas.conf.example atlas.conf
# Edit atlas.conf
# 2. Build and import images
scripts/build-containers.sh
# 3. Generate manifests from atlas.conf
scripts/generate-manifests.sh
# 4. Deploy
kubectl apply -n atlas -f manifests/
# 5. Verify
scripts/verify-install.shK3s uses atlas.conf (not .env) for configuration. The HTTP contracts and pipeline behavior are identical to Docker Compose; only deployment plumbing differs:
| Setting | Docker Compose | K3s |
|---|---|---|
| Config file | .env |
atlas.conf |
| Service exposure | Host ports (8090, 8080, 8099, 8070, 30820) |
NodePorts (30080, 32735, 31144, 30070, 30820) |
| Project workspace | Bind mount (ATLAS_PROJECT_DIR → /workspace) |
hostPath (ATLAS_PROJECTS_DIR → /workspace on every Pod that needs it) |
| Model files | Bind mount (ATLAS_MODELS_DIR → /models:ro) |
hostPath on the GPU node (ATLAS_MODELS_DIR, Directory, ro) |
| Stateful storage | Named volumes (redis-data, lens-data) |
PVCs (redis-data sized by ATLAS_PVC_REDIS_SIZE, lens-projects by ATLAS_PVC_PROJECTS_SIZE) |
| GPU allocation | deploy.resources.reservations.devices (nvidia) |
resources.limits.nvidia.com/gpu: 1 (requires GPU Operator or device plugin) |
| Sandbox toolchain caches | tmpfs mounts per language |
emptyDir with sizeLimit per language (PC-191 universal pattern, same set) |
Model + runtime parameters (ATLAS_MAIN_MODEL, ATLAS_CONTEXT_LENGTH, ATLAS_PARALLEL_SLOTS, ATLAS_FLASH_ATTENTION, KV cache quantization, --embeddings for the lens scoring path) all read from the same env vars in both modes — see atlas.conf.example and .env.example.
See CONFIGURATION.md for the full atlas.conf reference.
# Check pods
kubectl get pods -n atlas
# Check GPU allocation
kubectl describe nodes | grep nvidia.com/gpu
# Run verification suite
scripts/verify-install.shNote: Docker Compose is the most heavily-exercised deployment method (CI runs against it; every release is smoke-tested under Compose). K3s manifests are generated from
templates/*.yaml.tmplat deploy time viascripts/generate-manifests.sh(orinstall.sh'sprocess_templatesstep). Templates target the currentQwen3.5-9B-Q6_Kworking point and the May 2 2026 service layout (atlas-proxy, no api-portal, no dashboard); the V3.0 benchmark numbers in CHANGELOG were collected onQwen3-14Bunder an older topology.
ATLAS classifies GPUs into 5 tiers and recommends a model + context
size + parallel-slots configuration per tier. Run atlas tier to see
which tier your hardware lands in and the exact .env values to use.
| Tier | VRAM | Recommended model | Context | Slots | Example GPUs |
|---|---|---|---|---|---|
| cpu | n/a | not supported in v1 | n/a | n/a | (no CUDA GPU) |
| small | 8–12 GB | Qwen3.5 7B Q4_K_M (4.4 GB) | 8K | 1 | RTX 3060/4060 8GB, T4 |
| medium | 12–20 GB | Qwen3.5 9B Q6_K (6.9 GB) | 32K | 1 | RTX 4060/5060 Ti 16GB, 3080 Ti, 4070 Ti Super |
| large | 20–32 GB | Qwen3.5 14B Q5_K_M (10.5 GB) | 32K | 2 | RTX 3090, 4090, 5090 24GB |
| xlarge | 32 GB+ | Qwen3.5 32B Q5_K_M (23 GB) | 64K | 2 | RTX 5090 32GB, A6000, A100, H100 |
atlas tier # classify this host + show recommendations
atlas tier list # show all 5 tier definitions
atlas tier --json # machine output (for scripts)
atlas tier --raw # just the probe (no classification)The medium tier is the ATLAS development target — atlas-bootstrap.sh
defaults to its model+context settings. For other tiers, run
atlas init (the PC-054 first-run wizard) after the bootstrap
completes. It probes hardware via atlas tier, picks the right model
from the registry, downloads it with SHA verification, and rewrites
.env. Re-run with atlas init --reconfigure whenever your hardware
or model registry default changes.
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU VRAM | 8 GB | 16 GB | See tier table above |
| System RAM | 14 GB | 16 GB+ | PyTorch runtime + container overhead |
| Disk | 15 GB | 25 GB | Model (4.4–23 GB depending on tier) + container images (5–8 GB) + working space |
| CPU | 4 cores | 8+ cores | V3 pipeline is CPU-intensive during repair phases |
Any GPU with 8 GB+ VRAM and a llama.cpp-supported backend:
| Vendor | Backend | Status | Build path | Tested cards |
|---|---|---|---|---|
| NVIDIA | CUDA | Shipping (V3.1.0+) | inference/Dockerfile.v31 |
RTX 5060 Ti 16GB (primary dev) |
| AMD | ROCm / HIP | Shipping (V3.1.1) | inference/Dockerfile.rocm |
RX 7900 XTX (community smoke-test, GH #26) |
| Apple Silicon | Metal | Shipping (macOS hybrid: native llama-server + Docker, #32) | scripts/atlas-setup-macos.sh + docker-compose.macos.yml |
M2 Pro 32GB (verified); M3/M4 (target) |
| Intel Arc | SYCL | Roadmap | TBD | Arc A770 16GB (target) |
atlas tier auto-detects across vendors and picks the largest-VRAM GPU. Override with ATLAS_GPU_VENDOR=amd or ATLAS_GPU_INDEX=1 if you have multiple GPUs and want a specific one.
inference/Dockerfile.v31 compiles llama.cpp for a specific CUDA compute capability. The default is 120;121 (Blackwell, RTX 50xx). If you see build failures like nvcc fatal: unsupported gpu architecture or runtime errors like no kernel image available for execution, your GPU needs a different arch.
Override at build time with --build-arg CUDA_ARCH=<value>:
# Single arch — RTX 4060/4070/4080/4090 (Ada Lovelace)
podman build --build-arg CUDA_ARCH=89 -f inference/Dockerfile.v31 -t llama-server:local inference/
# Multiple archs (semicolon-separated) — build a fat binary for Ampere + Ada + Hopper
podman build --build-arg CUDA_ARCH="86;89;90" -f inference/Dockerfile.v31 -t llama-server:local inference/Common values:
| Arch | Architecture | Cards |
|---|---|---|
60, 61 |
Pascal | GTX 10xx, Tesla P4/P40 |
70 |
Volta | V100 |
75 |
Turing | RTX 20xx, T4 |
80, 86 |
Ampere | A100, RTX 30xx |
89 |
Ada Lovelace | RTX 40xx, L4 |
90 |
Hopper | H100 |
100, 120, 121 |
Blackwell | B100, RTX 50xx |
Your GPU's compute capability: nvidia-smi --query-gpu=compute_cap --format=csv (drop the dot — 8.9 → 89).
inference/Dockerfile.rocm compiles llama.cpp's HIP backend for one or more gfx targets. The default is a fat build covering the most common consumer + datacenter AMD GPUs: gfx1100;gfx1101;gfx1102;gfx1030;gfx90a. Each additional target adds ~150 MB to the binary.
Override at build time with --build-arg GFX_TARGET=<value> (or via ATLAS_GFX_TARGET env var, which the compose override forwards):
# Single target — RX 7900 XT/XTX only (smaller image)
ATLAS_GFX_TARGET=gfx1100 docker compose -f docker-compose.yml -f docker-compose.rocm.yml build llama-server
# Two targets for RDNA3 + RDNA2 mixed-fleet
docker build --build-arg GFX_TARGET="gfx1100;gfx1030" -f inference/Dockerfile.rocm -t atlas-llama-rocm:custom inference/Common values:
| Target | Architecture | Cards |
|---|---|---|
gfx1100 |
RDNA3 (Navi 31) | RX 7900 XT, 7900 XTX, 7900 GRE |
gfx1101 |
RDNA3 (Navi 32) | RX 7800 XT, 7700 XT |
gfx1102 |
RDNA3 (Navi 33) | RX 7600, 7600 XT |
gfx1030 |
RDNA2 (Navi 21) | RX 6800, 6800 XT, 6900 XT, 6950 XT |
gfx1031 |
RDNA2 (Navi 22) | RX 6700 XT, 6750 XT |
gfx1032 |
RDNA2 (Navi 23) | RX 6600, 6600 XT, 6650 XT |
gfx90a |
CDNA2 | MI210, MI250, MI250X |
gfx942 |
CDNA3 | MI300X |
gfx900 |
Vega | Vega 56/64 (may need HSA override — see TROUBLESHOOTING.md) |
gfx1200 |
RDNA4 (Navi 44) | RX 9070 |
gfx1201 |
RDNA4 (Navi 48) | RX 9070 XT |
RDNA4 (gfx1200/gfx1201) users: set
ATLAS_ROCM_TAG=7.2.3-complete— the default ROCm 6.2 base image does not include gfx1200/gfx1201 compiler support. ROCm 7.0+ supports these targets natively; do not setATLAS_HSA_OVERRIDE_GFX_VERSION. See TROUBLESHOOTING.md § RDNA4 for details.
Your GPU's gfx target: rocminfo | grep -i gfx | head -1 (or look it up in the LLVM AMDGPU processor table).
ATLAS works without Geometric Lens weights — the service degrades gracefully, returning neutral scores. The V3 pipeline falls back to sandbox-only verification.
To enable C(x)/G(x) scoring, you need trained model weights. Pre-trained weights and training data are available on HuggingFace:
ATLAS Dataset on HuggingFace — includes embeddings, training data, and weight files.
Place weight files in geometric-lens/geometric_lens/models/ (or mount via ATLAS_LENS_MODELS in Docker Compose). The service loads them automatically on startup.
Training scripts are provided in scripts/ if you want to train on your own benchmark data:
scripts/retrain_cx_phase0.py— Initial C(x) training from collected embeddingsscripts/retrain_cx.py— Production C(x) retraining with class weightsscripts/collect_lens_training_data.py— Collect pass/fail embeddings from benchmark runsscripts/prepare_lens_training.py— Prepare and validate training data format
If you want to swap in a non-default GGUF, the atlas lens subcommand wraps the probe + train pipeline so you don't have to learn the underlying scripts:
# 1. Drop your GGUF in models/ and update .env to point at it, restart llama-server.
# 2. Probe whether the existing artifacts can score it (cheap, no training):
atlas lens check
# Reports: compat (artifacts work) | needs-build (different dim) | incompatible
# 3. If 'needs-build', train fresh artifacts at the model's native embedding dim:
atlas lens build --samples path/to/labeled.json
# samples format: [{"text": str, "label": 0|1}, ...] where 1 = passing code
# Canonical training set: huggingface.co/datasets/itigges22/ATLAS
# 4. Re-run check — should now report compat:
atlas lens checkFull reference: CLI.md § atlas lens.
May 2026 BiasBusters #4. A residual-stream steering vector that biases
the model toward ast_edit over edit_file for whole-function /
class / element rewrites, applied before the grammar gate has a
chance to reject anything. Strictly optional — ATLAS continues to work
without it, just with an unsteered tool-selection bias.
atlas-bootstrap.sh builds it automatically as Step 8.5, after the
services come up. The pipeline is:
build_cvector_prompts.pyturns the committedgeometric-lens/asa_calibration/contrast_pairs.jsonl(1000 pairs) into positive / negative prompt files.- The bootstrap stops
llama-serverbriefly, runsllama-cvector-generatoras a one-shot container with--method mean -ngl 99, writesmodels/ast_edit_steering.gguf, then restartsllama-server. inference/entrypoint-v3.1-9b.shsees the file on the next start and appends--control-vector-scaled /models/ast_edit_steering.gguf:0.5to thellama-servercommand line.
Total wall time on a 16GB GPU: ~5 minutes. Build runs on the same
hardware the model lives on; the resulting vector is model-specific
(do not move an ast_edit_steering.gguf built against
Qwen3.5-9B-Q6_K to a host running a different base model).
Override behavior (set in .env if you want to tune):
| Env var | Default | Effect |
|---|---|---|
ATLAS_CONTROL_VECTOR |
/models/ast_edit_steering.gguf |
Override path |
ATLAS_CONTROL_VECTOR_SCALE |
0.5 |
Conservative. Bump to 1.0–1.5 if the bias is too subtle, drop toward 0.2 if non-tool tasks degrade. |
ATLAS_CONTROL_VECTOR_LAYER_RANGE |
(all layers) | Pass two integers, e.g. "24 30", to scope to a layer band. Narrower = safer but weaker. |
If the local build fails (e.g. cvector-generator missing in an
older atlas-llama image, GPU OOM, network hiccup pulling the
runtime), the bootstrap falls back to downloading a prebuilt
ast_edit_steering.gguf from the
ATLAS HuggingFace dataset.
If that also fails the install completes with a warning — atlas doctor will flag the gap as warn, not fail.
To skip the build entirely, set ATLAS_BOOTSTRAP_SKIP_ASA=1 before
running the installer.
To rebuild manually (re-curated pairs, different --method, different
base model), see
geometric-lens/asa_calibration/README.md.
- CLI.md — How to use ATLAS once it's running
- CONFIGURATION.md — All environment variables and tuning options
- TROUBLESHOOTING.md — Common issues and solutions
- ARCHITECTURE.md — How the system works internally