ATLAS Setup Guide

Four deployment methods: one-shot bootstrap (recommended for new installs), Docker Compose (manual), bare-metal, or K3s.

Pick your install path

The install steps depend on your hardware + OS. Find the row that matches your setup, then jump to the linked section.

Your hardware	OS	Recommended path
NVIDIA GPU (RTX 20/30/40/50, datacenter)	Linux	Method 0: bootstrap (auto-detects) or Method 1: Docker
NVIDIA GPU	Windows (WSL2)	Method 1: Docker — NVIDIA section
AMD GPU (RX 6000/7000, MI200+)	Linux	Method 1: Docker — AMD ROCm
Apple Silicon (M1/M2/M3/M4)	macOS	SETUP_MACOS.md (dedicated guide — hybrid native Metal + Docker)
Intel Arc / Iris Xe	Linux/Windows	Method 1: Docker — Vulkan
Snapdragon X Elite (laptops)	Windows on ARM / Linux	Method 1: Docker — Vulkan + arm64 section
Older AMD GPU (Vega, RDNA1, no ROCm 6.x support)	Linux	Method 1: Docker — Vulkan
NVIDIA on ARM64 (DGX Spark, Jetson)	Linux	Method 1: Docker — arm64 section (CUDA via sbsa/l4t base swap)
Raspberry Pi 5	Linux	Method 1: Docker — Vulkan + arm64 (expect CPU-tier perf)
Intel Mac (pre-2020)	macOS	Method 1: Docker — Vulkan (Metal is Apple-Silicon-only)
CPU only, no GPU	any	Method 1: Docker — Vulkan + lavapipe (slow, smoke-test only)
Kubernetes cluster	Linux	Method 3: K3s
Bare-metal / development	Linux	Method 2: Bare Metal

Don't see your setup? File an issue with uname -a output and lspci | grep -i vga (Linux) / system_profiler SPDisplaysDataType (Mac) and we'll add a row.
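
If you'd like to gather those details in one go, a minimal sketch follows. The function name `collect_hw_report` is made up for this example; it only wraps the commands named above and guards against `lspci` being absent.

```shell
# Hypothetical helper: print the details requested above for a "missing row" issue.
collect_hw_report() {
  echo "== uname -a =="
  uname -a
  echo "== GPU =="
  case "$(uname -s)" in
    Darwin)
      system_profiler SPDisplaysDataType
      ;;
    *)
      if command -v lspci >/dev/null 2>&1; then
        # Same filter as in the text; print a note when nothing matches.
        lspci | grep -i vga || echo "(lspci reported no VGA-class device)"
      else
        echo "(lspci not found; it usually ships in the pciutils package)"
      fi
      ;;
  esac
}
```

Run `collect_hw_report > atlas-hw-report.txt` and paste the file into the issue.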

Method 0: One-shot bootstrap (PC-051)

A single curl command detects your distro, installs Docker and nvidia-container-toolkit, sets sysctl knobs, downloads model weights, and brings the stack up. The script is idempotent, so it is safe to re-run.

curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh | bash

Or, from a checkout:

./scripts/atlas-bootstrap.sh

Supported distributions:

Family	Distros
Debian (apt-get)	Ubuntu 20.04+, Debian 11+
RHEL (dnf)	RHEL 9+, Rocky 9+, AlmaLinux 9+, CentOS Stream 9+, Oracle Linux 9+
Fedora (dnf)	Fedora 38+

Other distros with ID_LIKE matching one of the above (e.g. Linux Mint, Pop!_OS) are accepted with a warning. Distros not in this list — Arch, openSUSE, Alpine, NixOS — aren't tested and the script will refuse to run on them.
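
The ID / ID_LIKE matching described above can be sketched roughly as follows. This is not the bootstrap script's actual code: `classify_distro` is a hypothetical name, and the ID values (`ubuntu`, `rocky`, `ol`, and so on) are the standard `/etc/os-release` identifiers for the distros in the table.

```shell
# Hypothetical sketch: map os-release ID / ID_LIKE to one of the supported families.
# usage: classify_distro ID ID_LIKE
classify_distro() {
  case "$1" in
    ubuntu|debian) echo debian; return 0 ;;
    rhel|rocky|almalinux|centos|ol) echo rhel; return 0 ;;
    fedora) echo fedora; return 0 ;;
  esac
  # No direct match: fall back to ID_LIKE, a space-separated list.
  for like in $2; do
    case "$like" in
      ubuntu|debian) family=debian ;;
      rhel|centos) family=rhel ;;
      fedora) family=fedora ;;
      *) continue ;;
    esac
    echo "warning: $1 is untested, treating it as $family (ID_LIKE)" >&2
    echo "$family"
    return 0
  done
  echo unsupported
  return 1
}
```

To try it on the current host: `. /etc/os-release; classify_distro "$ID" "${ID_LIKE:-}"`.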

The bootstrap works around EPEL, firewalld, vm.overcommit_memory (PC-011), nouveau driver conflicts, the missing-libnvidia-ml.so.1 case (RHEL minimal installs), and the "user added to docker group but current shell doesn't see it yet" race.

Run modes (all three work):

# Run as your normal user; sudo elevates as needed (Docker install, sysctl, etc).
# Install ends up owned by you.
curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh | bash

# Run via sudo. SUDO_USER is detected, install still ends up owned by you.
curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh | sudo bash

# Real root login (no sudo) — install owned by root. Only do this if there's
# no human user on the box (CI runner, container, etc).

Configuration env vars:

Variable	Effect
`ATLAS_BOOTSTRAP_SKIP_DOCKER=1`	Don't install Docker (already managed)
`ATLAS_BOOTSTRAP_SKIP_NVIDIA=1`	CPU-only install (no GPU steps)
`ATLAS_BOOTSTRAP_SKIP_MODELS=1`	Don't download model weights
`ATLAS_BOOTSTRAP_SKIP_COMPOSE=1`	Don't run `docker compose up`
`ATLAS_BOOTSTRAP_SKIP_SYSCTL=1`	Don't write `vm.overcommit_memory=1` (CI / unprivileged containers)
`ATLAS_BOOTSTRAP_SKIP_ASA=1`	Skip the ASA steering-vector build (default: built ~5 min after services come up)
`ATLAS_BOOTSTRAP_NO_SUDO=1`	Fail instead of attempting sudo
`ATLAS_INSTALL_DIR=/path`	Where to clone (default `/opt/atlas` — see below)
`ATLAS_REPO_URL=https://...`	Alternate repo URL

Why /opt/atlas? It's the standard FHS prefix for system-wide third-party software, survives $HOME cleanup, and lets multiple users on the same box share one install. If you'd rather it land in your home dir:

curl -fsSL https://raw.githubusercontent.com/itigges22/ATLAS/main/scripts/atlas-bootstrap.sh \
  | ATLAS_INSTALL_DIR=$HOME/atlas bash

When complete, the script prints a green "ATLAS ready" banner with quick-start commands. Total time on a fresh VM with a fast connection: ~10-30 minutes (the model download dominates).

If you'd rather do each step manually, use Method 1 below.

Prerequisites (All Methods)

Requirement	Details
GPU	16 GB+ VRAM recommended (8 GB minimum; see § Hardware Sizing). NVIDIA (CUDA) is the canonical path; AMD (ROCm) and Apple Silicon (Metal, macOS hybrid; see SETUP_MACOS.md) are both supported; Vulkan is the universal fallback; Intel Arc (SYCL) is on the roadmap. See § Supported GPUs.
GPU drivers	NVIDIA: proprietary drivers (`nvidia-smi` should show your GPU). AMD: `amdgpu-dkms` kernel driver (`/dev/kfd` must exist; `rocm-smi` should show your GPU).
Python 3.9+	With pip
wget	For downloading model weights
Model weights	Qwen3.5-9B-Q6_K.gguf (~7 GB) from HuggingFace. Apple Silicon ≤16 GB: use Q4_K_M (~5 GB) instead.
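
A quick preflight for the software rows of this table could look like the sketch below. Both function names are hypothetical, and the check only covers Python, pip, and wget; GPU verification is handled separately in the next section.

```shell
# Succeeds when the major.minor version in $1 is >= the one in $2.
version_ge() (
  maj1=${1%%.*}; rest1=${1#*.}; min1=${rest1%%.*}
  maj2=${2%%.*}; rest2=${2#*.}; min2=${rest2%%.*}
  [ "$maj1" -gt "$maj2" ] || { [ "$maj1" -eq "$maj2" ] && [ "$min1" -ge "$min2" ]; }
)

# Hypothetical preflight: report missing prerequisites on stderr, return non-zero if any.
check_prereqs() {
  status=0
  if command -v python3 >/dev/null 2>&1; then
    pyver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    version_ge "$pyver" 3.9 || { echo "Python $pyver found, 3.9+ required" >&2; status=1; }
    python3 -m pip --version >/dev/null 2>&1 || { echo "pip not available" >&2; status=1; }
  else
    echo "python3 not found" >&2
    status=1
  fi
  command -v wget >/dev/null 2>&1 || { echo "wget not found" >&2; status=1; }
  return "$status"
}
```

Comparing the components numerically matters here: a plain string comparison would rank 3.10 below 3.9.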

Verify GPU

NVIDIA:

nvidia-smi
# Should show your GPU with driver version and VRAM
# If this fails, install NVIDIA proprietary drivers first

AMD:

rocm-smi --showproductname --showmeminfo vram
# Should show your GPU model and total VRAM
# If rocm-smi is missing or /dev/kfd doesn't exist, install ROCm:
#   RHEL 9: sudo dnf install -y https://repo.radeon.com/amdgpu-install/6.2/rhel/9.4/amdgpu-install-6.2.60200-1.el9.noarch.rpm
#           sudo amdgpu-install --usecase=dkms,rocm
#   Ubuntu: Follow https://rocm.docs.amd.com/projects/install-on-linux/
# Then REBOOT.

Easy mode — let atlas tier autodetect across vendors and tell you what it found:

pip install -e .
atlas tier              # prints detected GPU, tier classification, recommended settings
atlas tier --json       # machine-readable (used by atlas init wizard)

Method 1: Docker Compose (Recommended)

This is the tested deployment method for V3.1.0+.

Additional Prerequisites

NVIDIA (CUDA):

Docker with nvidia-container-toolkit, or Podman with the same toolkit
~20 GB disk space (model weights + container images)

AMD (ROCm, V3.1.1):

Docker alone — ROCm doesn't need a separate container runtime; passthrough via --device=/dev/kfd --device=/dev/dri is enough
Your user must be in the video and render groups: sudo usermod -aG video,render $USER (then re-login)
~22 GB disk space (ROCm image is ~2 GB larger than the CUDA equivalent)

Setup

# 1. Clone
git clone https://github.com/itigges22/ATLAS.git
cd ATLAS

# 2. Download model weights (~7GB)
mkdir -p models
wget https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q6_K.gguf \
     -O models/Qwen3.5-9B-Q6_K.gguf

# 3. Install the ATLAS CLI (puts `atlas` in ~/.local/bin)
pip install --user -e .

# Make sure ~/.local/bin is on your PATH so `atlas` resolves:
case ":$PATH:" in *":$HOME/.local/bin:"*) ;; *)
   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
   source ~/.bashrc
;; esac

# 4. Install Go 1.24+ — required for the TUI client (atlas tui) and
#    optional for the proxy (proxy builds automatically on first run if Go
#    is present; otherwise it runs in Docker with file access limited to
#    ATLAS_PROJECT_DIR). Quickest path:
mkdir -p /tmp/go-install && cd /tmp/go-install
curl -LO https://go.dev/dl/go1.24.0.linux-amd64.tar.gz
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.24.0.linux-amd64.tar.gz
echo 'export PATH="/usr/local/go/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
cd -

# Then build the TUI:
cd tui && go build -o ~/.local/bin/atlas-tui . && cd ..

# 5. Configure environment
cp .env.example .env
# Defaults work if your model is in ./models/ — edit .env only if you changed the path

# 6. Start all services (first run pulls prebuilt container images; this takes several minutes)
#    NVIDIA hosts (default):
docker compose up -d                                                  # or: podman-compose up -d
#    AMD ROCm hosts (V3.1.1):
docker compose -f docker-compose.yml -f docker-compose.rocm.yml up -d
#    `atlas init` writes a marker comment into .env telling you which to use.

# 7. Verify everything is healthy (wait for all services to show "healthy")
docker compose ps

# 8. Start coding (from your project directory)
cd /path/to/your/project
atlas

AMD ROCm — what's different

The ROCm path is identical to NVIDIA except for these three points:

Bring up with both compose files (or let atlas init do it for you):
```
docker compose -f docker-compose.yml -f docker-compose.rocm.yml up -d
```
The override switches the llama-server image to the ROCm build, swaps the NVIDIA driver request for /dev/kfd + /dev/dri passthrough, and forces ATLAS_BACKEND=rocm so the entrypoint takes the HIP-tuning branch.

No nvidia-container-toolkit — ROCm doesn't need a separate container runtime, just kernel-level device access. Confirm your user is in the right groups:

id -nG | tr ' ' '\n' | grep -E '^(render|video)$'
# Should print both. If not:
sudo usermod -aG video,render $USER
# Then log out + back in (or: newgrp render)

GPU compute target. The default Dockerfile.rocm build is a "fat" image covering RDNA3 (7000 series), RDNA2 (6000 series), and CDNA2 (MI200) — gfx1100;gfx1101;gfx1102;gfx1030;gfx90a. For a smaller image targeted at your specific GPU, set ATLAS_GFX_TARGET before building:
```
# Example: only build for RX 7900 XT/XTX
ATLAS_GFX_TARGET=gfx1100 docker compose -f docker-compose.yml -f docker-compose.rocm.yml build llama-server
```
See LLVM AMDGPU processor table for the gfx target of your card.

For "I have an unsupported GPU but ROCm sort-of works on it" cases (older Vega, RDNA1), see TROUBLESHOOTING.md § AMD GPU not detected for the ATLAS_HSA_OVERRIDE_GFX_VERSION workaround.

Vulkan — the universal fallback (PC-114)

When the native vendor backend isn't packaged for your hardware (Intel Arc, Snapdragon Adreno, older AMD without ROCm 6.x, or some weird combo), Vulkan is the safety-net path. One Dockerfile, runs on basically everything — Mesa RADV (AMD), Mesa ANV (Intel), nvidia-container-toolkit (NVIDIA), MoltenVK (Apple via macOS Docker), Adreno (Snapdragon), and Mesa lavapipe (pure CPU fallback).

Tradeoff: typically 20–40% slower than tuned native backends. Use it when CUDA/ROCm isn't an option, or for "does ATLAS even boot on my weird laptop" smoke testing.

# Option A — let atlas init pick it for you
# (the wizard offers Vulkan when your GPU vendor's native backend isn't packaged,
#  or run with --backend vulkan to force it regardless of vendor):
atlas init --backend vulkan
docker compose -f docker-compose.yml -f docker-compose.vulkan.yml up -d

# Option B — already-installed deployment, just switch the override file:
docker compose -f docker-compose.yml -f docker-compose.vulkan.yml up -d
# (the entrypoint dispatches on ATLAS_BACKEND, which the compose overlay
#  sets to vulkan; .env's value is ignored when the overlay is in play)

What's different from CUDA/ROCm:

No vendor-specific kernel driver requirement. Vulkan ICDs live inside the image (mesa-vulkan-drivers covers AMD/Intel/CPU; NVIDIA's ICD comes from the nvidia-container-toolkit mount).
/dev/dri passthrough only — no /dev/kfd, no --gpus all (unless you're routing through the NVIDIA toolkit, in which case both still work).
Per-GPU selection via ATLAS_VK_DEVICE_SELECT instead of CUDA_VISIBLE_DEVICES / HIP_VISIBLE_DEVICES. Format is Mesa-standard: "vendorID:deviceID" (hex) or a substring of the device name. GGML_VK_VISIBLE_DEVICES (numeric index) also works.
atlas doctor runs a _check_vulkan_via_docker probe — but only when ATLAS_BACKEND=vulkan is set (otherwise it skips to keep CUDA/ROCm runs fast).

If you hit vulkaninfo showing only the llvmpipe CPU device when you expected a GPU, the kernel-side device passthrough failed — verify /dev/dri/renderD* exists on the host and your user is in the video + render groups (same as the ROCm requirement above).
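
The two host-side checks mentioned above (render node present, user in both groups) can be scripted. This is a minimal sketch with hypothetical function names; it reports problems but does not try to fix them.

```shell
# Print which of the required groups are absent from a space-separated group list.
# usage: missing_groups "$(id -nG)"
missing_groups() {
  for want in video render; do
    case " $1 " in
      *" $want "*) ;;
      *) printf '%s\n' "$want" ;;
    esac
  done
}

# Hypothetical report: check the render node and the current user's groups.
check_dri_access() {
  ls /dev/dri/renderD* >/dev/null 2>&1 ||
    echo "no /dev/dri/renderD* node on this host" >&2
  missing=$(missing_groups "$(id -nG)")
  # $missing is left unquoted on purpose so multiple names print on one line.
  [ -z "$missing" ] || echo "user is not in:" $missing >&2
}
```

If `check_dri_access` prints a group name, the `usermod -aG video,render` step from the ROCm section applies, followed by a fresh login.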

arm64 hosts (#115) {#arm64}

ATLAS targets two CPU architectures: x86_64 (default, all backends available) and aarch64 (a subset of backends). Verify with atlas doctor — the arch check surfaces your architecture and which backends are available before the GPU check fires.

Backend availability by architecture:

Backend	x86_64	aarch64	Notes
CUDA	yes (rockylinux9 base)	yes (sbsa or l4t base, build-arg swap)	DGX Spark = sbsa, Jetson = l4t
ROCm	yes	no	AMD has no arm64 ROCm release. Use Vulkan instead.
Vulkan	yes	yes (Mesa is multi-arch)	Universal fallback for all arm64 GPUs
CPU (lavapipe)	yes	yes	Slow but always works
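
For scripting, the table above reduces to a small lookup. This is only a sketch (`backends_for_arch` is a hypothetical name, not an ATLAS command); `atlas doctor` remains the authoritative check.

```shell
# Hypothetical lookup mirroring the table above.
# usage: backends_for_arch "$(uname -m)"
backends_for_arch() {
  case "$1" in
    x86_64|amd64) echo "cuda rocm vulkan cpu" ;;
    aarch64|arm64) echo "cuda vulkan cpu" ;;
    *) echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}
```

Linux reports `aarch64` from `uname -m` while macOS reports `arm64`, so the sketch accepts both spellings.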

Targeted arm64 devices:

NVIDIA DGX Spark (Grace-Blackwell GB10): CUDA via sbsa base image, compute cap 12.0/12.1
NVIDIA Jetson Orin / AGX / Nano: CUDA via l4t base image, compute cap 8.7
Apple Silicon (M1/M2/M3/M4): Vulkan via MoltenVK in Docker Desktop (slow path); the native Metal hybrid install (SETUP_MACOS.md, #32) is the fast path
Snapdragon X Elite (Windows on ARM laptops): Vulkan via the Adreno driver
Raspberry Pi 5: Vulkan via Mesa V3D driver, expect CPU-tier performance
Ampere Altra / AWS Graviton workstations: Vulkan via lavapipe (CPU fallback, since there is no consumer arm64 dGPU yet)

Building the Vulkan image for arm64:

# Multi-arch build that produces a single image manifest covering both archs:
docker buildx build --platform linux/amd64,linux/arm64 \
  -t atlas-llama-server:vulkan \
  -f inference/Dockerfile.vulkan inference/

Building the CUDA image for arm64 (DGX Spark example):

# Swap to the sbsa-capable ubuntu base, build with --platform linux/arm64:
docker buildx build --platform linux/arm64 \
  --build-arg BUILDER_IMAGE=nvidia/cuda:12.9.0-devel-ubuntu22.04 \
  --build-arg RUNTIME_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu22.04 \
  -t atlas-llama-server:cuda-arm64 \
  -f inference/Dockerfile.v31 inference/

For Jetson, swap to nvcr.io/nvidia/l4t-jetpack:r36.3.0 in both build args (l4t ships JetPack + CUDA + cuDNN as one image).

Known gaps (#115 tracks these):

No prebuilt arm64 images on GHCR yet: arm64 users must build locally with the recipes above. Prebuilt multi-arch images will land once at least one arm64 device has been validated end-to-end.
Bootstrap installer (scripts/atlas-bootstrap.sh) hasn't been audited for arm64 paths.
Hardware testing matrix is empty for all of the target devices listed above. Early adopters with any of these: please post your atlas doctor output and vulkaninfo --summary on #115.

What Happens on First Run

Docker pulls 5 prebuilt container images from ghcr.io/itigges22/atlas-{proxy,v3,lens,llama,sandbox} (PC-052, ~3 min on a fast connection — replaces the prior ~75 min from-source CUDA build). To build from source instead (dev workflow), run docker compose build before the up step — see "Image source" below.
llama-server loads the 7GB model into GPU VRAM (~1-2 min)
All services start health checks
Once all 6 services (redis, llama-server, geometric-lens, v3-service, sandbox, atlas-proxy) report healthy, atlas connects and launches the Bubbletea TUI

Subsequent docker compose up -d starts are fast (seconds) since images are cached.

Image source: prebuilt vs from-source

docker-compose.yml declares both image: (GHCR) and build: (local Dockerfile) for every service. Compose's default behavior:

Command	What it does
`docker compose up -d`	Pull `image:` if not in local cache, else reuse local
`docker compose pull`	Force pull latest tag from GHCR (overwrite local cache)
`docker compose build`	Build from `Dockerfile` (overrides GHCR image)
`docker compose up -d --build`	Always rebuild from source then start

Tag pinning. The tag defaults to latest. To pin to a specific version (recommended for production), set ATLAS_IMAGE_TAG in .env:

ATLAS_IMAGE_TAG=v1.0.0      # semver tag from a git release
ATLAS_IMAGE_TAG=sha-abc1234  # exact commit
ATLAS_IMAGE_TAG=dev          # bleeding edge from dev branch

Available tags are listed at https://github.com/itigges22/ATLAS/pkgs/container/atlas-proxy (swap atlas-proxy for the other service names: atlas-v3, atlas-lens, atlas-llama, atlas-sandbox).

Hitting unauthorized on compose pull? GHCR packages are private by default. If a maintainer hasn't yet flipped a package to public visibility, compose pull fails with unauthorized. Two escapes: (a) authenticate to GHCR with a personal access token that has read:packages scope (echo $TOKEN | docker login ghcr.io -u $USERNAME --password-stdin), or (b) build from source with docker compose build and skip the pull entirely.

Dev workflow gotcha — compose pull overwrites local builds. Both the local-built image and the GHCR-pulled image share the same tag (ghcr.io/<owner>/atlas-<svc>:<tag>), so docker compose pull will REPLACE your locally-built image with the registry version and wipe your local changes. While iterating on a service, either skip compose pull entirely (Docker won't auto-pull if a local image is present), or set ATLAS_IMAGE_TAG=dev-local (any unpublished tag name) in .env so your local builds and the registry images live under different tags.

Forks: pointing compose at your own GHCR. If you've forked the repo and your build-images.yml workflow has published images to ghcr.io/<your-username>/atlas-*, set ATLAS_GHCR_OWNER=<your-username> in .env to pull your fork's images instead of upstream's.
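
Settings such as ATLAS_IMAGE_TAG and ATLAS_GHCR_OWNER are plain KEY=VALUE lines in .env, so they can be set from a script. The helper below is a sketch, not part of ATLAS; `set_env_var` is a hypothetical name, and it assumes one assignment per line with no `export` prefix.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing assignment.
# usage: set_env_var FILE KEY VALUE
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Keep every line except an existing KEY=... assignment (comments are untouched).
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through redirection so the original file's permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_env_var .env ATLAS_IMAGE_TAG dev-local` keeps local builds under a tag that `docker compose pull` will not overwrite.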

Verify Installation

The fastest way is atlas doctor — runs 22 checks across the host environment, the docker stack, and a live model inference, and returns exit 0 (healthy) / 1 (failures). This is also what atlas-bootstrap.sh runs at the end of install.

atlas doctor              # full check (~5–10s)
atlas doctor --quick      # skip the e2e model inference (~2s)
atlas doctor --json       # machine output, for scripts/CI
atlas doctor -v           # verbose: show detail for each check

The 22 checks (PC-053 base + later additions):

Group	Check	What it confirms
Host	docker	daemon reachable
Host	compose	docker compose v2 installed
Host	nvidia	nvidia-container-toolkit can run nvidia-smi inside Docker
Host	vm.overcommit_memory	set to 1 (PC-011 — Redis AOF)
Host	model_file	`Qwen3.5-9B-Q6_K.gguf` exists and is > 100 MB
Host	lens_weights	`cost_field.pt` + `metric_tensor.pt` present
Host	asa_steering	`ast_edit_steering.gguf` present (BiasBusters #4 — warn-not-fail; ATLAS works without it, just unsteered ast_edit-vs-edit_file bias)
Host	tier_match	`.env` model selection matches host hardware (PC-055; warn on overshoot — OOM risk — pass on match or undershoot)
Host	tier_constraints	host CPU/RAM/disk meets the recommended tier minimums (PC-055.1 — catches "16 GB GPU but 8 GB RAM" mismatches)
Stack	container/redis, llama-server, geometric-lens, v3-service, sandbox, atlas-proxy	all 6 running and healthy
Stack	health/llama, lens, v3, sandbox, proxy	all 5 `/health` endpoints return ok
Stack	image_skew	all 5 `atlas-*` images on the same tag (PC-052)
End-to-end	e2e_smoke	live `/v1/chat/completions` round-trip to llama-server (`--quick` to skip)

If you'd rather check by hand:

# Hit each health endpoint
curl -s http://localhost:8080/health | python3 -m json.tool   # llama-server
curl -s http://localhost:8099/health | python3 -m json.tool   # geometric-lens
curl -s http://localhost:8070/health | python3 -m json.tool   # v3-service
curl -s http://localhost:30820/health | python3 -m json.tool  # sandbox
curl -s http://localhost:8090/health | python3 -m json.tool   # atlas-proxy

# Functional test
echo "Create hello.py that prints hello world" | atlas

All health endpoints should return {"status": "ok"} or {"status": "healthy"}.
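
Right after `docker compose up -d` the endpoints may not answer yet, because the model is still loading. A simple retry loop covers that gap. The sketch below uses hypothetical function names and only the ports listed above; the 300-second default is an arbitrary choice.

```shell
# Retry a command once per second until it succeeds or roughly TIMEOUT seconds pass.
# usage: wait_until TIMEOUT COMMAND [ARGS...]
wait_until() (
  timeout=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    [ "$elapsed" -lt "$timeout" ] || exit 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
)

# Poll the five health endpoints listed above, one after another.
# usage: wait_for_stack [TIMEOUT_PER_SERVICE]
wait_for_stack() {
  for port in 8080 8099 8070 30820 8090; do
    wait_until "${1:-300}" curl -fsS "http://localhost:$port/health" ||
      { echo "port $port did not become healthy" >&2; return 1; }
  done
}
```

Because atlas doctor exits 0 when healthy, `wait_until 300 atlas doctor --quick` works as a single gate as well.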

Note: Plain atlas in an interactive terminal launches the Bubbletea TUI for the full agent loop (tool calls, V3 pipeline, file read/write). Pipe mode (e.g. the echo | atlas form above) routes through the built-in /solve flow for scripted/one-shot use.

Stopping

docker compose down          # Stop all services (preserves images)
docker compose down --rmi all  # Stop and remove images (next start re-pulls or rebuilds them)

Viewing Logs

docker compose logs -f llama-server    # Follow llama-server logs
docker compose logs -f geometric-lens  # Follow Lens logs
docker compose logs -f v3-service      # Follow V3 pipeline logs
docker compose logs -f atlas-proxy     # Follow proxy logs
docker compose logs -f sandbox         # Follow sandbox logs
docker compose logs --tail 50          # Last 50 lines from all services

Updating

git pull
docker compose down
docker compose pull          # grab fresh :latest images from GHCR
docker compose up -d

Method 2: Bare Metal

Run all services as local processes without containers. Useful for development or systems where Docker isn't available.

Additional Prerequisites

Requirement	Details
Go 1.24+	For building atlas-proxy
llama.cpp	Built from source with CUDA (see llama.cpp build instructions)
Node.js 20+	Required by sandbox for JavaScript/TypeScript execution
Rust	Required by sandbox for Rust execution

Build

# 1. Clone and install Python CLI
git clone https://github.com/itigges22/ATLAS.git
cd ATLAS
pip install -e .

# 2. Download model weights
mkdir -p models
wget https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q6_K.gguf \
     -O models/Qwen3.5-9B-Q6_K.gguf

# 3. Build the proxy
cd proxy
go build -o ~/.local/bin/atlas-proxy-v2 .
cd ..

# 4. Install geometric-lens Python dependencies
pip install -r geometric-lens/requirements.txt

# 5. Install V3 service PyTorch (CPU only)
pip install torch --index-url https://download.pytorch.org/whl/cpu

# 6. Install sandbox dependencies
pip install fastapi uvicorn pylint pytest pydantic

Start Services

Start each service in a separate terminal (or use & and redirect to log files):

# Terminal 1: llama-server (GPU)
llama-server \
  --model models/Qwen3.5-9B-Q6_K.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 32768 --n-gpu-layers 99 --no-mmap

# Terminal 2: Geometric Lens
cd geometric-lens
LLAMA_URL=http://localhost:8080 \
LLAMA_EMBED_URL=http://localhost:8080 \
GEOMETRIC_LENS_ENABLED=true \
PROJECT_DATA_DIR=/tmp/atlas-projects \
python -m uvicorn main:app --host 0.0.0.0 --port 8099

# Terminal 3: V3 Pipeline
cd v3-service
ATLAS_INFERENCE_URL=http://localhost:8080 \
ATLAS_LENS_URL=http://localhost:8099 \
ATLAS_SANDBOX_URL=http://localhost:8020 \
python main.py

# Terminal 4: Sandbox
cd sandbox
python executor_server.py

# Terminal 5: atlas-proxy
ATLAS_PROXY_PORT=8090 \
ATLAS_INFERENCE_URL=http://localhost:8080 \
ATLAS_LLAMA_URL=http://localhost:8080 \
ATLAS_LENS_URL=http://localhost:8099 \
ATLAS_SANDBOX_URL=http://localhost:8020 \
ATLAS_V3_URL=http://localhost:8070 \
ATLAS_MODEL_NAME=Qwen3.5-9B-Q6_K \
atlas-proxy-v2

Note: The sandbox listens on port 8020 in bare-metal mode (no Docker port remapping). The proxy's ATLAS_SANDBOX_URL must use port 8020, not 30820.
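
The "use & and redirect to log files" variant can be wrapped in two small helpers. This is a sketch under assumptions: `start_bg`, `stop_bg`, and the `LOG_DIR` location are all hypothetical and not part of ATLAS.

```shell
# Hypothetical location for per-service logs and PID files.
LOG_DIR=${LOG_DIR:-/tmp/atlas-logs}

# Start a command in the background with its output in a per-service log.
# usage: start_bg NAME COMMAND [ARGS...]
start_bg() {
  name=$1
  shift
  mkdir -p "$LOG_DIR"
  # nohup keeps the process running after the terminal closes.
  nohup "$@" > "$LOG_DIR/$name.log" 2>&1 &
  echo $! > "$LOG_DIR/$name.pid"
}

# Stop a service started with start_bg.
# usage: stop_bg NAME
stop_bg() {
  kill "$(cat "$LOG_DIR/$1.pid")" && rm -f "$LOG_DIR/$1.pid"
}
```

For example, `(cd sandbox && start_bg sandbox python executor_server.py)` starts the sandbox from its own directory, and `tail -f /tmp/atlas-logs/sandbox.log` follows its output. Services that need environment variables can be started through `env`, as in `start_bg NAME env KEY=value command`.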

Start with the Launcher Script

Alternatively, copy the launcher script to a directory on your PATH:

cp /path/to/atlas-launcher ~/.local/bin/atlas
chmod +x ~/.local/bin/atlas
atlas    # Starts all missing services and launches the TUI

The launcher auto-detects which services are already running and starts only what's missing. If it detects a Docker Compose stack, it connects to that instead.

Method 3: K3s

For production Kubernetes deployment with GPU scheduling, health probes, and resource limits.

Additional Prerequisites

Requirement	Details
K3s	Single-node or multi-node cluster
NVIDIA GPU Operator or device plugin	GPU must be visible as `nvidia.com/gpu` resource
Helm	For GPU Operator installation
Podman or Docker	For building container images

Automated Install

The install script handles the complete setup — K3s installation, GPU Operator, container builds, and deployment:

# 1. Configure
cp atlas.conf.example atlas.conf
# Edit atlas.conf: model paths, GPU layers, context size, NodePorts

# 2. Run the installer (requires root)
sudo scripts/install.sh

The installer will:

Check prerequisites (NVIDIA drivers, GPU VRAM, system RAM)
Install K3s if not already running
Install NVIDIA GPU Operator via Helm (if GPU not visible to cluster)
Build container images and import to K3s containerd
Generate manifests from atlas.conf via envsubst
Deploy to the atlas namespace
Wait for all services to be healthy

Manual Deploy

If K3s is already running with GPU support:

# 1. Configure
cp atlas.conf.example atlas.conf
# Edit atlas.conf

# 2. Build and import images
scripts/build-containers.sh

# 3. Generate manifests from atlas.conf
scripts/generate-manifests.sh

# 4. Deploy
kubectl apply -n atlas -f manifests/

# 5. Verify
scripts/verify-install.sh

K3s-Specific Configuration

K3s uses atlas.conf (not .env) for configuration. The HTTP contracts and pipeline behavior are identical to Docker Compose; only deployment plumbing differs:

Setting	Docker Compose	K3s
Config file	`.env`	`atlas.conf`
Service exposure	Host ports (`8090`, `8080`, `8099`, `8070`, `30820`)	NodePorts (`30080`, `32735`, `31144`, `30070`, `30820`)
Project workspace	Bind mount (`ATLAS_PROJECT_DIR` → `/workspace`)	`hostPath` (`ATLAS_PROJECTS_DIR` → `/workspace` on every Pod that needs it)
Model files	Bind mount (`ATLAS_MODELS_DIR` → `/models:ro`)	`hostPath` on the GPU node (`ATLAS_MODELS_DIR`, `Directory`, ro)
Stateful storage	Named volumes (`redis-data`, `lens-data`)	PVCs (`redis-data` sized by `ATLAS_PVC_REDIS_SIZE`, `lens-projects` by `ATLAS_PVC_PROJECTS_SIZE`)
GPU allocation	`deploy.resources.reservations.devices` (nvidia)	`resources.limits.nvidia.com/gpu: 1` (requires GPU Operator or device plugin)
Sandbox toolchain caches	`tmpfs` mounts per language	`emptyDir` with `sizeLimit` per language (PC-191 universal pattern, same set)

Model + runtime parameters (ATLAS_MAIN_MODEL, ATLAS_CONTEXT_LENGTH, ATLAS_PARALLEL_SLOTS, ATLAS_FLASH_ATTENTION, KV cache quantization, --embeddings for the lens scoring path) all read from the same env vars in both modes — see atlas.conf.example and .env.example.

See CONFIGURATION.md for the full atlas.conf reference.

Verify K3s Deployment

# Check pods
kubectl get pods -n atlas

# Check GPU allocation
kubectl describe nodes | grep nvidia.com/gpu

# Run verification suite
scripts/verify-install.sh

Note: Docker Compose is the most heavily-exercised deployment method (CI runs against it; every release is smoke-tested under Compose). K3s manifests are generated from templates/*.yaml.tmpl at deploy time via scripts/generate-manifests.sh (or install.sh's process_templates step). Templates target the current Qwen3.5-9B-Q6_K working point and the May 2 2026 service layout (atlas-proxy, no api-portal, no dashboard); the V3.0 benchmark numbers in CHANGELOG were collected on Qwen3-14B under an older topology.

Hardware Sizing

ATLAS classifies GPUs into 5 tiers and recommends a model + context size + parallel-slots configuration per tier. Run atlas tier to see which tier your hardware lands in and the exact .env values to use.

Tier	VRAM	Recommended model	Context	Slots	Example GPUs
cpu	n/a	not supported in v1	n/a	n/a	(no CUDA GPU)
small	8–12 GB	Qwen3.5 7B Q4_K_M (4.4 GB)	8K	1	RTX 3060/4060 8GB, T4
medium	12–20 GB	Qwen3.5 9B Q6_K (6.9 GB)	32K	1	RTX 4060/5060 Ti 16GB, 3080 Ti, 4070 Ti Super
large	20–32 GB	Qwen3.5 14B Q5_K_M (10.5 GB)	32K	2	RTX 3090, 4090, 5090 24GB
xlarge	32 GB+	Qwen3.5 32B Q5_K_M (23 GB)	64K	2	RTX 5090 32GB, A6000, A100, H100

atlas tier              # classify this host + show recommendations
atlas tier list         # show all 5 tier definitions
atlas tier --json       # machine output (for scripts)
atlas tier --raw        # just the probe (no classification)
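
To make the VRAM boundaries in the tier table concrete, here is a rough sketch of the classification. It is not how `atlas tier` is implemented. Two things are assumptions: each lower bound is treated as inclusive (so 12 GB lands in medium), and anything under 8 GB is lumped in with the cpu tier.

```shell
# Round a MiB figure (as reported by nvidia-smi) to whole GB.
mib_to_gb() {
  echo $(( ($1 + 512) / 1024 ))
}

# Hypothetical classification following the tier table above.
# usage: tier_for_vram_gb GB
tier_for_vram_gb() {
  vram=$1
  if   [ "$vram" -ge 32 ]; then echo xlarge
  elif [ "$vram" -ge 20 ]; then echo large
  elif [ "$vram" -ge 12 ]; then echo medium
  elif [ "$vram" -ge 8 ];  then echo small
  else echo cpu   # assumption: below the small tier's minimum
  fi
}
```

On an NVIDIA host, `tier_for_vram_gb "$(mib_to_gb "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)")"` feeds in the first GPU. The rounding matters because cards report slightly less than their nominal size in MiB.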

The medium tier is the ATLAS development target — atlas-bootstrap.sh defaults to its model+context settings. For other tiers, run atlas init (the PC-054 first-run wizard) after the bootstrap completes. It probes hardware via atlas tier, picks the right model from the registry, downloads it with SHA verification, and rewrites .env. Re-run with atlas init --reconfigure whenever your hardware or model registry default changes.

Resource	Minimum	Recommended	Notes
GPU VRAM	8 GB	16 GB	See tier table above
System RAM	14 GB	16 GB+	PyTorch runtime + container overhead
Disk	15 GB	25 GB	Model (4.4–23 GB depending on tier) + container images (5–8 GB) + working space
CPU	4 cores	8+ cores	V3 pipeline is CPU-intensive during repair phases

Supported GPUs

Any GPU with 8 GB+ VRAM and a llama.cpp-supported backend:

Vendor	Backend	Status	Build path	Tested cards
NVIDIA	CUDA	Shipping (V3.1.0+)	`inference/Dockerfile.v31`	RTX 5060 Ti 16GB (primary dev)
AMD	ROCm / HIP	Shipping (V3.1.1)	`inference/Dockerfile.rocm`	RX 7900 XTX (community smoke-test, GH #26)
Apple Silicon	Metal	Shipping (macOS hybrid: native llama-server + Docker, #32)	`scripts/atlas-setup-macos.sh` + `docker-compose.macos.yml`	M2 Pro 32GB (verified); M3/M4 (target)
Intel Arc	SYCL	Roadmap	TBD	Arc A770 16GB (target)

atlas tier auto-detects across vendors and picks the largest-VRAM GPU. Override with ATLAS_GPU_VENDOR=amd or ATLAS_GPU_INDEX=1 if you have multiple GPUs and want a specific one.

CUDA Compute Capability (Dockerfile.v31)

inference/Dockerfile.v31 compiles llama.cpp for a specific CUDA compute capability. The default is 120;121 (Blackwell, RTX 50xx). If you see build failures like nvcc fatal: unsupported gpu architecture or runtime errors like no kernel image available for execution, your GPU needs a different arch.

Override at build time with --build-arg CUDA_ARCH=<value>:

# Single arch — RTX 4060/4070/4080/4090 (Ada Lovelace)
podman build --build-arg CUDA_ARCH=89 -f inference/Dockerfile.v31 -t llama-server:local inference/

# Multiple archs (semicolon-separated) — build a fat binary for Ampere + Ada + Hopper
podman build --build-arg CUDA_ARCH="86;89;90" -f inference/Dockerfile.v31 -t llama-server:local inference/

Common values:

Arch	Architecture	Cards
`60`, `61`	Pascal	GTX 10xx, Tesla P4/P40
`70`	Volta	V100
`75`	Turing	RTX 20xx, T4
`80`, `86`	Ampere	A100, RTX 30xx
`89`	Ada Lovelace	RTX 40xx, L4
`90`	Hopper	H100
`100`, `120`, `121`	Blackwell	B100, RTX 50xx

Your GPU's compute capability: nvidia-smi --query-gpu=compute_cap --format=csv (drop the dot — 8.9 → 89).
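
The "drop the dot" step, plus the semicolon-separated form used for multiple archs, can be done in a short pipeline. The function name below is hypothetical; it reads one compute capability per line, which is what `nvidia-smi` prints when `noheader` is added to the format.

```shell
# Turn compute capabilities (one per line, e.g. "8.9") into a CUDA_ARCH value.
# Duplicates are removed and the result is sorted numerically, e.g. "86;89".
caps_to_cuda_arch() {
  tr -d '. ' | sort -un | paste -sd ';' -
}
```

Usage, assuming the GPUs on the build host are the ones you want to target:

`podman build --build-arg CUDA_ARCH="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_cuda_arch)" -f inference/Dockerfile.v31 -t llama-server:local inference/`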

AMD GPU Targets (Dockerfile.rocm, V3.1.1)

inference/Dockerfile.rocm compiles llama.cpp's HIP backend for one or more gfx targets. The default is a fat build covering the most common consumer + datacenter AMD GPUs: gfx1100;gfx1101;gfx1102;gfx1030;gfx90a. Each additional target adds ~150 MB to the binary.

Override at build time with --build-arg GFX_TARGET=<value> (or via ATLAS_GFX_TARGET env var, which the compose override forwards):

# Single target — RX 7900 XT/XTX only (smaller image)
ATLAS_GFX_TARGET=gfx1100 docker compose -f docker-compose.yml -f docker-compose.rocm.yml build llama-server

# Two targets for RDNA3 + RDNA2 mixed-fleet
docker build --build-arg GFX_TARGET="gfx1100;gfx1030" -f inference/Dockerfile.rocm -t atlas-llama-rocm:custom inference/

Common values:

Target	Architecture	Cards
`gfx1100`	RDNA3 (Navi 31)	RX 7900 XT, 7900 XTX, 7900 GRE
`gfx1101`	RDNA3 (Navi 32)	RX 7800 XT, 7700 XT
`gfx1102`	RDNA3 (Navi 33)	RX 7600, 7600 XT
`gfx1030`	RDNA2 (Navi 21)	RX 6800, 6800 XT, 6900 XT, 6950 XT
`gfx1031`	RDNA2 (Navi 22)	RX 6700 XT, 6750 XT
`gfx1032`	RDNA2 (Navi 23)	RX 6600, 6600 XT, 6650 XT
`gfx90a`	CDNA2	MI210, MI250, MI250X
`gfx942`	CDNA3	MI300X
`gfx900`	Vega	Vega 56/64 (may need HSA override — see TROUBLESHOOTING.md)
`gfx1200`	RDNA4 (Navi 44)	RX 9060 XT
`gfx1201`	RDNA4 (Navi 48)	RX 9070, 9070 XT

RDNA4 (gfx1200/gfx1201) users: set ATLAS_ROCM_TAG=7.2.3-complete — the default ROCm 6.2 base image does not include gfx1200/gfx1201 compiler support. ROCm 7.0+ supports these targets natively; do not set ATLAS_HSA_OVERRIDE_GFX_VERSION. See TROUBLESHOOTING.md § RDNA4 for details.

Your GPU's gfx target: rocminfo | grep -i gfx | head -1 (or look it up in the LLVM AMDGPU processor table).
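
On hosts with more than one AMD GPU, `head -1` only shows the first match. The sketch below collects every distinct target instead, already in the semicolon-separated form GFX_TARGET expects. The function name is hypothetical, and the pattern assumes `rocminfo` prints targets as `gfx` followed by three or four hex digits; generic ISA names such as `gfx11-generic` are skipped on purpose.

```shell
# Read rocminfo output on stdin and print the distinct gfx targets joined by ';'.
gfx_targets() {
  grep -o 'gfx[0-9a-f]\{3,4\}' | sort -u | paste -sd ';' -
}
```

Usage: `rocminfo | gfx_targets`. Note that an integrated GPU, if present, will show up in the list too, so review the result before passing it to the build.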

Geometric Lens Weights (Optional)

ATLAS works without Geometric Lens weights — the service degrades gracefully, returning neutral scores. The V3 pipeline falls back to sandbox-only verification.

To enable C(x)/G(x) scoring, you need trained model weights. Pre-trained weights and training data are available on HuggingFace:

ATLAS Dataset on HuggingFace — includes embeddings, training data, and weight files.

Place weight files in geometric-lens/geometric_lens/models/ (or mount via ATLAS_LENS_MODELS in Docker Compose). The service loads them automatically on startup.

Training scripts are provided in scripts/ if you want to train on your own benchmark data:

scripts/retrain_cx_phase0.py — Initial C(x) training from collected embeddings
scripts/retrain_cx.py — Production C(x) retraining with class weights
scripts/collect_lens_training_data.py — Collect pass/fail embeddings from benchmark runs
scripts/prepare_lens_training.py — Prepare and validate training data format

Bringing your own model (V3.1.1)

If you want to swap in a non-default GGUF, the atlas lens subcommand wraps the probe + train pipeline so you don't have to learn the underlying scripts:

# 1. Drop your GGUF in models/, update .env to point at it, and restart llama-server.

# 2. Probe whether the existing artifacts can score it (cheap, no training):
atlas lens check
# Reports: compat (artifacts work) | needs-build (different dim) | incompatible

# 3. If 'needs-build', train fresh artifacts at the model's native embedding dim:
atlas lens build --samples path/to/labeled.json
# samples format: [{"text": str, "label": 0|1}, ...] where 1 = passing code
# Canonical training set: huggingface.co/datasets/itigges22/ATLAS

# 4. Re-run check — should now report compat:
atlas lens check

Full reference: CLI.md § atlas lens.
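Before starting a build, it can be worth checking that the samples file matches the format described in step 3 (a JSON array of objects with a string text and a 0/1 label). The sketch below is not part of the atlas CLI: validate_lens_samples is a hypothetical helper, and it assumes jq is installed on the host.

```shell
# Hypothetical pre-flight check (assumes jq is available).
# Exit status is 0 only if the file is a non-empty JSON array whose
# elements are objects with a string "text" and a "label" of 0 or 1.
validate_lens_samples() {
  jq -e '
    type == "array" and length > 0 and
    all(.[]; type == "object"
             and (.text | type) == "string"
             and (.label == 0 or .label == 1))
  ' "$1" >/dev/null
}

# Usage:
#   validate_lens_samples path/to/labeled.json && atlas lens build --samples path/to/labeled.json
```

A malformed file then fails in under a second instead of partway through training.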

ASA Steering Vector (Auto-Built)

May 2026 BiasBusters #4. The ASA steering vector is a residual-stream steering vector that biases the model toward ast_edit over edit_file for whole-function, class, and element rewrites. It is applied before the grammar gate has a chance to reject anything. It is strictly optional: ATLAS continues to work without it, just with an unsteered tool-selection bias.

atlas-bootstrap.sh builds it automatically as Step 8.5, after the services come up. The pipeline is:

build_cvector_prompts.py turns the committed geometric-lens/asa_calibration/contrast_pairs.jsonl (1000 pairs) into positive / negative prompt files.
The bootstrap stops llama-server briefly, runs llama-cvector-generator as a one-shot container with --method mean -ngl 99, writes models/ast_edit_steering.gguf, then restarts llama-server.
inference/entrypoint-v3.1-9b.sh sees the file on the next start and appends --control-vector-scaled /models/ast_edit_steering.gguf:0.5 to the llama-server command line.

Total wall time on a 16 GB GPU: ~5 minutes. The build runs on the same hardware that serves the model, and the resulting vector is model-specific: do not move an ast_edit_steering.gguf built against Qwen3.5-9B-Q6_K to a host running a different base model.

Override behavior (set in .env if you want to tune):

Env var	Default	Effect
`ATLAS_CONTROL_VECTOR`	`/models/ast_edit_steering.gguf`	Override path
`ATLAS_CONTROL_VECTOR_SCALE`	`0.5`	Conservative. Bump to 1.0–1.5 if the bias is too subtle, drop toward 0.2 if non-tool tasks degrade.
`ATLAS_CONTROL_VECTOR_LAYER_RANGE`	(all layers)	Pass two integers, e.g. `"24 30"`, to scope to a layer band. Narrower = safer but weaker.
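To make the interaction between these three variables concrete, the sketch below shows how an entrypoint could turn them into llama-server flags. It is an illustration, not the contents of inference/entrypoint-v3.1-9b.sh: control_vector_args is a hypothetical helper, and it assumes the layer range is forwarded through llama.cpp's --control-vector-layer-range START END flag.

```shell
# Hypothetical helper: prints the extra llama-server arguments, one per line.
# Prints nothing when the vector file is absent, so the server starts unsteered.
control_vector_args() {
  vec="${ATLAS_CONTROL_VECTOR:-/models/ast_edit_steering.gguf}"
  scale="${ATLAS_CONTROL_VECTOR_SCALE:-0.5}"

  [ -f "$vec" ] || return 0

  printf '%s\n' --control-vector-scaled "${vec}:${scale}"

  if [ -n "${ATLAS_CONTROL_VECTOR_LAYER_RANGE:-}" ]; then
    # Deliberately unquoted so "24 30" splits into two arguments.
    set -- $ATLAS_CONTROL_VECTOR_LAYER_RANGE
    printf '%s\n' --control-vector-layer-range "$1" "$2"
  fi
}
```

With no overrides and the default file present, the output is the same flag described in the pipeline steps: --control-vector-scaled /models/ast_edit_steering.gguf:0.5.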

If the local build fails (e.g. cvector-generator missing in an older atlas-llama image, GPU OOM, network hiccup pulling the runtime), the bootstrap falls back to downloading a prebuilt ast_edit_steering.gguf from the ATLAS HuggingFace dataset. If that also fails, the install still completes with a warning, and atlas doctor flags the gap as warn rather than fail.

To skip the build entirely, set ATLAS_BOOTSTRAP_SKIP_ASA=1 before running the installer.
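The skip flag and the fallback chain described above amount to a short decision ladder. The sketch below models it with stand-in functions; build_vector_locally, download_prebuilt_vector, and obtain_steering_vector are hypothetical names for illustration and do not come from atlas-bootstrap.sh.

```shell
# Hypothetical stand-ins for the real build and download steps.
build_vector_locally()     { return 1; }   # pretend the local build fails
download_prebuilt_vector() { return 0; }   # pretend the download succeeds

# Prints which path was taken. Always returns 0, because a missing
# steering vector is a warning and never an install failure.
obtain_steering_vector() {
  if [ "${ATLAS_BOOTSTRAP_SKIP_ASA:-0}" = "1" ]; then
    echo "skipped"
  elif build_vector_locally; then
    echo "built"
  elif download_prebuilt_vector; then
    echo "downloaded"
  else
    echo "warning: no steering vector, continuing unsteered" >&2
    echo "missing"
  fi
  return 0
}
```

When setting the flag yourself, note where the variable goes: from a checkout, ATLAS_BOOTSTRAP_SKIP_ASA=1 ./scripts/atlas-bootstrap.sh works as expected, but with the curl pipeline the assignment has to sit on the bash side of the pipe (curl ... | ATLAS_BOOTSTRAP_SKIP_ASA=1 bash), since a prefix on curl only affects curl.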

To rebuild manually (re-curated pairs, different --method, different base model), see geometric-lens/asa_calibration/README.md.

Next Steps

CLI.md — How to use ATLAS once it's running
CONFIGURATION.md — All environment variables and tuning options
TROUBLESHOOTING.md — Common issues and solutions
ARCHITECTURE.md — How the system works internally
