draft: v0.4.0 — apps console, snapshots/fork, observability, runtime manifest + presets by tastyeffectco · Pull Request #35 · tastyeffectco/sandboxd

tastyeffectco · 2026-06-24T20:19:58Z

Draft — do not merge until approved. Single release-candidate integrating all accepted v0.3 + v0.4 work.

Branch ancestry is linear: main → console → feat/v0.4-snapshots-observability → feat/runtime-manifest. This branch == the tip of that stack (no conflicts; merges were fast-forwards). Source branches are preserved.

Included

Console / app UI + app config & secrets (write-only sensitive values)
Snapshots / fork / restore (+ snapshot ignore-list, ownership normalization)
Observability / events / activity timeline
Runtime manifest (sandbox.yaml, web + workers process model)
Process API + per-process logs
5 runtime presets (react-vite, nextjs, node-express, fastapi, worker) — all boot, reload after agent tasks, and fork/restore healthy
Shared-host installer + preview-port fixes
OpenAPI updates + contract test

Test results (all green)

gofmt -l clean · go vet · go build ./... · go test ./... (all packages)
OpenAPI contract test
console tsc --noEmit + production build
installer bash -n syntax check (scripts/dev/install-v04-ubuntu.sh, scripts/e2e.sh)
Smoke e2e (disposable host, portless): React preview 200; Next.js preview + _next/static chunks 200 and a build-provoking task no longer poisons (healed to 200); FastAPI add-endpoint live via --reload; snapshot → fork → fork preview 200 + endpoint preserved ($HOME normalized to sandbox:sandbox); process logs endpoint; Activity events recorded; Config & Secrets redaction (sensitive value never returned). Real-LLM agent tasks were simulated (no API key in the test env) but the post-task pipeline ran; console verified via build/typecheck + its /v1 endpoints (not browser-clicked).

Known limitations (non-blocking, documented)

DELETE /v1/sandboxes/{id} purges workspace vs legacy DELETE /sandbox/{id} keeps it
keepalive_until not surfaced in GET /v1/sandboxes/{id}
warming interstitial returns HTTP 200
per-task agent.log can be empty on timeout (persistence WIP)

See CHANGELOG.md (v0.4.0) and docs/sandbox-manifest.md.


An optional web console for managing apps on top of sandboxd, plus the versioned /v1 contract it binds to. Built as a folder in the monorepo (API-only boundary) so it splits cleanly to its own repo once /v1 stabilizes.
- docs/openapi.yaml: the public /v1 API as OpenAPI 3.0 (apps, sandboxes, tasks/SSE, snapshots). The contract for the console and future integrations.
- console/: Vite + React SPA (shadcn/Vercel-style dark UI). Talks ONLY to /v1 (no Go imports, no DB, no workspace access): app list, app detail with live preview iframe, task submit + live SSE logs, start/stop/snapshot/delete. Playwright specs for the lifecycle.
- POST /v1/sandboxes/{id}/start: public counterpart of /stop so an API-only console need not reach the internal wake path.
- previewURL now reflects PreviewTLS (http on a default local deploy), so the preview iframe loads without TLS.
- Packaging: `docker compose --profile console up` builds the SPA and serves it from nginx, proxying /v1 to sandboxd (deferred DNS so it survives sandboxd restarts). Core mode (no profile) is unchanged.
- CI: a console job builds the SPA + image so a TS/build break is caught.
MVP scope: single-user, auth-off, public previews.
Verified: Go + console SPA + console image all build; nginx config valid; compose console profile resolves; go vet/test green. The browser-driven Playwright run against the full live stack is the remaining check.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…main>) Per review: the same Traefik that serves previews routes the console too, selected by Host header (console.<domain> -> console, *.preview.<domain> -> sandboxes) on one entrypoint — instead of a separate published port. Gated on the sandboxd.managed=true label the docker provider requires. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fails if any /v1 route is missing from docs/openapi.yaml, or the spec documents an endpoint that no route serves — the contract the console and external integrations depend on.
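The two-way check described above amounts to a set difference in both directions. A minimal sketch, assuming routes are compared as "METHOD path" strings; `contractDiff` and its inputs are hypothetical names, not the repository's actual test helper:

```go
package main

import "sort"

// contractDiff compares the routes the server registers with the routes the
// OpenAPI spec documents. Both slices hold "METHOD /path" strings.
// It returns routes served but not documented, and routes documented but
// not served. A contract test would fail if either list is non-empty.
func contractDiff(served, documented []string) (undocumented, unserved []string) {
	doc := make(map[string]bool, len(documented))
	for _, r := range documented {
		doc[r] = true
	}
	srv := make(map[string]bool, len(served))
	for _, r := range served {
		if !doc[r] && !srv[r] {
			undocumented = append(undocumented, r)
		}
		srv[r] = true
	}
	for _, r := range documented {
		if !srv[r] {
			unserved = append(unserved, r)
		}
	}
	// Sort so failure messages are stable between runs.
	sort.Strings(undocumented)
	sort.Strings(unserved)
	return undocumented, unserved
}
```

How the real test extracts the route list from the router and the path list from docs/openapi.yaml is not shown in this PR text, so those steps are left out here.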

Adds control-plane-owned config and secrets per app, so sensitive values never live in Docker env, workspace files, or task logs.
- New app_config table (migration 0014), scoped to an app (and so to the API tenant). Sensitive entries store AES-256-GCM ciphertext + a random per-value nonce; non-sensitive entries may keep a plaintext value.
- Encryption uses standard-library crypto only. The master key comes from SANDBOXD_SECRETS_KEY (base64, 32 bytes) or an auto-generated 0600 keyfile under the data dir.
- API: POST/GET/PATCH/DELETE /v1/apps/{id}/config. Sensitive values are write-only: GET returns metadata only (key, sensitive, access_policy, value_set, timestamps), never the plaintext. Non-sensitive config may be returned in full.
- access_policy metadata (control_plane_only | agent_access | runtime_access | both); default control_plane_only. Agent/runtime delivery is the next slice (a scoped-token broker); for now nothing is injected, so secrets stay in the control plane.
- Plaintext is never logged; audit entries record only the key.
Secrets are deliberately NOT passed through `docker run -e`. The legacy create-time env stays for backwards compatibility, but app_config is the safer managed replacement.
Tests: encryption round-trip / fresh nonce / tamper-detection / 0600 keyfile; sensitive value encrypted at rest and never returned; redaction; default policy; tenant/app scoping; sensitivity toggle re-encrypts.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
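The encryption scheme in this commit (AES-256-GCM, a fresh random nonce per value, standard library only) can be sketched as follows. The function names `sealValue` and `openValue` are hypothetical, and key loading from SANDBOXD_SECRETS_KEY or the keyfile is deliberately left out; this is an illustration of the technique, not the repository's code:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte master key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("master key must be 32 bytes (AES-256)")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealValue encrypts one config value. It draws a new random nonce on every
// call, so encrypting the same plaintext twice yields different ciphertexts.
// The nonce is returned separately so it can be stored next to the ciphertext.
func sealValue(key, plaintext []byte) (nonce, ciphertext []byte, err error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, nil, err
	}
	nonce = make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	return nonce, gcm.Seal(nil, nonce, plaintext, nil), nil
}

// openValue decrypts a stored value. GCM authenticates the ciphertext, so any
// modified byte makes Open return an error instead of corrupted plaintext.
func openValue(key, nonce, ciphertext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(nonce) != gcm.NonceSize() {
		return nil, errors.New("unexpected nonce length")
	}
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```

The three properties the commit's tests name (round-trip, fresh nonce, tamper detection) follow directly from this shape: `openValue` inverts `sealValue`, each call to `sealValue` reads a new nonce, and a flipped ciphertext byte fails authentication.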

…only Covers two acceptance checks: a PATCH that omits 'value' must not alter the stored secret bytes, and audit entries record the config key but never the plaintext value.

… changelog Adds the four /v1/apps/{id}/config routes (and ConfigItem/CreateConfigRequest/ PatchConfigRequest schemas) to docs/openapi.yaml so the console<->sandboxd contract test stays green after merging #33. Bumps the spec to 0.3.0 and adds an Unreleased changelog entry for the v0.3.0 integration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds a Config & Secrets section to the app detail view backed by the new /v1/apps/{id}/config API: list entries, add a key (secret or plaintext) with an access policy, change an entry's access policy inline, and delete. Secrets are write-only end to end — the API never returns a sensitive value, so a stored secret shows as a redacted '•••• set' chip and can only be replaced. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The secrets broker (Slice 2) is not implemented, so only control_plane_only is enforced. Mark agent_access/runtime_access/both as 'reserved (broker)' and disable them in both selects so the UI never implies a secret is delivered to an agent or app runtime. An existing entry already set to a reserved policy still displays it. Adds a per-row Replace (secret) / Edit (plaintext) action: since sensitive values are write-only, this PATCHes a new value inline — the only way to rotate a stored secret from the console. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds TestAppConfigSecretDoesNotLeak: one lifecycle test asserting a sensitive value (sk-test-secret-ci) never escapes through any of the four leak vectors — API responses, DB plaintext columns, audit rows, or server logs — while a non-sensitive value still round-trips plainly and the default policy stays control_plane_only. Closes the log-output vector the existing config tests didn't capture; runs in the normal Go CI job (internal/api, go test ./...). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The top-level docs predated the console (#32), so a new user had no way to discover it. Add a 'Web console' section to README.md and AGENTS.md: what it does, how to launch it (docker compose --profile console up -d), where to open it (console.localhost via the shared Traefik), and that it's a pure /v1 client. Completes the 'basic docs' item for the console end-to-end experience. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The Snapshot button captured a snapshot with no UI feedback. Now:
- success -> info toast 'Snapshot captured. History, restore, and fork are coming in v0.4.0.'
- 409 (running source) -> error toast 'Stop the sandbox before capturing a snapshot.'
The API client now attaches the HTTP status to thrown errors so the 409 case is detected reliably. No snapshot history/restore/fork in this commit; that's v0.4.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The card badge was presence-based (green whenever a sandbox row existed), so a sandbox in creating/stopped/error showed as green 'sandbox' — misleading. Now each app's current-sandbox status is fetched and rendered via StatusBadge (running=green, stopped/error distinct), and 'no sandbox' stays neutral. Honest status, matching the detail view. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

v0.4.0 backend on the public /v1/snapshots subsystem only (internal /sandbox/{id}/... stays unexposed):
- migration 0015 adds snapshot.source_app_id so per-app history survives the ephemeral source sandbox; capture stamps it from the source's app_id.
- GET /v1/apps/{id}/snapshots: tenant+app-scoped history.
- POST /v1/apps/{id}/restore: REPLACE the app's current sandbox from a snapshot (purge current, then clone). Destructive; console confirms.
- POST /v1/apps/{id}/fork: new app + its sandbox spun from a snapshot; source app untouched.
Tenant scoping enforced on every path (cross-tenant app/snapshot -> 404). Sandbox spin reuses the proven create path (template_path + .git reset). Tests cover store scoping, history, and the restore/fork guard paths; the Docker-dependent spin is verified on a real host, not CI. OpenAPI + contract test updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…se 4) Adds a Snapshots panel to the app detail screen backed by the new /v1/apps/{id} endpoints:
- history list (name, captured time, size) via GET /v1/apps/{id}/snapshots
- Restore: confirms (replaces the current sandbox, discards un-snapshotted work) then POST /v1/apps/{id}/restore and refreshes
- Fork: prompts for a name, POST /v1/apps/{id}/fork into a new app
The capture button now refreshes history and (since v0.4.0 ships these) drops the 'coming soon' wording. Actions disabled unless the snapshot is ready.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…T verified Records the release discipline + verification status for the Phase 4 preview branch: capture/history/tenant-scoping/backend-orchestration are tested, but the live restore/fork sandbox spin and preview are deliberately deferred to a real isolated v0.4.0 deploy (they'd otherwise expose port-3000 sandboxes to prod Traefik on the shared host). Not for merge to console/main; no non-draft PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

scripts/dev/install-v04-ubuntu.sh stands up the Phase 4 stack on a fresh Ubuntu 22.04/24.04 server, reusing the repo's docker-compose (traefik + sandboxd + console profile) — no parallel deploy system. It installs Docker if missing, fails if 80/443 are taken, detects the public IPv4, uses sslip.io for preview + console URLs (HTTP on :80), writes .env + a docker-compose.override.yml, gates the public console with Traefik basic auth (demo creds), keeps the API on loopback (disables the edge api.yml router), builds images, starts the stack, and prints URLs + teardown. docs/v0.4.0-test-runbook.md: requirements, install, the 14-step create→preview→ snapshot→restore→fork checklist, TLS-as-follow-up, teardown, release discipline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… 80/443) Default to shared-host mode so the installer is safe next to Coolify/nginx/another Traefik:
- HTTP_PORT=18080 (uncommon edge port; set HTTP_PORT=80 for dedicated-host mode)
- API_PORT=19090 on loopback only
- only the chosen HTTP_PORT must be free; fail clearly telling the user to set HTTP_PORT if taken; do NOT check or require 443 (TLS deferred)
- generated URLs include :<HTTP_PORT> unless it's 80
Runbook documents both modes plus 'behind an existing proxy': keep 18080 and have the front proxy forward console.<ip>.sslip.io + *.preview.<ip>.sslip.io to 127.0.0.1:18080 (Host preserved); TLS via the front proxy or a real wildcard domain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

On a shared host with HTTP_PORT=18080, the API returned bare preview URLs (…sslip.io/) that hit whatever owns :80 (Coolify/front proxy) instead of sandboxd's Traefik on :18080. previewURL() now appends the host port unless it's the scheme default (80 for http, 443 for https):
  Server.PublicHTTPPort <- SANDBOXD_PUBLIC_HTTP_PORT (main.go)
  docker-compose.yml <- SANDBOXD_PUBLIC_HTTP_PORT: ${HTTP_PORT:-80}
The console iframe + open-in-tab link consume sb.preview.url unchanged, and restore/fork responses use the same previewURL(), so all preview surfaces get the corrected port. Unit-tested across http/https x default/custom ports; verified live that GET /v1/sandboxes/{id} returns ...sslip.io:18080. No installer change needed: it already writes HTTP_PORT, which compose forwards.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
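The port rule described in this commit can be illustrated with a small helper. This is a sketch only: the signature below is hypothetical (the real previewURL() is a method whose parameters are not shown here), and treating a zero port as "unset" is an assumption.

```go
package main

import (
	"net"
	"strconv"
)

// previewURL builds the public preview URL for a sandbox host name.
// The port is appended only when it differs from the scheme default
// (80 for http, 443 for https). A publicPort of 0 is treated as unset,
// which is an assumption of this sketch.
func previewURL(host string, tls bool, publicPort int) string {
	scheme, defaultPort := "http", 80
	if tls {
		scheme, defaultPort = "https", 443
	}
	if publicPort == 0 || publicPort == defaultPort {
		return scheme + "://" + host + "/"
	}
	return scheme + "://" + net.JoinHostPort(host, strconv.Itoa(publicPort)) + "/"
}
```

With a host such as `app.preview.203.0.113.7.sslip.io` (an illustrative value) and port 18080, the result carries `:18080`, so the browser reaches sandboxd's Traefik rather than whatever owns port 80 on the shared host.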

One append-only app_events table in the existing control-plane SQLite DB (no ClickHouse/OTEL/Loki/separate DB), a centralized best-effort recorder, a tenant-scoped paginated read API, and a console Activity timeline.
- migration 0016: app_events(id ULID, owner_token, app/sandbox/task/snapshot ids, type, severity, message, payload_json, created_at) + scoped indexes. ULID id doubles as the newest-first page cursor.
- internal/events: Recorder.Record (mirrors audit: own Store interface, detached ctx, never breaks the request); stable type/severity constants.
- store: InsertAppEvent + ListAppEvents/ListTaskEvents (owner_token-scoped, cursor-paginated); owner-agnostic GetApp for the background task path.
- API: GET /v1/apps/{id}/events and /v1/tasks/{id}/events (newest-first, default 50 / max 200, ?before cursor, next_before). Cross-tenant -> 404/empty.
- instrumented via the recorder (no scattered SQL): app.created/updated, config.created/updated/deleted (key only, never the secret), snapshot captured/capture.failed/restored/forked, sandbox create.started/failed/started/stopped/deleted, task.started, and on the task terminal point task.completed/failed/build.failed + preview.health.ok/failed.
- console: read-only Activity panel on app detail (time/severity/type/message + related ids), durable across refresh/restart.
- docs/openapi.yaml + contract test; .env.example notes the future SANDBOXD_EVENT_RETENTION_DAYS knob (retention deferred).
Tests: recorder writes valid JSON events; tenant scoping; pagination/limit; config event carries the key but never the secret; failed task emits task.failed + task.build.failed on both feeds. gofmt/vet/build/test + contract test green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
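The read API's paging contract (newest-first, default 50 / max 200, a `before` cursor, and `next_before` in the response) can be sketched in memory. The real implementation queries SQLite; `Event`, `clampLimit`, and `pageEvents` are hypothetical names, and clamping an out-of-range limit (rather than rejecting it) is an assumption:

```go
package main

// Event is a reduced stand-in for an app_events row. IDs are ULIDs, which
// sort lexicographically by creation time, so the ID can serve as the cursor.
type Event struct {
	ID   string
	Type string
}

// clampLimit applies the documented default (50) and maximum (200).
func clampLimit(n int) int {
	if n <= 0 {
		return 50
	}
	if n > 200 {
		return 200
	}
	return n
}

// pageEvents returns up to limit events strictly older than the before
// cursor. The input must already be sorted newest-first. nextBefore is the
// ID of the last event on the page when older events remain, else "".
func pageEvents(newestFirst []Event, before string, limit int) (page []Event, nextBefore string) {
	limit = clampLimit(limit)
	for _, e := range newestFirst {
		if before != "" && e.ID >= before {
			continue // at or newer than the cursor: already delivered
		}
		if len(page) == limit {
			// At least one older event exists, so hand back a cursor.
			return page, page[len(page)-1].ID
		}
		page = append(page, e)
	}
	return page, ""
}
```

A client walks the timeline by passing each response's `next_before` back as `before` until it comes back empty.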

…ndexes Follow-up to Phase 5 (3 fixes):
1. No raw output in app_events.payload_json. Task events now carry structured flags/reasons only, never BuildErrorMessage/PreviewErrorMessage/ErrorMessage text (which can echo secrets the app printed; the full text stays in the task's result.json). New payloads:
   task.completed -> {files_changed, duration_ms, build_ok}
   task.failed -> {failure_reason, has_error}
   task.build.failed -> {reason:'build_failed', has_build_error:true}
   preview.health.failed -> {preview_status, has_preview_error}
   Test now plants a fake secret in all three error fields and asserts it never appears in any event's payload_json or message.
2. Monotonic ULID event ids (ulid.Monotonic under a mutex), so a same-millisecond completion burst sorts in emission order by id (the page cursor). Added a monotonic-ordering test.
3. Dropped the unused indexes (owner-only, sandbox-only, type-only) from migration 0016: no endpoint queries them and each is write amplification on an append-only table. Kept idx_app_events_app + idx_app_events_task.
gofmt/vet/build/test + OpenAPI contract test green. Console untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…se 7 core) runtimed is no longer hardcoded to a single Vite dev server. An optional workspace sandbox.yaml declares how the app builds/runs/previews/reports health, plus background workers. No manifest = the built-in Vite defaults, so existing apps are unchanged.
- cmd/runtimed/manifest.go: parse sandbox.yaml (gopkg.in/yaml.v3) with full backward-compatible defaults sourced from the existing RUNTIMED_* env vars; resolution rules for web / worker-only / empty / invalid. Unit-tested.
- cmd/runtimed/process.go: generalized the single dev-server supervisor into a reusable 'process' (web OR worker) with the same backoff/fast-fail/stop logic.
- main.go: builds the web process (if declared) + workers from the manifest, supervises all, probes the web health_path (Vite asset deep-probe kept only for the default app), and reports per-process status.
- protocol.go: Status.Processes []ProcessState (name/kind/running/pid/restarts); Preview still carries the web health for compatibility.
- build check uses the manifest's build.command/timeout (empty => skipped).
- docs/sandbox-manifest.md: schema reference + examples + security note (the manifest grants no new privilege; no Docker socket/compose/k8s).
New dependency: gopkg.in/yaml.v3 (pure-Go, CGO-safe), the first YAML dep, needed to parse the manifest. Deferred to follow-up slices: control-plane logs API + console process status/logs panel + live image-rebuild e2e. gofmt/vet/build/test green; runtimed manifest tests added.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1. BuildSpec is never nil after LoadManifest (applyDefaults always sets it); added a defensive a.build!=nil guard at the build-check call site so a hand-built app{} can't panic. Test: absent/worker-only/invalid all yield a non-nil Build.
2. Worker-name validation: only [A-Za-z0-9_-] (1-64 chars), which rejects empty, path separators, '..', and duplicates (the name becomes ~/.runtimed/<name>.log, so this is path-safety). Dropped the silent auto-naming. Invalid manifest is rejected and falls back to the safe default (web app, no workers).
3. Port validation: an explicit web.port must be 1-65535 (0 = unset -> default).
4. Worker-only preview semantics: added PreviewNone ('none') instead of the misleading 'down', and documented it (non-breaking: new enum value).
5. Tests: build-never-nil, invalid/duplicate worker names rejected safely, invalid ports, worker-only status -> PreviewNone + no preview probe (no panic).
6. docs/sandbox-manifest.md: documents worker-only=none, the validation rules, and that Phase 7 is NOT fully verified until the rebuilt base image runs the default Vite, a custom web, and a worker-only manifest.
gofmt/vet/build/test green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
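The validation rules in items 2 and 3 are small enough to sketch directly. The function names are hypothetical; the character set, the 1-64 length bound, the duplicate check, and the 0-means-unset port rule come from the commit message:

```go
package main

import (
	"fmt"
	"regexp"
)

// workerNameRE allows only [A-Za-z0-9_-], 1 to 64 characters. Because the
// name becomes a log file name, this also excludes '/', '.', and the empty
// string, so a name can never escape the log directory.
var workerNameRE = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// validateWorkerNames rejects any name outside the allowed set and any
// duplicate. A caller would fall back to the safe default manifest on error.
func validateWorkerNames(names []string) error {
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		if !workerNameRE.MatchString(n) {
			return fmt.Errorf("invalid worker name %q", n)
		}
		if seen[n] {
			return fmt.Errorf("duplicate worker name %q", n)
		}
		seen[n] = true
	}
	return nil
}

// validateWebPort accepts 0 (unset, so the default applies) or 1-65535.
func validateWebPort(port int) error {
	if port == 0 {
		return nil
	}
	if port < 1 || port > 65535 {
		return fmt.Errorf("web.port %d out of range 1-65535", port)
	}
	return nil
}
```

Anchoring the pattern at both ends matters: without `^` and `$`, a name like `ok/../etc` would match on its first two characters.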

Rebuilt sandboxd-base with the new runtimed (yaml dep fetched fine in the image builder) and ran all three manifest shapes end-to-end on a disposable host stack: - default Vite (no manifest): preview ready, web running, pnpm build exit 0 - custom web (python http.server :5000, health_path /healthz, build skipped): preview ready, serves /healthz and / with 200 - worker-only: preview 'none' (valid, not error), worker running + producing output Test sandboxes were portless (no Traefik label) so prod routing was untouched. Marks the manifest verification status as live-verified. No runtime fixes needed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Surfaces the runtime manifest process model through the public /v1 API and the console (7A marked accepted/verified in docs).
- GET /v1/sandboxes/{id} now includes processes[] (name/kind/running/pid/restarts), mapped via a pure, testable v1RuntimeView helper. A worker-only app (preview status 'none') returns an empty endpoint URL, with no fake preview.
- GET /v1/sandboxes/{id}/processes/{name}/logs: read-only tail of ~/.runtimed/<name>.log (which the files API refuses as a reserved subtree). Sandbox-scoped by id like the rest of the v1 sandbox API; process name strictly validated ([A-Za-z0-9_-], 1-64) so no path escapes; tail capped (default 200, max 1000, reads <=256KiB); no write/delete. Unknown process/sandbox -> 404, bad name -> 400.
- console: app detail shows a Processes panel (name/kind/status/pid/restarts + per-process recent logs); preview pane relabeled 'Preview / endpoint'; worker-only renders 'No public endpoint — worker process running' (valid, not a failure).
- OpenAPI: documents the logs route + Sandbox.processes/Process schema.
- tests: process-logs tail, bad-name->400 (incl. traversal), unknown->404; v1RuntimeView worker-only shape (status none, no URL) + process mapping.
gofmt/vet/build/test + contract test green; console tsc + build green. No presets, no manifest editor, no compose/kata/containerd (7B non-goals).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…+ caps) Users can create a working app of a common type from a preset that actually boots. Approach (A): runtimed applies the preset on first boot.
- internal/preset: shared registry (single source of truth) of 5 presets (react-vite, nextjs, node-express, fastapi, worker), each with id/label/description/optional template/generated sandbox.yaml/required capabilities.
- runtime_preset on POST /v1/apps (stored on the app, migration 0017), POST /v1/apps/{id}/sandbox (explicit else app default; precedence over template), and POST /v1/sandboxes. Unknown preset -> 400. GET /v1/presets lists them (console picker source of truth).
- runtimed applies the preset on FIRST boot only: seed the preset template into an empty workspace, write sandbox.yaml only when missing, and never overwrite existing app files or sandbox.yaml. Falls back to default template if unknown.
- minimal starter templates added (node-express-standard, fastapi-standard, nextjs-standard): reliable boot, not fancy. React uses existing react-standard.
- console: app-type preset dropdown on New App (data-driven from /v1/presets); sandbox create inherits the app's preset.
- tests: every preset manifest validates with the loader; unknown->400 (app + sandbox create); app stores runtime_preset; resolve precedence; presets list; applyPreset/writeManifestIfMissing (write-once, never overwrite); seed skips non-empty workspace.
gofmt/vet/build/test + contract test + console tsc green. Image capability check: base image already has python3-venv (pip via venv works, confirmed) + node/pnpm, so all 5 presets boot WITHOUT a Dockerfile change. Live boot of all 5 on a rebuilt image is the deferred e2e step. No Postgres/Redis/managed services, no compose, no manifest editor, no advanced override, no provider work (7C-1 non-goals).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Records the live e2e on rebuilt image sandboxd-base:p7c1 — all five presets boot: react-vite ~31s, nextjs ~39s (warm; cold may be slower), node-express ~30s, fastapi ~37s (runtime venv/pip install works), worker ~28s (preview none + worker running). Confirms: presets seed expected files, runtimed writes sandbox.yaml, process status/logs endpoint works, API rejects unknown presets with 400, runtimed logs loudly + falls back to react-standard on a bad preset env. Notes still-unit-only items (console dropdown not browser-clicked; app-default preset resolution not live-tested) and future image optimizations (warm pnpm/npm cache, preinstalled FastAPI/uvicorn or uv, Next.js layer). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tance The Next.js preset ran 'pnpm dev' (web) + 'pnpm build' (post-task build check) in the same workspace; next build writes production .next/ that the long-running next dev then serves from -> 500s on _next/static. Dev isn't restarted after the build, so it stays broken. Fix (smallest reliable): - nextjs preset build.command is now empty: the build check is the only thing that runs 'next build', so skipping it removes the poison source. Tradeoff: no post-task build verification for Next.js until an isolated build check exists. - web command 'rm -rf .next' before 'pnpm dev' defends a clean start against a stale/production .next carried in by snapshot restore (alone insufficient: dev isn't restarted post-build, hence skipping the check too). - nextjs template ships .gitignore (node_modules,.next,out,.env,.env.local) so the git-based workspace checkpoint doesn't treat them as app changes. Re-tested live (image sandboxd-base:p7c1b, portless): fresh ready ~30s, /+asset 200; reproduced bug (pnpm build -> 500); recovery via restart ~10s -> 200; edit hot-reloads; checkpoint tracks only 6 real files. Real LLM agent task NOT run (no ANTHROPIC_API_KEY) — verified the post-task build-check mechanism directly. Tests: nextjs manifest has no 'pnpm build' + empty build + 'rm -rf .next'; template ships .gitignore; all preset manifests still validate. Docs: 7C-1 NOT marked fully accepted; records the fix, the not-run agent path, and follow-ups (snapshot .next bloat unfiltered; split task build_ok/preview_ok/ app_healthy; empty agent.log on timeout). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ist as deferred Issue 2 (build/health semantics), implemented:
- TaskResult adds build_status (passed|failed|skipped), preview_ok (*bool; omitted for worker-only), app_healthy (build-not-failed AND web preview serving / worker-only a worker running). build_ok kept for back-compat but now true ONLY when build_status=passed: a skipped build is never build_ok=true.
- runtimed sets build_status from the build check (empty command => skipped, not a fake pass) and derives app_healthy/preview_ok via postTaskHealth().
- console shows 'build skipped|passed|failed' (+ 'unhealthy') instead of the old unconditional 'build ok'. OpenAPI TaskResult updated.
- tests: postTaskHealth web (passed/skipped/down/failed) + worker-only (running/stopped); preview_ok omitempty in JSON.
Issue 1 (snapshot ignore-list), deferred (not small), documented:
- capture is zstd of the raw loopback .img (block image), not a tree copy, so an ignore-list can't slot into capture without reflink-copy+loop-mount+prune or a filtered-tar redesign (mount risk for an RC). Correctness (stale .next in forks/restores) is already handled by 'rm -rf .next' on dev start; only size bloat remains. Recorded as a scoped follow-up.
gofmt/vet/build/test + contract test green; console tsc + build green. No merge to console/main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
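The build/health semantics in Issue 2 can be expressed as one pure function. This is a sketch under stated assumptions: the struct name `TaskHealth`, the parameter list, and the JSON tags mirror the field names in the commit message but are not taken from the repository, and the real postTaskHealth() may take different inputs.

```go
package main

// TaskHealth holds the post-task fields described in the commit message.
type TaskHealth struct {
	BuildStatus string `json:"build_status"`         // passed | failed | skipped
	BuildOK     bool   `json:"build_ok"`             // true only when BuildStatus is "passed"
	PreviewOK   *bool  `json:"preview_ok,omitempty"` // nil (omitted) for worker-only apps
	AppHealthy  bool   `json:"app_healthy"`
}

// postTaskHealth derives the health fields after a task.
// hasWeb says whether the manifest declares a web process; previewServing is
// the web health probe result; workerRunning says whether a worker is up.
func postTaskHealth(buildStatus string, hasWeb, previewServing, workerRunning bool) TaskHealth {
	h := TaskHealth{
		BuildStatus: buildStatus,
		BuildOK:     buildStatus == "passed", // a skipped build is never reported as ok
	}
	serving := workerRunning
	if hasWeb {
		ok := previewServing
		h.PreviewOK = &ok
		serving = previewServing
	}
	// Healthy means the build did not fail AND the app is actually serving
	// (web preview up, or for worker-only apps a worker running).
	h.AppHealthy = buildStatus != "failed" && serving
	return h
}
```

Using a pointer for `PreviewOK` is what lets a worker-only result omit the field entirely instead of reporting a misleading `false`.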

…copy Storage model (clarified): the OSS build stores each workspace as a plain bind-mounted DIRECTORY (loopback .img is legacy; internal/snapshot's zstd-.img path is dead in dir mode). The public /v1 snapshot/fork/restore subsystem captures by copying that directory (captureImage), so the ignore-list belongs in that copy — no image/zstd step involved. captureImage now uses copyTreeExcluding instead of 'cp -a': - skips node_modules/.next/out/.venv/__pycache__/.cache by base name at any depth (conservative; dist/build NOT ignored — templates don't treat them as generated); - copies symlinks verbatim and never follows them (no path-traversal/symlink escape during staging copy); - preserves mode + ownership (lchown) so the restore cp -a keeps files writable. Restore/fork unchanged (still cp -a the snapshot dir); restored workspaces re-create deps on first boot. Tests: exclusion incl. nested node_modules, source + sandbox.yaml preserved, symlink-escape safety; existing snapshot/fork/restore guard tests still pass. gofmt/vet/build/test + contract green. No merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
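The exclusion rule (skip listed base names at any depth, never follow symlinks) can be sketched with `filepath.WalkDir`, which does not follow symbolic links. This is not copyTreeExcluding itself: the sketch only lists the paths a copy would keep, omits the mode/ownership handling, and `isIgnored` and `keptPaths` are hypothetical names. Skipping a plain file that happens to carry an ignored name is an assumption.

```go
package main

import (
	"io/fs"
	"path/filepath"
	"strings"
)

// ignoredNames is the ignore-list from the commit message. dist and build
// are deliberately absent.
var ignoredNames = map[string]bool{
	"node_modules": true, ".next": true, "out": true,
	".venv": true, "__pycache__": true, ".cache": true,
}

// isIgnored reports whether any component of a relative path is on the
// ignore-list, which is what "by base name at any depth" amounts to.
func isIgnored(rel string) bool {
	for _, part := range strings.Split(filepath.ToSlash(rel), "/") {
		if ignoredNames[part] {
			return true
		}
	}
	return false
}

// keptPaths walks root and returns the relative paths a filtered copy would
// include. Ignored directories are pruned with SkipDir, so their contents
// are never visited. WalkDir reports symlinks as entries and does not
// descend into them.
func keptPaths(root string) ([]string, error) {
	var kept []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if path == root {
			return nil
		}
		if ignoredNames[d.Name()] {
			if d.IsDir() {
				return filepath.SkipDir
			}
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		kept = append(kept, filepath.ToSlash(rel))
		return nil
	})
	return kept, err
}
```

Pruning at the directory level keeps the walk cheap on workspaces where node_modules holds most of the files.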

…se-blocker) Bug: applyDefaults replaced an explicit empty build.command with the default pnpm build, so presets could NOT disable the post-task build check. Next.js therefore still ran 'next build' after tasks and poisoned the live 'next dev' (.next/ -> 500 on _next/static); FastAPI/worker could get false/irrelevant checks.
Fix: BuildSpec.Command is now *string so unset (nil) is distinct from explicit empty (&""):
- no build block / build: {} -> default pnpm build (backward compatible)
- build.command: "" -> SKIP the build check
- build.command: "x" -> run x
task.go derefs the command; empty => skipped (build_status=skipped, build_ok false). worker preset now sets build.command "" explicitly (was falling back to the default). Build checks are runtime verification (does the app still build/start after a task), NOT production deployment builds; this is documented, with how to skip.
Acceptance tests (TestManifestBuildResolution): absent->default, {}->default, ""->skip, "x"->run; manifest assertions updated for *string.
Live retest (image sandboxd-base:p7c1d, portless): ran a coding task on a Next.js sandbox. The agent fails w/o creds but the post-task build check runs; result build_status=skipped, NO pnpm/next build executed, / + four _next/static/chunks/* return 200 (not poisoned). FastAPI: build_status=skipped, no pnpm build, /health 200. Cleaned up; prod untouched. gofmt/vet/build/test + contract green. No merge to console/main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
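The nil-versus-empty distinction at the heart of this fix looks roughly like the following. It is a reduced sketch: the real BuildSpec has more fields and is populated by the YAML loader, and `resolveBuild` is a hypothetical name for the resolution step.

```go
package main

// defaultBuildCommand is the backward-compatible default named in the
// commit message.
const defaultBuildCommand = "pnpm build"

// BuildSpec keeps Command as a pointer so that "not set" (nil) can be told
// apart from "explicitly set to the empty string".
type BuildSpec struct {
	Command *string
}

// resolveBuild returns the command to run and whether the build check is
// skipped:
//   - no build block, or a block without command -> default command
//   - command set to ""                          -> skip the check
//   - command set to anything else               -> run that command
func resolveBuild(b *BuildSpec) (command string, skip bool) {
	if b == nil || b.Command == nil {
		return defaultBuildCommand, false
	}
	if *b.Command == "" {
		return "", true
	}
	return *b.Command, false
}
```

With a plain `string` field the first two cases collapse into one, which is the bug described above: the defaulting code cannot tell an omitted command from an intentionally empty one.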

…gent builds New finding: even with the platform build check skipped, the coding AGENT can run 'next build' during a task, writing production .next/ that the live 'next dev' then serves -> 404/500 on _next/static. rm -rf .next at boot can't undo a mid-session poison because dev isn't restarted after the task. Fix: add web.restart_after_task (manifest WebProc field, default false). When true, runtimed restarts the web process after every task (skipped on cancel) via the existing supervisor (stop() -> re-run start command), then waits up to 90s for a 200 on the health path before reporting preview/health. The web command re-runs, so its 'rm -rf .next' wipes the agent's production build and dev comes back clean. Next.js preset sets restart_after_task: true; other stacks leave it false. Live retest (image sandboxd-base:p7c1e, portless): created Next.js sandbox, simulated the agent by running 'pnpm build' mid-session (.next became a production build w/ BUILD_ID; /_next/static/chunks/* -> 404), then ran a coding task. After the task: web restarts=1, / + four chunk assets -> 200, .next is dev again (no BUILD_ID). Recovery confirmed. Cleaned up; prod untouched. Tests: manifest parses web.restart_after_task (default false); nextjs preset sets it true. gofmt/vet/build/test green. Docs: 'Dev-mode resilience' section. No merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ProvisionFromTemplate (the single funnel for app fork, app restore, and direct from_snapshot creation) cloned the snapshot/template with 'cp -a' (preserves SOURCE ownership) and created the workspace $HOME dir as root — so the sandbox user (uid 1000) hit EACCES writing ~/.cache, pnpm/npm store, node_modules, .next, .venv, generated files. Fresh seed already chowns; the template path did not. Fix: after the clone, ProvisionFromTemplate recursively normalizes ownership of the whole workspace (incl. the $HOME dir itself) to the sandbox uid/gid (1000) — same result as the fresh seed's chown -R, never trusting the snapshot's captured ownership. Uses Lchown via a WalkDir lstat walk that never descends a symlink, so a symlink pointing outside the workspace can't redirect the chown. Logged via the loopback Manager logger. Host-side numeric chown is correct for the OSS --userns=host default. Tests (internal/loopback): chowns the whole tree incl. root dir; symlink-escape not followed; ProvisionFromTemplate normalizes (covers fork/restore/from_snapshot — all route through it). Live retest (image p7c1f, all portless -> no Traefik label -> no prod collision): - Next.js snapshot -> fork: $HOME 1000:1000, ready 31s, / + chunks 200, node_modules reinstalled, ~/.cache writable (no EACCES). - FastAPI fork: $HOME 1000:1000, ready 15s, /health 200, venv reinstalled. Cleaned up; prod untouched. gofmt/vet/build/test green. No merge to console/main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two FastAPI bugs:
1. Port mismatch: the public preview routes to 3000 but uvicorn ran on 8000, so external preview was 502 while the internal probe said ready.
2. Stale after task: uvicorn didn't reload, so an agent-added route 404'd until a manual restart.

Fix: FastAPI preset manifest now runs '.venv/bin/uvicorn main:app --host 0.0.0.0 --port 3000 --reload' with web.port: 3000 (health_path /health, build still skipped). Template requirements.txt adds watchfiles so --reload works. No restart_after_task, because reload is reliable (kept for Next.js only).

Tests: fastapi preset validates, web.port 3000, command has --reload + --port 3000, no 8000, no pnpm build; existing preset tests pass.

Live retest (image p7c1g, portless; external check via container-IP:3000, the path Traefik connects on, to stay off the host Traefik): create -> public /health 200 (was 502); agent adds /hello -> /hello 200 with NO manual restart (watchfiles reload, 2 reload lines in web.log); stop -> snapshot -> fork -> fork $HOME 1000:1000, public /health 200, /hello preserved, venv reinstalled. Cleaned up; prod untouched.

gofmt/vet/build/test green. No merge to console/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two preset reload gaps (neither runtime has live reload):
- node-express: 'node server.js' didn't reload -> agent-added routes 404 until manual restart. Set web.restart_after_task: true.
- worker: a long-running worker kept old behavior after a task. Extended the restart_after_task mechanism from web to WORKER processes (per-process flag on the process struct; workers are bounced without a readiness wait, web keeps its health-wait). The worker preset now ships an editable worker.sh (template worker-standard, command 'bash worker.sh') with the worker's restart_after_task: true, so edits to worker.sh take effect on restart. This is not a generic process-policy framework, just the per-process flag.

Manifest: Worker.RestartAfterTask added; process.restartAfterTask field; main wires web + each worker from the manifest; runTask restarts flagged web (with health wait) and flagged workers (stop -> supervisor re-runs).

Tests: manifest parses worker.restart_after_task (default false); node-express + worker presets set restart_after_task; react-vite/fastapi do NOT (live reload), nextjs unchanged; all preset manifests validate. Full go test + console build/typecheck green.

Live retest (image p7c1h, portless): node-express agent adds /ping -> pong 200 after task, no manual restart (restarts 1); worker output heartbeat->PONG-BEAT in worker.log after task, no manual restart (restarts 1). React/FastAPI/Next unchanged. Cleaned up; prod untouched.

Docs: reload-mechanism table + documented (not implemented) follow-ups: v1 vs legacy DELETE workspace semantics, keepalive_until missing from v1 GET, warming interstitial returns 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
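The per-process flag can be pictured as follows. This is a rough sketch under assumptions: the `process` struct, `afterTask`, and the returned log lines are hypothetical, and incrementing `restarts` stands in for the real stop() followed by the supervisor re-running the command. It only mirrors the behavior stated in the commit: flagged processes are bounced after a task, a cancelled task skips the restart, web waits for health, and workers do not.

```go
package main

import "fmt"

// process is a hypothetical stand-in for a supervised process entry.
type process struct {
	name             string
	isWeb            bool
	restartAfterTask bool
	restarts         int
}

// afterTask bounces every flagged process once a task finishes and returns a
// short log of what happened. A cancelled task restarts nothing. Web waits on
// waitHealthy; workers are bounced without a readiness wait.
func afterTask(procs []*process, cancelled bool, waitHealthy func(*process) bool) []string {
	var log []string
	if cancelled {
		return log
	}
	for _, p := range procs {
		if !p.restartAfterTask {
			continue
		}
		p.restarts++ // stands in for stop() + supervisor re-run
		if !p.isWeb {
			log = append(log, p.name+": restarted")
			continue
		}
		if waitHealthy(p) {
			log = append(log, p.name+": restarted, healthy")
		} else {
			log = append(log, p.name+": restarted, health wait timed out")
		}
	}
	return log
}

func main() {
	procs := []*process{
		{name: "web", isWeb: true, restartAfterTask: true},
		{name: "worker", restartAfterTask: true},
		{name: "unflagged"},
	}
	alwaysHealthy := func(*process) bool { return true }
	for _, line := range afterTask(procs, false, alwaysHealthy) {
		fmt.Println(line)
	}
}
```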

Marks 7C-1 as accepted & live-verified. All five presets pass preview + agent-task reload + fork/restore:
- react-vite: task applies, preview 200, fork healthy
- nextjs: build-provoking task no longer poisons preview (restart_after_task), chunks 200, fork healthy
- fastapi: port 3000 + uvicorn --reload, task changes live, fork healthy
- node-express: restart_after_task, added route live after task, fork healthy
- worker: restart_after_task, code/output change reflected in logs, fork healthy

Cross-cutting verified: snapshot ignore-list, fork/restore ownership normalized to sandbox:sandbox, wake/idle/keepalive edge cases.

Remaining items recorded as non-blocking follow-ups: v1 DELETE purges vs legacy DELETE keeps workspace; keepalive_until missing from v1 GET; warming interstitial returns 200; per-task agent.log persistence on timeout.

Phase-status banner + section header updated. Not merged to console/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Integration of all accepted v0.3 + v0.4 work (console/app UI, config & secrets, snapshots/fork/restore, observability/events, runtime manifest, process API/logs, five runtime presets, shared-host installer/preview port, OpenAPI updates).
- CHANGELOG: v0.4.0 entry (Added/Fixed/Known limitations).
- README: runtime presets + sandbox.yaml section; Known limitations (v0.4.0).
- ARCHITECTURE: runtimed manifest/process model + presets note.
- console/README: app detail (preview/endpoint, config, snapshots, activity, processes) + New-App preset picker.

Known limitations recorded (non-blocking): v1 DELETE purge vs legacy keep; keepalive_until not surfaced in v1 GET; warming interstitial returns 200; per-task agent.log persistence on timeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sandboxd (the contract, primary):
- TestPublicAPISurfacePresent pins the required public /v1 surface in BOTH api.go routes and openapi.yaml (parity alone can't catch both-sides deletion): presets, apps, app sandbox, sandbox GET, process logs, snapshots + app snapshot/fork/restore, app config, app events.
- v1_client_contract_test: pins the JSON SHAPES the console/SDK consume (GET /v1/sandboxes/{id}: id/status/preview{url,status}/processes/template; GET /v1/presets: id/label/description). If the server shape changes, this fails first and the client fixtures must follow. (Existing suites already cover config redaction, events, process logs, snapshots/fork guards, runtime-view, build/health semantics.)

console (a client; lightweight smoke, fixtures mirror sandboxd responses):
- vitest + jsdom + @testing-library/react (lockfile regenerated for frozen install).
- src/test/fixtures.ts: apps, 5 presets, web + worker sandbox runtime/processes, events, redacted config, all shaped to match the Go contract tests.
- App.test: console loads, app list renders, preset dropdown populated from /v1/presets.
- AppDetail.test: preview/processes/activity/config sections render; config redaction (no plaintext secret); worker/no-preview renders as valid; Delete control wording surfaced.

docs/release-checklist.md: manual VPS smoke (install, React/Next/FastAPI/express/worker tasks, snapshot/fork, process logs, events, config redaction, idle/wake/keepalive) + known limitations.

Gate: go test + OpenAPI contract + console tsc/build + console vitest all green. No new product features; not merged to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
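Pinning a JSON shape, as the client contract test above does, amounts to asserting that a response carries a fixed set of keys. The following is a minimal sketch and not the contents of v1_client_contract_test: `missingKeys` and `sandboxKeys` are hypothetical names, the sample body is made up, and only the top-level field names come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// sandboxKeys are the top-level fields listed above for GET /v1/sandboxes/{id}.
var sandboxKeys = []string{"id", "status", "preview", "processes", "template"}

// missingKeys decodes a JSON object and returns, sorted, the required keys
// that it lacks. An empty result means the pinned shape is intact.
func missingKeys(body []byte, required []string) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil {
		return nil, err
	}
	var missing []string
	for _, k := range required {
		if _, ok := obj[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	// Illustrative body with "template" dropped; all values are invented.
	body := []byte(`{"id":"sb_1","status":"running","preview":{"url":"http://example.test","status":"up"},"processes":[]}`)
	missing, err := missingKeys(body, sandboxKeys)
	fmt.Println(missing, err) // prints "[template] <nil>"
}
```

Checking for required keys rather than comparing whole documents lets the server add fields without breaking the test, which matches the later move away from exact preset counts.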

…reset test, unhealthy fixture)

1. CI: the console job now runs pnpm install --frozen-lockfile, tsc --noEmit, pnpm test (vitest), and pnpm build directly (clear output), keeping the docker image build as an extra check. Adds release/** to push triggers (PRs already run via pull_request).
2. Console delete wording: v1 DELETE purges the workspace, so the button now reads 'Delete sandbox and workspace' and requires a window.confirm that spells out that the workspace (code/deps/generated files) is permanently removed. Test updated to assert the wording.
3. Preset shape test: assert the REQUIRED preset ids exist with id/label/description instead of an exact count, so adding a preset later doesn't break it.
4. Unhealthy sandbox fixture + test: stopped sandbox, preview down, web process not running (restarts=3). Asserts the UI shows no live preview iframe, renders 'Sandbox not running', and the process row reads 'stopped' (never 'running').

Gate: gofmt/vet/build/go test + OpenAPI contract + console tsc + vitest (7/7) + console build, all green. No new features; not merged to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sandboxd (the contract):
- GET /v1/settings: read-only, tokenless-safe instance summary covering version/build, networking (preview domain/base/port/tls/entrypoint), auth.enabled, runtime (storage mode, base image), lifecycle (idle reap + threshold, keepalive max), egress mode, agent providers, runtime presets, and capability flags (snapshots/config_secrets/templates/forward_auth). Server.Instance holds the static safe metadata; populated in main (version, auth !Disabled, storage 'directory', egress mode, providers ['opencode'], idle settings). NEVER emits secrets/tokens/keys/env values.
- OpenAPI documents /v1/settings; added to required-public-surface test.
- Tests: TestV1SettingsShapeAndNoSecretLeak builds a real cipher from a known key and asserts the response shape is stable AND the key never appears; capability flags + preview_base + presets verified. Contract parity green.

console (a client):
- api.ts Settings type + getSettings(); Settings.tsx read-only page with sections System / Networking / Runtime & presets / Agents / Security-auth / Egress / Capabilities; topbar Settings nav.
- fixtures.settingsFixture mirrors the API shape; Settings.test.tsx renders all sections and asserts auth shows a mode (not a token) and no secret/password field appears.

Non-goals respected: read-only (no config editing), no RBAC, no tenant mgmt, no external secret managers, no daemon restart.

Gate green: gofmt/vet/build/go test + OpenAPI contract + console tsc/vitest(9/9)/build. Not merged to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…sted, hot-applied)

Only the lifecycle tunables (idle_reap_enabled, idle_threshold_seconds, keepalive_max_seconds) are editable from the console; everything else stays read-only / env-managed (auth, secrets, egress, networking are never editable).

sandboxd (contract):
- PATCH /v1/settings: STRICT allowlist via json DisallowUnknownFields, so any other key (auth/egress/networking/secrets/version/unknown) -> 400, no mutation. Range-validated (idle 60..86400s, keepalive 0..7d). Persists to a singleton instance_settings row (migration 0018) and HOT-APPLIES via a shared instancecfg.Live that the idle reaper + keepalive path read on each use; audited (settings.update). 503 when Live/Store unwired.
- GET /v1/settings now reads lifecycle live + advertises an "editable" list.
- reaper.Idle gains ThresholdFn/EnabledFn (live overrides; nil = static). main builds instancecfg.Live from env defaults, overlays the persisted row at boot, and wires it to the reaper + Server. keepalive handler reads s.keepaliveMax() live.
- Tests: reject protected/unknown keys (no mutation/persist), validation bounds, persist+hot-apply round-trip, 503 without Live, no-secret-leak (existing). OpenAPI + contract + required-surface updated.

console (client):
- Settings page: editable Lifecycle section (gated on the server "editable" list) with Save -> PATCH; all other sections read-only (no inputs). api.patchSettings.
- Tests: lifecycle inputs present + read-only sections have no inputs; Save sends a PATCH carrying only the lifecycle object. vitest 11/11.

Non-goals respected: no RBAC, no tenant mgmt, no external secret managers, no daemon restart, no editing of protected/restart-required settings.

Gate green: gofmt/vet/build/go test + OpenAPI contract + console tsc/vitest/build. Not merged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
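The strict allowlist for PATCH /v1/settings can be sketched with encoding/json's `DisallowUnknownFields`. This is an illustration only: the struct names and `decodeSettingsPatch` are hypothetical, and the nesting of the three tunables under a "lifecycle" object is an assumption drawn from the console test description. The bounds (60..86400s for idle, 0..7 days for keepalive) are the ones stated above; a real handler would map the returned error to a 400.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

const (
	minIdleSeconds      = 60
	maxIdleSeconds      = 86400
	maxKeepaliveSeconds = 7 * 24 * 3600 // 7 days
)

// lifecyclePatch lists the only editable fields. Pointers distinguish an
// omitted field from an explicit zero or false.
type lifecyclePatch struct {
	IdleReapEnabled      *bool `json:"idle_reap_enabled"`
	IdleThresholdSeconds *int  `json:"idle_threshold_seconds"`
	KeepaliveMaxSeconds  *int  `json:"keepalive_max_seconds"`
}

// settingsPatch is the assumed request envelope.
type settingsPatch struct {
	Lifecycle *lifecyclePatch `json:"lifecycle"`
}

// decodeSettingsPatch rejects any key outside the allowlist (at either
// nesting level) and range-checks the values that are present.
func decodeSettingsPatch(body []byte) (*settingsPatch, error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	var p settingsPatch
	if err := dec.Decode(&p); err != nil {
		return nil, err
	}
	if lc := p.Lifecycle; lc != nil {
		if v := lc.IdleThresholdSeconds; v != nil && (*v < minIdleSeconds || *v > maxIdleSeconds) {
			return nil, fmt.Errorf("idle_threshold_seconds out of range: %d", *v)
		}
		if v := lc.KeepaliveMaxSeconds; v != nil && (*v < 0 || *v > maxKeepaliveSeconds) {
			return nil, fmt.Errorf("keepalive_max_seconds out of range: %d", *v)
		}
	}
	return &p, nil
}

func main() {
	for _, body := range []string{
		`{"lifecycle":{"idle_threshold_seconds":600}}`, // accepted
		`{"auth":{"enabled":false}}`,                   // protected key
		`{"lifecycle":{"idle_threshold_seconds":30}}`,  // below the minimum
	} {
		_, err := decodeSettingsPatch([]byte(body))
		fmt.Println(body, "->", err)
	}
}
```

Because the decoder fails before anything is applied, a rejected request leaves the stored settings untouched, which is the "no mutation" property the tests assert.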

Defines the base image / runtime preset / starter-import layering and the contract a custom base image must meet to work with sandboxd:
- runtimed as the container main process
- sandbox user uid/gid 1000:1000
- workspace /home/sandbox, app /home/sandbox/workspace/app
- optional /opt/sandbox-skel seed + /opt/templates/<preset>
- unprivileged: no privileged, no Docker socket, no extra caps, no writable-rootfs assumption
- must provide the toolchains the presets in use require
- selected instance-wide via SANDBOXD_IMAGE (read-only in Settings)

Includes footguns (missing runtimed / wrong uid / wrong workspace / missing toolchain / requires Docker-privileged) and a roadmap (startup preflight, image profiles, browser image, and app-level selection, all later; no Dockerfile builder / Compose / registry creds).

No code, no Settings UI change, no image profiles, no app-level selection. On release/v0.4-apps-console (PR #35 stays draft); release docs untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…notes

- README: reposition as "Self-hosted control plane for AI-built apps"; add an above-the-fold "What you get" list (web console, runtime presets, live preview URLs, agent tasks, app config/secrets, snapshots/fork/restore, activity/events, process logs, settings/lifecycle); extend Known limitations (Docker-only; no Compose/local DB/Git import yet).
- CHANGELOG: cut [0.4.0], dated 2026-06-25 (was Unreleased); add Settings (GET/PATCH lifecycle), base-image contract, and test-foundation entries; honest known limitations (v1 delete purge, keepalive_until not surfaced, warming page 200, Docker-only, no Compose/local DB/Git import); note it rolls up the v0.3.0 work.
- docs/release-notes-v0.4.0.md: draft GitHub release notes.
- installer: default base tag sandboxd-base:0.4.0-test -> :0.4.0 (drop confusing "test" language at launch; SANDBOXD_IMAGE override unchanged).

No feature/runtime changes. Gate green: gofmt/vet/build/go test + OpenAPI contract + installer syntax + console tsc/test(11)/build. Validated by the from-zero VPS RC QA pass. PR #35 to be marked ready (not merged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tastyeffectco · 2026-06-25T19:52:41Z

RC QA: PASS. A from-zero VPS install plus the docs/release-checklist.md smoke (console, presets, previews, agent tasks, snapshots/fork, process logs, events, config/secrets redaction, settings/lifecycle, idle/wake/keepalive) passed on a fresh Ubuntu host. Full automated gate green (Go test + OpenAPI contract + console tsc/test/build). Release-docs pass done (README positioning, CHANGELOG 0.4.0, release notes; the installer base tag no longer carries the -test suffix). Marked ready for review; do not merge until approved.

tastyeffectco and others added 30 commits June 22, 2026 21:49

test(api): guard that /v1 routes and the OpenAPI spec stay in sync

930b8e9

Fails if any /v1 route is missing from docs/openapi.yaml, or the spec documents an endpoint that no route serves — the contract the console and external integrations depend on.
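The two-way check this test performs reduces to a set difference in each direction. The following is a minimal sketch, not the repository's test: `parityDiff` is a hypothetical name, routes are represented as "METHOD path" strings by assumption, and the sample lists are invented to show one mismatch on each side.

```go
package main

import (
	"fmt"
	"sort"
)

// parityDiff compares the routes the server registers with the operations the
// spec documents. It returns routes missing from the spec (undocumented) and
// documented operations that no route serves (unserved), each sorted.
func parityDiff(routes, documented []string) (undocumented, unserved []string) {
	inSpec := make(map[string]bool, len(documented))
	for _, d := range documented {
		inSpec[d] = true
	}
	served := make(map[string]bool, len(routes))
	for _, r := range routes {
		served[r] = true
		if !inSpec[r] {
			undocumented = append(undocumented, r)
		}
	}
	for _, d := range documented {
		if !served[d] {
			unserved = append(unserved, d)
		}
	}
	sort.Strings(undocumented)
	sort.Strings(unserved)
	return undocumented, unserved
}

func main() {
	// Invented inputs: one route lacks documentation, one spec entry lacks a route.
	routes := []string{"GET /v1/presets", "GET /v1/sandboxes/{id}", "GET /v1/apps"}
	documented := []string{"GET /v1/presets", "GET /v1/sandboxes/{id}", "DELETE /v1/sandboxes/{id}"}
	undocumented, unserved := parityDiff(routes, documented)
	fmt.Println("undocumented:", undocumented)
	fmt.Println("unserved:", unserved)
}
```

A parity check of this kind passes when both lists are empty, which is why it cannot notice an endpoint deleted from both sides at once.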

chore(console): gitignore the local pnpm store

7eeb41a

ci: run on the console integration branch too

edb5f6c

test(config): PATCH preserves a secret without value; audit logs key …

d021ed4

…only

Covers two acceptance checks: a PATCH that omits 'value' must not alter the stored secret bytes, and audit entries record the config key but never the plaintext value.
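Both acceptance checks can be illustrated with an in-memory store. This is a sketch under assumptions rather than the sandboxd implementation: `configStore`, `patch`, `auditContains`, and the "config.update" audit label are all hypothetical, and a nil pointer stands in for a request body that omits 'value'.

```go
package main

import (
	"fmt"
	"strings"
)

// configStore is a hypothetical in-memory stand-in for app config storage.
type configStore struct {
	secrets map[string][]byte
	audit   []string
}

// patch updates a key. A nil value means the request omitted 'value', so the
// stored bytes are left as they are. The audit line records the key only.
func (s *configStore) patch(key string, value *string) {
	if value != nil {
		s.secrets[key] = []byte(*value)
	}
	s.audit = append(s.audit, "config.update key="+key)
}

// auditContains reports whether any audit line mentions needle.
func (s *configStore) auditContains(needle string) bool {
	for _, line := range s.audit {
		if strings.Contains(line, needle) {
			return true
		}
	}
	return false
}

func main() {
	s := &configStore{secrets: map[string][]byte{"API_KEY": []byte("s3cret")}}
	s.patch("API_KEY", nil) // PATCH without 'value'
	unchanged := string(s.secrets["API_KEY"]) == "s3cret"
	fmt.Println(unchanged, s.auditContains("s3cret")) // prints "true false"
}
```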

merge: app-scoped config & secrets (#33) into console (v0.3.0)

7c2baf1

merge: web console + /v1 OpenAPI spec (#32) into console (v0.3.0)

ba240c0

merge: v0.4.0 snapshots/fork/restore (feat/snapshots-fork)

1c5a9d7

merge: v0.4.0 observability event timeline (feat/observability-events)

fae9044

tastyeffectco and others added 16 commits June 23, 2026 19:45

tastyeffectco force-pushed the release/v0.4-apps-console branch from 22d2256 to 37f06c2 Compare June 24, 2026 21:47

tastyeffectco and others added 2 commits June 25, 2026 10:35

tastyeffectco marked this pull request as ready for review June 25, 2026 19:52

tastyeffectco wants to merge 48 commits into main from release/v0.4-apps-console
