
draft: v0.4.0 — apps console, snapshots/fork, observability, runtime manifest + presets#35

Open
tastyeffectco wants to merge 48 commits into main from release/v0.4-apps-console

@tastyeffectco (Owner)

Draft: do not merge until approved. A single release candidate integrating all accepted v0.3 + v0.4 work.

Branch ancestry is linear: main → console → feat/v0.4-snapshots-observability → feat/runtime-manifest. This branch is the tip of that stack (no conflicts; every merge was a fast-forward). Source branches are preserved.

Included

  • Console / app UI + app config & secrets (write-only sensitive values)
  • Snapshots / fork / restore (+ snapshot ignore-list, ownership normalization)
  • Observability / events / activity timeline
  • Runtime manifest (sandbox.yaml, web + workers process model)
  • Process API + per-process logs
  • 5 runtime presets (react-vite, nextjs, node-express, fastapi, worker): all boot, reload after agent tasks, and come up healthy after fork/restore
  • Shared-host installer + preview-port fixes
  • OpenAPI updates + contract test

Test results (all green)

  • gofmt -l clean · go vet · go build ./... · go test ./... (all packages)
  • OpenAPI contract test
  • console tsc --noEmit + production build
  • installer bash -n syntax check (scripts/dev/install-v04-ubuntu.sh, scripts/e2e.sh)
  • Smoke e2e (disposable host, portless): React preview 200; Next.js preview and _next/static chunks 200, and a build-provoking task no longer poisons the dev server (healed to 200); FastAPI add-endpoint live via --reload; snapshot → fork → fork preview 200 with the endpoint preserved ($HOME normalized to sandbox:sandbox); process logs endpoint; Activity events recorded; Config & Secrets redaction (sensitive value never returned). Real-LLM agent tasks were simulated (no API key in the test env), but the post-task pipeline ran; the console was verified via build/typecheck and its /v1 endpoints (not clicked through in a browser).

Known limitations (non-blocking, documented)

  • DELETE /v1/sandboxes/{id} purges the workspace, whereas legacy DELETE /sandbox/{id} keeps it
  • keepalive_until not surfaced in GET /v1/sandboxes/{id}
  • warming interstitial returns HTTP 200
  • per-task agent.log can be empty on timeout (persistence WIP)

See CHANGELOG.md (v0.4.0) and docs/sandbox-manifest.md.

🤖 Generated with Claude Code

tastyeffectco and others added 30 commits June 22, 2026 21:49
An optional web console for managing apps on top of sandboxd, plus the
versioned /v1 contract it binds to. Built as a folder in the monorepo
(API-only boundary) so it splits cleanly to its own repo once /v1
stabilizes.

- docs/openapi.yaml — the public /v1 API as OpenAPI 3.0 (apps,
  sandboxes, tasks/SSE, snapshots). The contract for the console and
  future integrations.
- console/ — Vite + React SPA (shadcn/Vercel-style dark UI). Talks ONLY
  to /v1 (no Go imports, no DB, no workspace access): app list, app
  detail with live preview iframe, task submit + live SSE logs,
  start/stop/snapshot/delete. Playwright specs for the lifecycle.
- POST /v1/sandboxes/{id}/start — public counterpart of /stop so an
  API-only console need not reach the internal wake path.
- previewURL now reflects PreviewTLS (http on a default local deploy),
  so the preview iframe loads without TLS.
- Packaging: `docker compose --profile console up` builds the SPA and
  serves it from nginx, proxying /v1 to sandboxd (deferred DNS so it
  survives sandboxd restarts). Core mode (no profile) is unchanged.
- CI: a console job builds the SPA + image so a TS/build break is caught.

MVP scope: single-user, auth-off, public previews. Verified: Go +
console SPA + console image all build; nginx config valid; compose
console profile resolves; go vet/test green. The browser-driven
Playwright run against the full live stack is the remaining check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…main>)

Per review: the same Traefik that serves previews routes the console too,
selected by Host header (console.<domain> -> console, *.preview.<domain>
-> sandboxes) on one entrypoint — instead of a separate published port.
Gated on the sandboxd.managed=true label the docker provider requires.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fails if any /v1 route is missing from docs/openapi.yaml, or the spec
documents an endpoint that no route serves — the contract the console
and external integrations depend on.
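At its core, the check described above is a two-way set difference between the routes the server registers and the operations the spec documents. A minimal sketch, assuming both sides have already been normalized to "METHOD /path" strings; contractDiff is a hypothetical name, and extracting routes from the router and paths from docs/openapi.yaml is left out.

```go
package main

import "sort"

// contractDiff compares the routes the router serves with the operations the
// spec documents, both given as hypothetical "METHOD /path" keys. A contract
// test would fail if either returned slice is non-empty.
func contractDiff(served, documented []string) (undocumented, unserved []string) {
	inSpec := make(map[string]bool, len(documented))
	for _, r := range documented {
		inSpec[r] = true
	}
	inRouter := make(map[string]bool, len(served))
	for _, r := range served {
		if inRouter[r] {
			continue // ignore duplicate registrations
		}
		inRouter[r] = true
		if !inSpec[r] {
			undocumented = append(undocumented, r)
		}
	}
	for r := range inSpec {
		if !inRouter[r] {
			unserved = append(unserved, r)
		}
	}
	// Sort so failure messages are stable across runs.
	sort.Strings(undocumented)
	sort.Strings(unserved)
	return undocumented, unserved
}
```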
Adds control-plane-owned config and secrets per app, so sensitive values
never live in Docker env, workspace files, or task logs.

- New app_config table (migration 0014), scoped to an app (and so to the
  API tenant). Sensitive entries store AES-256-GCM ciphertext + a random
  per-value nonce; non-sensitive entries may keep a plaintext value.
- Encryption uses standard-library crypto only. The master key comes
  from SANDBOXD_SECRETS_KEY (base64, 32 bytes) or an auto-generated 0600
  keyfile under the data dir.
- API: POST/GET/PATCH/DELETE /v1/apps/{id}/config. Sensitive values are
  write-only — GET returns metadata only (key, sensitive, access_policy,
  value_set, timestamps), never the plaintext. Non-sensitive config may
  be returned in full.
- access_policy metadata (control_plane_only | agent_access |
  runtime_access | both); default control_plane_only. Agent/runtime
  delivery is the next slice (a scoped-token broker) — for now nothing is
  injected, so secrets stay in the control plane.
- Plaintext is never logged; audit entries record only the key.

Secrets are deliberately NOT passed through `docker run -e`. The legacy
create-time env stays for backwards compatibility, but app_config is the
safer managed replacement.

Tests: encryption round-trip / fresh nonce / tamper-detection / 0600
keyfile; sensitive value encrypted at rest and never returned; redaction;
default policy; tenant/app scoping; sensitivity toggle re-encrypts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…only

Covers two acceptance checks: a PATCH that omits 'value' must not alter
the stored secret bytes, and audit entries record the config key but
never the plaintext value.
… changelog

Adds the four /v1/apps/{id}/config routes (and ConfigItem/CreateConfigRequest/
PatchConfigRequest schemas) to docs/openapi.yaml so the console<->sandboxd
contract test stays green after merging #33. Bumps the spec to 0.3.0 and adds
an Unreleased changelog entry for the v0.3.0 integration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a Config & Secrets section to the app detail view backed by the new
/v1/apps/{id}/config API: list entries, add a key (secret or plaintext) with an
access policy, change an entry's access policy inline, and delete. Secrets are
write-only end to end — the API never returns a sensitive value, so a stored
secret shows as a redacted '•••• set' chip and can only be replaced.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The secrets broker (Slice 2) is not implemented, so only control_plane_only is
enforced. Mark agent_access/runtime_access/both as 'reserved (broker)' and
disable them in both selects so the UI never implies a secret is delivered to
an agent or app runtime. An existing entry already set to a reserved policy
still displays it.

Adds a per-row Replace (secret) / Edit (plaintext) action: since sensitive
values are write-only, this PATCHes a new value inline — the only way to rotate
a stored secret from the console.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds TestAppConfigSecretDoesNotLeak: one lifecycle test asserting a sensitive
value (sk-test-secret-ci) never escapes through any of the four leak vectors —
API responses, DB plaintext columns, audit rows, or server logs — while a
non-sensitive value still round-trips plainly and the default policy stays
control_plane_only. Closes the log-output vector the existing config tests
didn't capture; runs in the normal Go CI job (internal/api, go test ./...).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The top-level docs predated the console (#32), so a new user had no way to
discover it. Add a 'Web console' section to README.md and AGENTS.md: what it
does, how to launch it (docker compose --profile console up -d), where to open
it (console.localhost via the shared Traefik), and that it's a pure /v1 client.
Completes the 'basic docs' item for the console end-to-end experience.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Snapshot button captured a snapshot with no UI feedback. Now:
- success -> info toast 'Snapshot captured. History, restore, and fork are
  coming in v0.4.0.'
- 409 (running source) -> error toast 'Stop the sandbox before capturing a
  snapshot.'
The API client now attaches the HTTP status to thrown errors so the 409 case
is detected reliably. No snapshot history/restore/fork — that's v0.4.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The card badge was presence-based (green whenever a sandbox row existed), so a
sandbox in creating/stopped/error showed as green 'sandbox' — misleading. Now
each app's current-sandbox status is fetched and rendered via StatusBadge
(running=green, stopped/error distinct), and 'no sandbox' stays neutral. Honest
status, matching the detail view.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v0.4.0 backend on the public /v1/snapshots subsystem only (internal
/sandbox/{id}/... stays unexposed):
- migration 0015 adds snapshot.source_app_id so per-app history survives the
  ephemeral source sandbox; capture stamps it from the source's app_id.
- GET  /v1/apps/{id}/snapshots  — tenant+app-scoped history.
- POST /v1/apps/{id}/restore    — REPLACE the app's current sandbox from a
  snapshot (purge current, then clone). Destructive; console confirms.
- POST /v1/apps/{id}/fork       — new app + its sandbox spun from a snapshot;
  source app untouched.
Tenant scoping enforced on every path (cross-tenant app/snapshot -> 404).
Sandbox spin reuses the proven create path (template_path + .git reset).
Tests cover store scoping, history, and the restore/fork guard paths; the
Docker-dependent spin is verified on a real host, not CI. OpenAPI + contract
test updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se 4)

Adds a Snapshots panel to the app detail screen backed by the new
/v1/apps/{id} endpoints:
- history list (name, captured time, size) via GET /v1/apps/{id}/snapshots
- Restore: confirms (replaces the current sandbox, discards un-snapshotted
  work) then POST /v1/apps/{id}/restore and refreshes
- Fork: prompts for a name, POST /v1/apps/{id}/fork into a new app
The capture button now refreshes history and (since v0.4.0 ships these) drops
the 'coming soon' wording. Actions disabled unless the snapshot is ready.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…T verified

Records the release discipline + verification status for the Phase 4 preview
branch: capture/history/tenant-scoping/backend-orchestration are tested, but the
live restore/fork sandbox spin and preview are deliberately deferred to a real
isolated v0.4.0 deploy (they'd otherwise expose port-3000 sandboxes to prod
Traefik on the shared host). Not for merge to console/main; no non-draft PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts/dev/install-v04-ubuntu.sh stands up the Phase 4 stack on a fresh Ubuntu
22.04/24.04 server, reusing the repo's docker-compose (traefik + sandboxd +
console profile) — no parallel deploy system. It installs Docker if missing,
fails if 80/443 are taken, detects the public IPv4, uses sslip.io for preview +
console URLs (HTTP on :80), writes .env + a docker-compose.override.yml, gates
the public console with Traefik basic auth (demo creds), keeps the API on
loopback (disables the edge api.yml router), builds images, starts the stack,
and prints URLs + teardown.

docs/v0.4.0-test-runbook.md: requirements, install, the 14-step create→preview→
snapshot→restore→fork checklist, TLS-as-follow-up, teardown, release discipline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 80/443)

Default to shared-host mode so the installer is safe next to Coolify/nginx/another
Traefik:
- HTTP_PORT=18080 (uncommon edge port; set HTTP_PORT=80 for dedicated-host mode)
- API_PORT=19090 on loopback only
- only the chosen HTTP_PORT must be free; fail clearly telling the user to set
  HTTP_PORT if taken; do NOT check or require 443 (TLS deferred)
- generated URLs include :<HTTP_PORT> unless it's 80
Runbook documents both modes plus 'behind an existing proxy': keep 18080 and have
the front proxy forward console.<ip>.sslip.io + *.preview.<ip>.sslip.io to
127.0.0.1:18080 (Host preserved); TLS via the front proxy or a real wildcard domain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On a shared host with HTTP_PORT=18080, the API returned bare preview URLs
(…sslip.io/) that hit whatever owns :80 (Coolify/front proxy) instead of
sandboxd's Traefik on :18080. previewURL() now appends the host port unless it's
the scheme default (80 for http, 443 for https):

  Server.PublicHTTPPort  <- SANDBOXD_PUBLIC_HTTP_PORT (main.go)
  docker-compose.yml     <- SANDBOXD_PUBLIC_HTTP_PORT: ${HTTP_PORT:-80}

The console iframe + open-in-tab link consume sb.preview.url unchanged, and
restore/fork responses use the same previewURL(), so all preview surfaces get the
corrected port. Unit-tested across http/https x default/custom ports; verified
live that GET /v1/sandboxes/{id} returns ...sslip.io:18080. No installer change
needed — it already writes HTTP_PORT, which compose forwards.
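The port rule above fits in a small pure function. The signature (scheme, host, and port as separate arguments, with 0 meaning unset) and the name previewURLFor are assumptions; only the rule itself, omit the port when it is the scheme default (80 for http, 443 for https), comes from this commit.

```go
package main

import (
	"net"
	"strconv"
)

// previewURLFor builds a preview base URL, appending the public port unless it
// is the default for the scheme. Port 0 is treated as unset.
func previewURLFor(scheme, host string, port int) string {
	defaultPort := 80
	if scheme == "https" {
		defaultPort = 443
	}
	if port == 0 || port == defaultPort {
		return scheme + "://" + host + "/"
	}
	return scheme + "://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/"
}
```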

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One append-only app_events table in the existing control-plane SQLite DB (no
ClickHouse/OTEL/Loki/separate DB), a centralized best-effort recorder, a
tenant-scoped paginated read API, and a console Activity timeline.

- migration 0016: app_events(id ULID, owner_token, app/sandbox/task/snapshot ids,
  type, severity, message, payload_json, created_at) + scoped indexes. ULID id
  doubles as the newest-first page cursor.
- internal/events: Recorder.Record (mirrors audit: own Store interface, detached
  ctx, never breaks the request); stable type/severity constants.
- store: InsertAppEvent + ListAppEvents/ListTaskEvents (owner_token-scoped,
  cursor-paginated); owner-agnostic GetApp for the background task path.
- API: GET /v1/apps/{id}/events and /v1/tasks/{id}/events (newest-first,
  default 50 / max 200, ?before cursor, next_before). Cross-tenant -> 404/empty.
- instrumented via the recorder (no scattered SQL): app.created/updated,
  config.created/updated/deleted (key only, never the secret), snapshot
  captured/capture.failed/restored/forked, sandbox create.started/failed/
  started/stopped/deleted, task.started, and on the task terminal point
  task.completed/failed/build.failed + preview.health.ok/failed.
- console: read-only Activity panel on app detail (time/severity/type/message +
  related ids), durable across refresh/restart.
- docs/openapi.yaml + contract test; .env.example notes the future
  SANDBOXD_EVENT_RETENTION_DAYS knob (retention deferred).

Tests: recorder writes valid JSON events; tenant scoping; pagination/limit;
config event carries the key but never the secret; failed task emits
task.failed + task.build.failed on both feeds. gofmt/vet/build/test + contract
test green.
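The read API's paging rules (newest-first, default 50 / max 200, a ?before cursor, next_before) can be sketched over an in-memory slice. This is illustrative only: the real store queries SQLite, and the Event type, the page helper, and returning an empty next_before on the last page are assumptions. Because ULIDs sort lexically in creation order, plain string comparison works as the cursor.

```go
package main

import "sort"

// Event is a hypothetical, trimmed-down app_events row.
type Event struct {
	ID   string // ULID: lexical order matches creation order
	Type string
}

const (
	defaultLimit = 50
	maxLimit     = 200
)

// clampLimit applies the default and the cap to a requested page size.
func clampLimit(n int) int {
	if n <= 0 {
		return defaultLimit
	}
	if n > maxLimit {
		return maxLimit
	}
	return n
}

// page returns events strictly older than before (all if before == ""),
// newest first, plus the next_before cursor ("" when no more pages).
func page(events []Event, before string, limit int) ([]Event, string) {
	limit = clampLimit(limit)
	sorted := append([]Event(nil), events...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID > sorted[j].ID })
	var out []Event
	for _, e := range sorted {
		if before != "" && e.ID >= before {
			continue
		}
		if len(out) == limit {
			// At least one more eligible event exists: hand back a cursor.
			return out, out[len(out)-1].ID
		}
		out = append(out, e)
	}
	return out, ""
}
```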

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ndexes

Follow-up to Phase 5 (3 fixes):
1. No raw output in app_events.payload_json. Task events now carry structured
   flags/reasons only — never BuildErrorMessage/PreviewErrorMessage/ErrorMessage
   text (which can echo secrets the app printed; the full text stays in the
   task's result.json). New payloads:
     task.completed        -> {files_changed, duration_ms, build_ok}
     task.failed           -> {failure_reason, has_error}
     task.build.failed     -> {reason:'build_failed', has_build_error:true}
     preview.health.failed -> {preview_status, has_preview_error}
   Test now plants a fake secret in all three error fields and asserts it never
   appears in any event's payload_json or message.
2. Monotonic ULID event ids (ulid.Monotonic under a mutex), so a same-millisecond
   completion burst sorts in emission order by id (the page cursor). Added a
   monotonic-ordering test.
3. Dropped the unused indexes (owner-only, sandbox-only, type-only) from
   migration 0016 — no endpoint queries them and each is write amplification on
   an append-only table. Kept idx_app_events_app + idx_app_events_task.

gofmt/vet/build/test + OpenAPI contract test green. Console untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se 7 core)

runtimed is no longer hardcoded to a single Vite dev server. An optional
workspace sandbox.yaml declares how the app builds/runs/previews/reports health,
plus background workers. No manifest = the built-in Vite defaults, so existing
apps are unchanged.

- cmd/runtimed/manifest.go: parse sandbox.yaml (gopkg.in/yaml.v3) with full
  backward-compatible defaults sourced from the existing RUNTIMED_* env vars;
  resolution rules for web / worker-only / empty / invalid. Unit-tested.
- cmd/runtimed/process.go: generalized the single dev-server supervisor into a
  reusable 'process' (web OR worker) with the same backoff/fast-fail/stop logic.
- main.go: builds the web process (if declared) + workers from the manifest,
  supervises all, probes the web health_path (Vite asset deep-probe kept only
  for the default app), and reports per-process status.
- protocol.go: Status.Processes []ProcessState (name/kind/running/pid/restarts);
  Preview still carries the web health for compatibility.
- build check uses the manifest's build.command/timeout (empty => skipped).
- docs/sandbox-manifest.md: schema reference + examples + security note (the
  manifest grants no new privilege; no Docker socket/compose/k8s).

New dependency: gopkg.in/yaml.v3 (pure-Go, CGO-safe) — the first YAML dep, needed
to parse the manifest. Deferred to follow-up slices: control-plane logs API +
console process status/logs panel + live image-rebuild e2e.

gofmt/vet/build/test green; runtimed manifest tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1. BuildSpec is never nil after LoadManifest (applyDefaults always sets it);
   added a defensive a.build!=nil guard at the build-check call site so a
   hand-built app{} can't panic. Test: absent/worker-only/invalid all yield a
   non-nil Build.
2. Worker-name validation: only [A-Za-z0-9_-] (1-64 chars) — rejects empty,
   path separators, '..', and duplicates (the name becomes ~/.runtimed/<name>.log,
   so this is path-safety). Dropped the silent auto-naming. Invalid manifest is
   rejected and falls back to the safe default (web app, no workers).
3. Port validation: an explicit web.port must be 1-65535 (0 = unset -> default).
4. Worker-only preview semantics: added PreviewNone ('none') instead of the
   misleading 'down', and documented it (non-breaking — new enum value).
5. Tests: build-never-nil, invalid/duplicate worker names rejected safely,
   invalid ports, worker-only status -> PreviewNone + no preview probe (no panic).
6. docs/sandbox-manifest.md: documents worker-only=none, the validation rules,
   and that Phase 7 is NOT fully verified until the rebuilt base image runs the
   default Vite, a custom web, and a worker-only manifest.

gofmt/vet/build/test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rebuilt sandboxd-base with the new runtimed (yaml dep fetched fine in the image
builder) and ran all three manifest shapes end-to-end on a disposable host stack:
- default Vite (no manifest): preview ready, web running, pnpm build exit 0
- custom web (python http.server :5000, health_path /healthz, build skipped):
  preview ready, serves /healthz and / with 200
- worker-only: preview 'none' (valid, not error), worker running + producing output
Test sandboxes were portless (no Traefik label) so prod routing was untouched.
Marks the manifest verification status as live-verified. No runtime fixes needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surfaces the runtime manifest process model through the public /v1 API and the
console (7A marked accepted/verified in docs).

- GET /v1/sandboxes/{id} now includes processes[] (name/kind/running/pid/
  restarts), mapped via a pure, testable v1RuntimeView helper. A worker-only app
  (preview status 'none') returns an empty endpoint URL — no fake preview.
- GET /v1/sandboxes/{id}/processes/{name}/logs: read-only tail of
  ~/.runtimed/<name>.log (which the files API refuses as a reserved subtree).
  Sandbox-scoped by id like the rest of the v1 sandbox API; process name strictly
  validated ([A-Za-z0-9_-], 1-64) so no path escapes; tail capped (default 200,
  max 1000, reads <=256KiB); no write/delete. Unknown process/sandbox -> 404,
  bad name -> 400.
- console: app detail shows a Processes panel (name/kind/status/pid/restarts +
  per-process recent logs); preview pane relabeled 'Preview / endpoint';
  worker-only renders 'No public endpoint — worker process running' (valid, not
  a failure).
- OpenAPI: documents the logs route + Sandbox.processes/Process schema.
- tests: process-logs tail, bad-name->400 (incl. traversal), unknown->404;
  v1RuntimeView worker-only shape (status none, no URL) + process mapping.

gofmt/vet/build/test + contract test green; console tsc + build green.
No presets, no manifest editor, no compose/kata/containerd (7B non-goals).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tastyeffectco and others added 16 commits June 23, 2026 19:45
…+ caps)

Users can create a working app of a common type from a preset that actually
boots. Approach (A): runtimed applies the preset on first boot.

- internal/preset: shared registry (single source of truth) of 5 presets —
  react-vite, nextjs, node-express, fastapi, worker — each with id/label/
  description/optional template/generated sandbox.yaml/required capabilities.
- runtime_preset on POST /v1/apps (stored on the app, migration 0017),
  POST /v1/apps/{id}/sandbox (explicit else app default; precedence over
  template), and POST /v1/sandboxes. Unknown preset -> 400. GET /v1/presets
  lists them (console picker source of truth).
- runtimed applies the preset on FIRST boot only: seed the preset template into
  an empty workspace, write sandbox.yaml only when missing — never overwrites
  existing app files or sandbox.yaml. Falls back to default template if unknown.
- minimal starter templates added (node-express-standard, fastapi-standard,
  nextjs-standard) — reliable boot, not fancy. React uses existing react-standard.
- console: app-type preset dropdown on New App (data-driven from /v1/presets);
  sandbox create inherits the app's preset.
- tests: every preset manifest validates with the loader; unknown->400 (app +
  sandbox create); app stores runtime_preset; resolve precedence; presets list;
  applyPreset/writeManifestIfMissing (write-once, never overwrite); seed skips
  non-empty workspace. gofmt/vet/build/test + contract test + console tsc green.

Image capability check: base image already has python3-venv (pip via venv works,
confirmed) + node/pnpm — so all 5 presets boot WITHOUT a Dockerfile change.
Live boot of all 5 on a rebuilt image is the deferred e2e step.

No Postgres/Redis/managed services, no compose, no manifest editor, no advanced
override, no provider work (7C-1 non-goals).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records the live e2e on rebuilt image sandboxd-base:p7c1 — all five presets
boot: react-vite ~31s, nextjs ~39s (warm; cold may be slower), node-express
~30s, fastapi ~37s (runtime venv/pip install works), worker ~28s (preview none +
worker running). Confirms: presets seed expected files, runtimed writes
sandbox.yaml, process status/logs endpoint works, API rejects unknown presets
with 400, runtimed logs loudly + falls back to react-standard on a bad preset
env. Notes still-unit-only items (console dropdown not browser-clicked;
app-default preset resolution not live-tested) and future image optimizations
(warm pnpm/npm cache, preinstalled FastAPI/uvicorn or uv, Next.js layer).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tance

The Next.js preset ran 'pnpm dev' (web) + 'pnpm build' (post-task build check) in
the same workspace; next build writes production .next/ that the long-running
next dev then serves from -> 500s on _next/static. Dev isn't restarted after the
build, so it stays broken.

Fix (smallest reliable):
- nextjs preset build.command is now empty: the build check is the only thing
  that runs 'next build', so skipping it removes the poison source. Tradeoff: no
  post-task build verification for Next.js until an isolated build check exists.
- web command 'rm -rf .next' before 'pnpm dev' defends a clean start against a
  stale/production .next carried in by snapshot restore (alone insufficient: dev
  isn't restarted post-build, hence skipping the check too).
- nextjs template ships .gitignore (node_modules,.next,out,.env,.env.local) so
  the git-based workspace checkpoint doesn't treat them as app changes.

Re-tested live (image sandboxd-base:p7c1b, portless): fresh ready ~30s, / + asset
200; reproduced bug (pnpm build -> 500); recovery via restart ~10s -> 200; edit
hot-reloads; checkpoint tracks only 6 real files. Real LLM agent task NOT run (no
ANTHROPIC_API_KEY) — verified the post-task build-check mechanism directly.

Tests: nextjs manifest has no 'pnpm build' + empty build + 'rm -rf .next';
template ships .gitignore; all preset manifests still validate.

Docs: 7C-1 NOT marked fully accepted; records the fix, the not-run agent path,
and follow-ups (snapshot .next bloat unfiltered; split task build_ok/preview_ok/
app_healthy; empty agent.log on timeout).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ist as deferred

Issue 2 (build/health semantics) — implemented:
- TaskResult adds build_status (passed|failed|skipped), preview_ok (*bool;
  omitted for worker-only), app_healthy (build-not-failed AND web preview serving
  / worker-only a worker running). build_ok kept for back-compat but now true
  ONLY when build_status=passed — a skipped build is never build_ok=true.
- runtimed sets build_status from the build check (empty command => skipped, not
  a fake pass) and derives app_healthy/preview_ok via postTaskHealth().
- console shows 'build skipped|passed|failed' (+ 'unhealthy') instead of the old
  unconditional 'build ok'. OpenAPI TaskResult updated.
- tests: postTaskHealth web (passed/skipped/down/failed) + worker-only (running/
  stopped); preview_ok omitempty in JSON.
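The health semantics above fit in one pure function plus a result struct. The JSON field names come from this commit; the struct shape, the postTaskHealth signature, and its boolean inputs are assumptions. A *bool with omitempty is what lets preview_ok disappear from the JSON for worker-only apps, while a false value still serializes for a web app whose preview is down.

```go
package main

type BuildStatus string

const (
	BuildPassed  BuildStatus = "passed"
	BuildFailed  BuildStatus = "failed"
	BuildSkipped BuildStatus = "skipped"
)

type TaskResult struct {
	BuildStatus BuildStatus `json:"build_status"`
	BuildOK     bool        `json:"build_ok"`             // back-compat: true only when passed
	PreviewOK   *bool       `json:"preview_ok,omitempty"` // nil (omitted) for worker-only apps
	AppHealthy  bool        `json:"app_healthy"`
}

// postTaskHealth derives the health fields. hasWeb reports whether the
// manifest declares a web process; previewServing and workerRunning are
// the post-task probe results.
func postTaskHealth(build BuildStatus, hasWeb, previewServing, workerRunning bool) TaskResult {
	r := TaskResult{BuildStatus: build, BuildOK: build == BuildPassed}
	buildNotFailed := build != BuildFailed
	if hasWeb {
		ok := previewServing
		r.PreviewOK = &ok
		r.AppHealthy = buildNotFailed && previewServing
	} else {
		r.AppHealthy = buildNotFailed && workerRunning
	}
	return r
}
```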

Issue 1 (snapshot ignore-list) — deferred (not small), documented:
- capture is zstd of the raw loopback .img (block image), not a tree copy, so an
  ignore-list can't slot into capture without reflink-copy+loop-mount+prune or a
  filtered-tar redesign (mount risk for an RC). Correctness (stale .next in
  forks/restores) is already handled by 'rm -rf .next' on dev start; only size
  bloat remains. Recorded as a scoped follow-up.

gofmt/vet/build/test + contract test green; console tsc + build green.
No merge to console/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…copy

Storage model (clarified): the OSS build stores each workspace as a plain
bind-mounted DIRECTORY (loopback .img is legacy; internal/snapshot's zstd-.img
path is dead in dir mode). The public /v1 snapshot/fork/restore subsystem
captures by copying that directory (captureImage), so the ignore-list belongs in
that copy — no image/zstd step involved.

captureImage now uses copyTreeExcluding instead of 'cp -a':
- skips node_modules/.next/out/.venv/__pycache__/.cache by base name at any depth
  (conservative; dist/build NOT ignored — templates don't treat them as generated);
- copies symlinks verbatim and never follows them (no path-traversal/symlink
  escape during staging copy);
- preserves mode + ownership (lchown) so the restore cp -a keeps files writable.

Restore/fork unchanged (still cp -a the snapshot dir); restored workspaces
re-create deps on first boot. Tests: exclusion incl. nested node_modules, source
+ sandbox.yaml preserved, symlink-escape safety; existing snapshot/fork/restore
guard tests still pass. gofmt/vet/build/test + contract green. No merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se-blocker)

Bug: applyDefaults replaced an explicit empty build.command with the default
pnpm build, so presets could NOT disable the post-task build check. Next.js
therefore still ran 'next build' after tasks and poisoned the live 'next dev'
(.next/ -> 500 on _next/static); FastAPI/worker could get false/irrelevant
checks.

Fix: BuildSpec.Command is now *string so unset (nil) is distinct from explicit
empty (&""):
- no build block / build: {}  -> default pnpm build (backward compatible)
- build.command: ""           -> SKIP the build check
- build.command: "x"          -> run x
task.go derefs the command; empty => skipped (build_status=skipped, build_ok
false). worker preset now sets build.command "" explicitly (was falling back to
the default).
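The nil-versus-empty distinction is easy to demonstrate with encoding/json. Its *string decoding draws the same line this commit draws for the YAML manifest: an absent key leaves the pointer nil, and an explicit "" yields a pointer to an empty string. JSON is used only to keep the sketch stdlib-only. The Manifest/BuildSpec shapes and the resolveBuild and parseManifestJSON helpers are illustrative assumptions; "pnpm build" is the default named above.

```go
package main

import "encoding/json"

type BuildSpec struct {
	Command *string `json:"command"` // nil = unset, &"" = explicitly disabled
}

type Manifest struct {
	Build *BuildSpec `json:"build"`
}

const defaultBuildCommand = "pnpm build"

// resolveBuild returns the command to run and whether the check is skipped.
func resolveBuild(m Manifest) (cmd string, skip bool) {
	if m.Build == nil || m.Build.Command == nil {
		return defaultBuildCommand, false // no build block, or build: {}
	}
	if *m.Build.Command == "" {
		return "", true // explicit empty string disables the check
	}
	return *m.Build.Command, false
}

func parseManifestJSON(doc string) Manifest {
	var m Manifest
	if err := json.Unmarshal([]byte(doc), &m); err != nil {
		panic(err)
	}
	return m
}
```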

Build checks are runtime verification (does the app still build/start after a
task), NOT production deployment builds — documented, with how to skip.

Acceptance tests (TestManifestBuildResolution): absent->default, {}->default,
""->skip, "x"->run; manifest assertions updated for *string.

Live retest (image sandboxd-base:p7c1d, portless): ran a coding task on a Next.js
sandbox — agent fails w/o creds but the post-task build check runs; result
build_status=skipped, NO pnpm/next build executed, / + four _next/static/chunks/*
return 200 (not poisoned). FastAPI: build_status=skipped, no pnpm build, /health
200. Cleaned up; prod untouched.

gofmt/vet/build/test + contract green. No merge to console/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gent builds

New finding: even with the platform build check skipped, the coding AGENT can run
'next build' during a task, writing production .next/ that the live 'next dev'
then serves -> 404/500 on _next/static. rm -rf .next at boot can't undo a
mid-session poison because dev isn't restarted after the task.

Fix: add web.restart_after_task (manifest WebProc field, default false). When
true, runtimed restarts the web process after every task (skipped on cancel) via
the existing supervisor (stop() -> re-run start command), then waits up to 90s
for a 200 on the health path before reporting preview/health. The web command
re-runs, so its 'rm -rf .next' wipes the agent's production build and dev comes
back clean. Next.js preset sets restart_after_task: true; other stacks leave it
false.

Live retest (image sandboxd-base:p7c1e, portless): created Next.js sandbox,
simulated the agent by running 'pnpm build' mid-session (.next became a
production build w/ BUILD_ID; /_next/static/chunks/* -> 404), then ran a coding
task. After the task: web restarts=1, / + four chunk assets -> 200, .next is dev
again (no BUILD_ID). Recovery confirmed. Cleaned up; prod untouched.

Tests: manifest parses web.restart_after_task (default false); nextjs preset sets
it true. gofmt/vet/build/test green. Docs: 'Dev-mode resilience' section. No merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ProvisionFromTemplate (the single funnel for app fork, app restore, and direct
from_snapshot creation) cloned the snapshot/template with 'cp -a' (preserves
SOURCE ownership) and created the workspace $HOME dir as root — so the sandbox
user (uid 1000) hit EACCES writing ~/.cache, pnpm/npm store, node_modules,
.next, .venv, generated files. Fresh seed already chowns; the template path did
not.

Fix: after the clone, ProvisionFromTemplate recursively normalizes ownership of
the whole workspace (incl. the $HOME dir itself) to the sandbox uid/gid (1000)
— same result as the fresh seed's chown -R, never trusting the snapshot's
captured ownership. Uses Lchown via a WalkDir lstat walk that never descends a
symlink, so a symlink pointing outside the workspace can't redirect the chown.
Logged via the loopback Manager logger. Host-side numeric chown is correct for
the OSS --userns=host default.

Tests (internal/loopback): chowns the whole tree incl. root dir; symlink-escape
not followed; ProvisionFromTemplate normalizes (covers fork/restore/from_snapshot
— all route through it).

Live retest (image p7c1f, all portless -> no Traefik label -> no prod collision):
- Next.js snapshot -> fork: $HOME 1000:1000, ready 31s, / + chunks 200,
  node_modules reinstalled, ~/.cache writable (no EACCES).
- FastAPI fork: $HOME 1000:1000, ready 15s, /health 200, venv reinstalled.
Cleaned up; prod untouched.

gofmt/vet/build/test green. No merge to console/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two FastAPI bugs:
1. Port mismatch — the public preview routes to 3000 but uvicorn ran on 8000, so
   external preview was 502 while the internal probe said ready.
2. Stale after task — uvicorn didn't reload, so an agent-added route 404'd until a
   manual restart.

Fix: FastAPI preset manifest now runs
'.venv/bin/uvicorn main:app --host 0.0.0.0 --port 3000 --reload' with
web.port: 3000 (health_path /health, build still skipped). Template
requirements.txt adds watchfiles so --reload works. No restart_after_task —
reload is reliable (kept for Next.js only).

Tests: fastapi preset validates, web.port 3000, command has --reload + --port
3000, no 8000, no pnpm build; existing preset tests pass.
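The preset assertions above can be sketched as a single check over a parsed web block. WebProc here is a hypothetical, trimmed-down struct, not the manifest package's actual type, and the string matching is a simplification of whatever the real tests do.

```go
package main

import (
	"errors"
	"strings"
)

// WebProc is a hypothetical, trimmed-down mirror of the manifest's web block.
type WebProc struct {
	Command          string
	Port             int
	HealthPath       string
	RestartAfterTask bool
}

// checkFastAPIPreset returns every violation joined into one error,
// or nil when the preset matches the expected shape.
func checkFastAPIPreset(w WebProc) error {
	var errs []error
	if w.Port != 3000 {
		errs = append(errs, errors.New("web.port must be 3000, the port the public preview routes to"))
	}
	if !strings.Contains(w.Command, "--port 3000") {
		errs = append(errs, errors.New("uvicorn must listen on --port 3000"))
	}
	if strings.Contains(w.Command, "8000") {
		errs = append(errs, errors.New("command still references port 8000"))
	}
	if !strings.Contains(w.Command, "--reload") {
		errs = append(errs, errors.New("command must pass --reload so agent edits go live"))
	}
	if strings.Contains(w.Command, "pnpm build") {
		errs = append(errs, errors.New("FastAPI preset must not run a pnpm build"))
	}
	if w.HealthPath != "/health" {
		errs = append(errs, errors.New("health_path must be /health"))
	}
	if w.RestartAfterTask {
		errs = append(errs, errors.New("restart_after_task is unnecessary: --reload handles it"))
	}
	return errors.Join(errs...) // nil when errs is empty
}
```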

Live retest (image p7c1g, portless; external check via container-IP:3000 — the
path Traefik connects on — to stay off the host Traefik): create -> public
/health 200 (was 502); agent adds /hello -> /hello 200 with NO manual restart
(watchfiles reload, 2 reload lines in web.log); stop -> snapshot -> fork ->
fork $HOME 1000:1000, public /health 200, /hello preserved, venv reinstalled.
Cleaned up; prod untouched.

gofmt/vet/build/test green. No merge to console/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two preset reload gaps (neither runtime has live reload):
- node-express: 'node server.js' didn't reload -> agent-added routes 404 until
  manual restart. Set web.restart_after_task: true.
- worker: a long-running worker kept old behavior after a task. Extended the
  restart_after_task mechanism from web to WORKER processes (per-process flag on
  the process struct; workers are bounced without a readiness wait, web keeps its
  health-wait). worker preset now ships an editable worker.sh (template
  worker-standard, command 'bash worker.sh') with the worker's
  restart_after_task: true, so edits to worker.sh take effect on restart. Not a
  generic process-policy framework — just the per-process flag.

Manifest: Worker.RestartAfterTask added; process.restartAfterTask field; main
wires web + each worker from the manifest; runTask restarts flagged web (with
health wait) and flagged workers (stop -> supervisor re-runs).
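A minimal sketch of the per-process decision runTask makes after a task. The process and restartAction types are hypothetical, and it assumes the "skipped on cancel" rule from the web-only version also applies to workers.

```go
package main

// procKind distinguishes the web process from workers.
type procKind int

const (
	kindWeb procKind = iota
	kindWorker
)

// process is a hypothetical subset of runtimed's per-process state.
type process struct {
	name             string
	kind             procKind
	restartAfterTask bool
}

// restartAction describes what to do with one process after a task.
type restartAction struct {
	name       string
	waitHealth bool // only web waits for a 200 on its health path
}

// planRestarts returns the processes to bounce after a task, in manifest
// order. A cancelled task restarts nothing; unflagged processes are untouched.
func planRestarts(procs []process, cancelled bool) []restartAction {
	if cancelled {
		return nil
	}
	var out []restartAction
	for _, p := range procs {
		if !p.restartAfterTask {
			continue
		}
		out = append(out, restartAction{name: p.name, waitHealth: p.kind == kindWeb})
	}
	return out
}
```

Keeping the flag on each process (rather than a global policy) matches the "just the per-process flag" scope above.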

Tests: manifest parses worker.restart_after_task (default false); node-express +
worker presets set restart_after_task; react-vite/fastapi do NOT (live reload),
nextjs unchanged; all preset manifests validate. Full go test + console
build/typecheck green.

Live retest (image p7c1h, portless): node-express agent adds /ping -> pong 200
after task, no manual restart (restarts 1); worker output heartbeat->PONG-BEAT in
worker.log after task, no manual restart (restarts 1). React/FastAPI/Next
unchanged. Cleaned up; prod untouched.

Docs: reload-mechanism table + documented (not implemented) follow-ups: v1 vs
legacy DELETE workspace semantics, keepalive_until missing from v1 GET, warming
interstitial returns 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Marks 7C-1 as accepted & live-verified. All five presets pass preview + agent-
task reload + fork/restore:
- react-vite: task applies, preview 200, fork healthy
- nextjs: build-provoking task no longer poisons preview (restart_after_task),
  chunks 200, fork healthy
- fastapi: port 3000 + uvicorn --reload, task changes live, fork healthy
- node-express: restart_after_task, added route live after task, fork healthy
- worker: restart_after_task, code/output change reflected in logs, fork healthy
Cross-cutting verified: snapshot ignore-list, fork/restore ownership normalized
to sandbox:sandbox, wake/idle/keepalive edge cases.

Remaining items recorded as non-blocking follow-ups: v1 DELETE purges vs legacy
DELETE keeps workspace; keepalive_until missing from v1 GET; warming interstitial
returns 200; per-task agent.log persistence on timeout. Phase-status banner +
section header updated. Not merged to console/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Integration of all accepted v0.3 + v0.4 work (console/app UI, config & secrets,
snapshots/fork/restore, observability/events, runtime manifest, process API/logs,
five runtime presets, shared-host installer/preview port, OpenAPI updates).

- CHANGELOG: v0.4.0 entry (Added/Fixed/Known limitations).
- README: runtime presets + sandbox.yaml section; Known limitations (v0.4.0).
- ARCHITECTURE: runtimed manifest/process model + presets note.
- console/README: app detail (preview/endpoint, config, snapshots, activity,
  processes) + New-App preset picker.

Known limitations recorded (non-blocking): v1 DELETE purge vs legacy keep;
keepalive_until not surfaced in v1 GET; warming interstitial returns 200;
per-task agent.log persistence on timeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sandboxd (the contract — primary):
- TestPublicAPISurfacePresent pins the required public /v1 surface in BOTH
  api.go routes and openapi.yaml (a parity check alone can't catch a route deleted from both):
  presets, apps, app sandbox, sandbox GET, process logs, snapshots + app
  snapshot/fork/restore, app config, app events.
- v1_client_contract_test: pins the JSON SHAPES the console/SDK consume
  (GET /v1/sandboxes/{id}: id/status/preview{url,status}/processes/template;
  GET /v1/presets: id/label/description). If the server shape changes, this
  fails first and the client fixtures must follow.
  (Existing suites already cover config redaction, events, process logs,
  snapshots/fork guards, runtime-view, build/health semantics.)
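Why a fixed required list catches what parity misses, as a minimal sketch. missingSurface and the "METHOD /path" key format are hypothetical; the real test presumably extracts routes from api.go and paths from openapi.yaml.

```go
package main

import "sort"

// missingSurface reports required "METHOD /path" entries absent from the
// registered routes and from the OpenAPI spec. Checking each side against a
// fixed list, instead of only comparing the two sides to each other, also
// catches an endpoint deleted from api.go and openapi.yaml at the same time.
func missingSurface(required []string, routes, spec map[string]bool) (fromRoutes, fromSpec []string) {
	for _, r := range required {
		if !routes[r] {
			fromRoutes = append(fromRoutes, r)
		}
		if !spec[r] {
			fromSpec = append(fromSpec, r)
		}
	}
	sort.Strings(fromRoutes)
	sort.Strings(fromSpec)
	return fromRoutes, fromSpec
}
```

In the usage below, routes and spec agree with each other (parity passes), yet the deleted sandbox GET is still reported on both sides.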

console (a client; lightweight smoke tests whose fixtures mirror sandboxd responses):
- vitest + jsdom + @testing-library/react (lockfile regenerated for frozen install).
- src/test/fixtures.ts: apps, 5 presets, web + worker sandbox runtime/processes,
  events, redacted config — shaped to match the Go contract tests.
- App.test: console loads, app list renders, preset dropdown populated from
  /v1/presets. AppDetail.test: preview/processes/activity/config sections render;
  config redaction (no plaintext secret); worker/no-preview renders as valid;
  Delete control wording surfaced.

docs/release-checklist.md: manual VPS smoke (install, React/Next/FastAPI/express/
worker tasks, snapshot/fork, process logs, events, config redaction, idle/wake/
keepalive) + known limitations.

Gate: go test + OpenAPI contract + console tsc/build + console vitest all green.
No new product features; not merged to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…reset test, unhealthy fixture)

1. CI: the console job now runs pnpm install --frozen-lockfile, tsc --noEmit,
   pnpm test (vitest), and pnpm build directly (clear output), keeping the docker
   image build as an extra check. Adds release/** to push triggers (PRs already
   run via pull_request).
2. Console delete wording: v1 DELETE purges the workspace, so the button now reads
   'Delete sandbox and workspace' and requires a window.confirm that spells out
   the workspace (code/deps/generated files) is permanently removed. Test updated
   to assert the wording.
3. Preset shape test: assert the REQUIRED preset ids exist with id/label/
   description instead of an exact count, so adding a preset later doesn't break it.
4. Unhealthy sandbox fixture + test: stopped sandbox, preview down, web process
   not running (restarts=3). Asserts the UI shows no live preview iframe, renders
   'Sandbox not running', and the process row reads 'stopped' (never 'running').

Gate: gofmt/vet/build/go test + OpenAPI contract + console tsc + vitest (7/7) +
console build, all green. No new features; not merged to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sandboxd (the contract):
- GET /v1/settings: read-only, tokenless-safe instance summary — version/build,
  networking (preview domain/base/port/tls/entrypoint), auth.enabled, runtime
  (storage mode, base image), lifecycle (idle reap + threshold, keepalive max),
  egress mode, agent providers, runtime presets, and capability flags
  (snapshots/config_secrets/templates/forward_auth). Server.Instance holds the
  static safe metadata; populated in main (version, auth !Disabled, storage
  'directory', egress mode, providers ['opencode'], idle settings). NEVER emits
  secrets/tokens/keys/env values.
- OpenAPI documents /v1/settings; added to required-public-surface test.
- Tests: TestV1SettingsShapeAndNoSecretLeak builds a real cipher from a known key
  and asserts the response shape is stable AND the key never appears; capability
  flags + preview_base + presets verified. Contract parity green.

console (a client):
- api.ts Settings type + getSettings(); Settings.tsx read-only page with sections
  System / Networking / Runtime & presets / Agents / Security-auth / Egress /
  Capabilities; topbar Settings nav.
- fixtures.settingsFixture mirrors the API shape; Settings.test.tsx renders all
  sections and asserts auth shows a mode (not a token) and no secret/password
  field appears.

Non-goals respected: read-only (no config editing), no RBAC, no tenant mgmt, no
external secret managers, no daemon restart. Gate green: gofmt/vet/build/go test
+ OpenAPI contract + console tsc/vitest(9/9)/build. Not merged to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sted, hot-applied)

Only the lifecycle tunables (idle_reap_enabled, idle_threshold_seconds,
keepalive_max_seconds) are editable from the console; everything else stays
read-only / env-managed (auth, secrets, egress, networking are never editable).

sandboxd (contract):
- PATCH /v1/settings: STRICT allowlist via json DisallowUnknownFields — any other
  key (auth/egress/networking/secrets/version/unknown) -> 400, no mutation.
  Range-validated (idle 60..86400s, keepalive 0..7d). Persists to a singleton
  instance_settings row (migration 0018) and HOT-APPLIES via a shared
  instancecfg.Live the idle reaper + keepalive path read each use; audited
  (settings.update). 503 when Live/Store unwired.
- GET /v1/settings now reads lifecycle live + advertises an "editable" list.
- reaper.Idle gains ThresholdFn/EnabledFn (live overrides; nil = static). main
  builds instancecfg.Live from env defaults, overlays the persisted row at boot,
  wires it to the reaper + Server. keepalive handler reads s.keepaliveMax() live.
- Tests: reject protected/unknown keys (no mutation/persist), validation bounds,
  persist+hot-apply round-trip, 503 without Live, no-secret-leak (existing).
  OpenAPI + contract + required-surface updated.

console (client):
- Settings page: editable Lifecycle section (gated on the server "editable" list)
  with Save -> PATCH; all other sections read-only (no inputs). api.patchSettings.
- Tests: lifecycle inputs present + read-only sections have no inputs; Save sends
  a PATCH carrying only the lifecycle object. vitest 11/11.

Non-goals respected: no RBAC, no tenant mgmt, no external secret managers, no
daemon restart, no editing of protected/restart-required settings. Gate green:
gofmt/vet/build/go test + OpenAPI contract + console tsc/vitest/build. Not merged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tastyeffectco force-pushed the release/v0.4-apps-console branch from 22d2256 to 37f06c2 on June 24, 2026 21:47
tastyeffectco and others added 2 commits June 25, 2026 10:35
Defines the base image / runtime preset / starter-import layering and the
contract a custom base image must meet to work with sandboxd:
- runtimed as the container main process
- sandbox user uid/gid 1000:1000
- workspace /home/sandbox, app /home/sandbox/workspace/app
- optional /opt/sandbox-skel seed + /opt/templates/<preset>
- unprivileged: no privileged, no Docker socket, no extra caps, no writable-rootfs
  assumption
- must provide the toolchains the presets in use require
- selected instance-wide via SANDBOXD_IMAGE (read-only in Settings)

Includes footguns (missing runtimed / wrong uid / wrong workspace / missing
toolchain / requires Docker-privileged) and a roadmap (startup preflight, image
profiles, browser image, app-level selection — later; no Dockerfile builder /
Compose / registry creds).
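The roadmap's startup preflight is not implemented; as a hypothetical sketch only, it could reduce the contract above to a list of violations over facts observed in the image. ImageFacts and contractViolations are invented names, and how the facts would be gathered is left open.

```go
package main

import "fmt"

// ImageFacts is a hypothetical summary of what a startup preflight might
// observe inside a candidate SANDBOXD_IMAGE.
type ImageFacts struct {
	MainProcess       string
	SandboxUID        int
	SandboxGID        int
	HomeDir           string
	AppDir            string
	NeedsPrivileged   bool
	NeedsDockerSocket bool
	Toolchains        map[string]bool // toolchain name -> present
}

// contractViolations lists every way f breaks the base-image contract,
// given the toolchains required by the presets in use.
func contractViolations(f ImageFacts, requiredTools []string) []string {
	var v []string
	if f.MainProcess != "runtimed" {
		v = append(v, fmt.Sprintf("main process is %q, want runtimed", f.MainProcess))
	}
	if f.SandboxUID != 1000 || f.SandboxGID != 1000 {
		v = append(v, fmt.Sprintf("sandbox user is %d:%d, want 1000:1000", f.SandboxUID, f.SandboxGID))
	}
	if f.HomeDir != "/home/sandbox" {
		v = append(v, "workspace must be /home/sandbox")
	}
	if f.AppDir != "/home/sandbox/workspace/app" {
		v = append(v, "app dir must be /home/sandbox/workspace/app")
	}
	if f.NeedsPrivileged || f.NeedsDockerSocket {
		v = append(v, "image must run unprivileged, without the Docker socket")
	}
	for _, t := range requiredTools {
		if !f.Toolchains[t] {
			v = append(v, "missing toolchain: "+t)
		}
	}
	return v
}
```

Each footgun listed above (missing runtimed, wrong uid, wrong workspace, missing toolchain, requires Docker/privileged) maps to one check.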

No code, no Settings UI change, no image profiles, no app-level selection. On
release/v0.4-apps-console (PR #35 stays draft); release docs untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…notes

- README: reposition as "Self-hosted control plane for AI-built apps"; add an
  above-the-fold "What you get" list (web console, runtime presets, live preview
  URLs, agent tasks, app config/secrets, snapshots/fork/restore, activity/events,
  process logs, settings/lifecycle); extend Known limitations (Docker-only; no
  Compose/local DB/Git import yet).
- CHANGELOG: cut [0.4.0] — 2026-06-25 (was Unreleased); add Settings (GET/PATCH
  lifecycle), base-image contract, and test-foundation entries; honest known
  limitations (v1 delete purge, keepalive_until not surfaced, warming page 200,
  Docker-only, no Compose/local DB/Git import); note it rolls up the v0.3.0 work.
- docs/release-notes-v0.4.0.md: draft GitHub release notes.
- installer: default base tag sandboxd-base:0.4.0-test -> :0.4.0 (drop confusing
  "test" language at launch; SANDBOXD_IMAGE override unchanged).

No feature/runtime changes. Gate green: gofmt/vet/build/go test + OpenAPI
contract + installer syntax + console tsc/test(11)/build. Validated by the
from-zero VPS RC QA pass. PR #35 to be marked ready (not merged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tastyeffectco marked this pull request as ready for review on June 25, 2026 19:52
@tastyeffectco

Owner Author

RC QA: PASS. A from-zero VPS install + the docs/release-checklist.md smoke (console, presets, previews, agent tasks, snapshots/fork, process logs, events, config/secrets redaction, settings/lifecycle, idle/wake/keepalive) passed on a fresh Ubuntu host. Full automated gate green (Go test + OpenAPI contract + console tsc/test/build). Release-docs pass done (README positioning, CHANGELOG 0.4.0, release notes; the installer's default base tag no longer carries the -test suffix). Marked ready for review; do not merge until approved.
