draft: v0.4.0 — apps console, snapshots/fork, observability, runtime manifest + presets#35
Open
tastyeffectco wants to merge 48 commits into
An optional web console for managing apps on top of sandboxd, plus the
versioned /v1 contract it binds to. Built as a folder in the monorepo
(API-only boundary) so it splits cleanly to its own repo once /v1
stabilizes.
- docs/openapi.yaml — the public /v1 API as OpenAPI 3.0 (apps,
sandboxes, tasks/SSE, snapshots). The contract for the console and
future integrations.
- console/ — Vite + React SPA (shadcn/Vercel-style dark UI). Talks ONLY
to /v1 (no Go imports, no DB, no workspace access): app list, app
detail with live preview iframe, task submit + live SSE logs,
start/stop/snapshot/delete. Playwright specs for the lifecycle.
- POST /v1/sandboxes/{id}/start — public counterpart of /stop so an
API-only console need not reach the internal wake path.
- previewURL now reflects PreviewTLS (http on a default local deploy),
so the preview iframe loads without TLS.
- Packaging: `docker compose --profile console up` builds the SPA and
serves it from nginx, proxying /v1 to sandboxd (deferred DNS so it
survives sandboxd restarts). Core mode (no profile) is unchanged.
- CI: a console job builds the SPA + image so a TS/build break is caught.
MVP scope: single-user, auth-off, public previews. Verified: Go +
console SPA + console image all build; nginx config valid; compose
console profile resolves; go vet/test green. The browser-driven
Playwright run against the full live stack is the remaining check.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…main>) Per review: the same Traefik that serves previews routes the console too, selected by Host header (console.<domain> -> console, *.preview.<domain> -> sandboxes) on one entrypoint — instead of a separate published port. Gated on the sandboxd.managed=true label the docker provider requires. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fails if any /v1 route is missing from docs/openapi.yaml, or the spec documents an endpoint that no route serves — the contract the console and external integrations depend on.
Adds control-plane-owned config and secrets per app, so sensitive values
never live in Docker env, workspace files, or task logs.
- New app_config table (migration 0014), scoped to an app (and so to the
API tenant). Sensitive entries store AES-256-GCM ciphertext + a random
per-value nonce; non-sensitive entries may keep a plaintext value.
- Encryption uses standard-library crypto only. The master key comes
from SANDBOXD_SECRETS_KEY (base64, 32 bytes) or an auto-generated 0600
keyfile under the data dir.
- API: POST/GET/PATCH/DELETE /v1/apps/{id}/config. Sensitive values are
write-only — GET returns metadata only (key, sensitive, access_policy,
value_set, timestamps), never the plaintext. Non-sensitive config may
be returned in full.
- access_policy metadata (control_plane_only | agent_access |
runtime_access | both); default control_plane_only. Agent/runtime
delivery is the next slice (a scoped-token broker) — for now nothing is
injected, so secrets stay in the control plane.
- Plaintext is never logged; audit entries record only the key.
Secrets are deliberately NOT passed through `docker run -e`. The legacy
create-time env stays for backwards compatibility, but app_config is the
safer managed replacement.
Tests: encryption round-trip / fresh nonce / tamper-detection / 0600
keyfile; sensitive value encrypted at rest and never returned; redaction;
default policy; tenant/app scoping; sensitivity toggle re-encrypts.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…only Covers two acceptance checks: a PATCH that omits 'value' must not alter the stored secret bytes, and audit entries record the config key but never the plaintext value.
… changelog
Adds the four /v1/apps/{id}/config routes (and ConfigItem/CreateConfigRequest/
PatchConfigRequest schemas) to docs/openapi.yaml so the console<->sandboxd
contract test stays green after merging #33. Bumps the spec to 0.3.0 and adds
an Unreleased changelog entry for the v0.3.0 integration.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a Config & Secrets section to the app detail view backed by the new
/v1/apps/{id}/config API: list entries, add a key (secret or plaintext) with an
access policy, change an entry's access policy inline, and delete. Secrets are
write-only end to end — the API never returns a sensitive value, so a stored
secret shows as a redacted '•••• set' chip and can only be replaced.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The secrets broker (Slice 2) is not implemented, so only control_plane_only is enforced. Mark agent_access/runtime_access/both as 'reserved (broker)' and disable them in both selects so the UI never implies a secret is delivered to an agent or app runtime. An existing entry already set to a reserved policy still displays it. Adds a per-row Replace (secret) / Edit (plaintext) action: since sensitive values are write-only, this PATCHes a new value inline — the only way to rotate a stored secret from the console. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds TestAppConfigSecretDoesNotLeak: one lifecycle test asserting a sensitive value (sk-test-secret-ci) never escapes through any of the four leak vectors — API responses, DB plaintext columns, audit rows, or server logs — while a non-sensitive value still round-trips plainly and the default policy stays control_plane_only. Closes the log-output vector the existing config tests didn't capture; runs in the normal Go CI job (internal/api, go test ./...). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The top-level docs predated the console (#32), so a new user had no way to discover it. Add a 'Web console' section to README.md and AGENTS.md: what it does, how to launch it (docker compose --profile console up -d), where to open it (console.localhost via the shared Traefik), and that it's a pure /v1 client. Completes the 'basic docs' item for the console end-to-end experience. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Snapshot button captured a snapshot with no UI feedback. Now:
- success -> info toast 'Snapshot captured. History, restore, and fork are coming in v0.4.0.'
- 409 (running source) -> error toast 'Stop the sandbox before capturing a snapshot.'
The API client now attaches the HTTP status to thrown errors so the 409 case is detected reliably. No snapshot history/restore/fork yet; that's v0.4.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The card badge was presence-based (green whenever a sandbox row existed), so a sandbox in creating/stopped/error showed as green 'sandbox' — misleading. Now each app's current-sandbox status is fetched and rendered via StatusBadge (running=green, stopped/error distinct), and 'no sandbox' stays neutral. Honest status, matching the detail view. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v0.4.0 backend on the public /v1/snapshots subsystem only (internal
/sandbox/{id}/... stays unexposed):
- migration 0015 adds snapshot.source_app_id so per-app history survives the
ephemeral source sandbox; capture stamps it from the source's app_id.
- GET /v1/apps/{id}/snapshots — tenant+app-scoped history.
- POST /v1/apps/{id}/restore — REPLACE the app's current sandbox from a
snapshot (purge current, then clone). Destructive; console confirms.
- POST /v1/apps/{id}/fork — new app + its sandbox spun from a snapshot;
source app untouched.
Tenant scoping enforced on every path (cross-tenant app/snapshot -> 404).
Sandbox spin reuses the proven create path (template_path + .git reset).
Tests cover store scoping, history, and the restore/fork guard paths; the
Docker-dependent spin is verified on a real host, not CI. OpenAPI + contract
test updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se 4)
Adds a Snapshots panel to the app detail screen backed by the new
/v1/apps/{id} endpoints:
- history list (name, captured time, size) via GET /v1/apps/{id}/snapshots
- Restore: confirms (replaces the current sandbox, discards un-snapshotted
work) then POST /v1/apps/{id}/restore and refreshes
- Fork: prompts for a name, POST /v1/apps/{id}/fork into a new app
The capture button now refreshes history and (since v0.4.0 ships these) drops
the 'coming soon' wording. Actions disabled unless the snapshot is ready.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…T verified Records the release discipline + verification status for the Phase 4 preview branch: capture/history/tenant-scoping/backend-orchestration are tested, but the live restore/fork sandbox spin and preview are deliberately deferred to a real isolated v0.4.0 deploy (they'd otherwise expose port-3000 sandboxes to prod Traefik on the shared host). Not for merge to console/main; no non-draft PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts/dev/install-v04-ubuntu.sh stands up the Phase 4 stack on a fresh Ubuntu 22.04/24.04 server, reusing the repo's docker-compose (traefik + sandboxd + console profile) — no parallel deploy system. It installs Docker if missing, fails if 80/443 are taken, detects the public IPv4, uses sslip.io for preview + console URLs (HTTP on :80), writes .env + a docker-compose.override.yml, gates the public console with Traefik basic auth (demo creds), keeps the API on loopback (disables the edge api.yml router), builds images, starts the stack, and prints URLs + teardown. docs/v0.4.0-test-runbook.md: requirements, install, the 14-step create→preview→ snapshot→restore→fork checklist, TLS-as-follow-up, teardown, release discipline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 80/443)
Default to shared-host mode so the installer is safe next to Coolify/nginx/another Traefik:
- HTTP_PORT=18080 (uncommon edge port; set HTTP_PORT=80 for dedicated-host mode)
- API_PORT=19090 on loopback only
- only the chosen HTTP_PORT must be free; fail clearly telling the user to set HTTP_PORT if taken; do NOT check or require 443 (TLS deferred)
- generated URLs include :<HTTP_PORT> unless it's 80
Runbook documents both modes plus 'behind an existing proxy': keep 18080 and have the front proxy forward console.<ip>.sslip.io + *.preview.<ip>.sslip.io to 127.0.0.1:18080 (Host preserved); TLS via the front proxy or a real wildcard domain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On a shared host with HTTP_PORT=18080, the API returned bare preview URLs
(…sslip.io/) that hit whatever owns :80 (Coolify/front proxy) instead of
sandboxd's Traefik on :18080. previewURL() now appends the host port unless it's
the scheme default (80 for http, 443 for https):
Server.PublicHTTPPort <- SANDBOXD_PUBLIC_HTTP_PORT (main.go)
docker-compose.yml <- SANDBOXD_PUBLIC_HTTP_PORT: ${HTTP_PORT:-80}
The console iframe + open-in-tab link consume sb.preview.url unchanged, and
restore/fork responses use the same previewURL(), so all preview surfaces get the
corrected port. Unit-tested across http/https x default/custom ports; verified
live that GET /v1/sandboxes/{id} returns ...sslip.io:18080. No installer change
needed — it already writes HTTP_PORT, which compose forwards.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
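The port rule from the commit above (append the host port unless it is the scheme default) might look like the sketch below. The signature, the trailing slash, and the treatment of 0 as "unset" are assumptions for illustration; the real previewURL() lives on the server and reads Server.PublicHTTPPort.

```go
package main

import (
	"fmt"
	"strconv"
)

// previewURL (hypothetical signature) builds a preview URL and appends the
// public port only when it differs from the scheme default (80 or 443).
func previewURL(host string, tls bool, port int) string {
	scheme, def := "http", 80
	if tls {
		scheme, def = "https", 443
	}
	// Assumption: port 0 means "not configured" and behaves like the default.
	if port == 0 || port == def {
		return scheme + "://" + host + "/"
	}
	return scheme + "://" + host + ":" + strconv.Itoa(port) + "/"
}

func main() {
	// Placeholder host name; real hosts come from the preview domain.
	fmt.Println(previewURL("app.preview.example.test", false, 18080))
	fmt.Println(previewURL("app.preview.example.test", false, 80))
}
```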
One append-only app_events table in the existing control-plane SQLite DB (no
ClickHouse/OTEL/Loki/separate DB), a centralized best-effort recorder, a
tenant-scoped paginated read API, and a console Activity timeline.
- migration 0016: app_events(id ULID, owner_token, app/sandbox/task/snapshot ids,
type, severity, message, payload_json, created_at) + scoped indexes. ULID id
doubles as the newest-first page cursor.
- internal/events: Recorder.Record (mirrors audit: own Store interface, detached
ctx, never breaks the request); stable type/severity constants.
- store: InsertAppEvent + ListAppEvents/ListTaskEvents (owner_token-scoped,
cursor-paginated); owner-agnostic GetApp for the background task path.
- API: GET /v1/apps/{id}/events and /v1/tasks/{id}/events (newest-first,
default 50 / max 200, ?before cursor, next_before). Cross-tenant -> 404/empty.
- instrumented via the recorder (no scattered SQL): app.created/updated,
config.created/updated/deleted (key only, never the secret), snapshot
captured/capture.failed/restored/forked, sandbox create.started/failed/
started/stopped/deleted, task.started, and on the task terminal point
task.completed/failed/build.failed + preview.health.ok/failed.
- console: read-only Activity panel on app detail (time/severity/type/message +
related ids), durable across refresh/restart.
- docs/openapi.yaml + contract test; .env.example notes the future
SANDBOXD_EVENT_RETENTION_DAYS knob (retention deferred).
Tests: recorder writes valid JSON events; tenant scoping; pagination/limit;
config event carries the key but never the secret; failed task emits
task.failed + task.build.failed on both feeds. gofmt/vet/build/test + contract
test green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
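The read API's paging rules (newest-first, default 50 / max 200, a ?before cursor, and a next_before value for the following page) can be modelled in memory as below. This is a sketch under assumptions: the names `clampLimit` and `page` are hypothetical, the real store pages in SQL, and returning an empty next_before when no older events remain is a guess at the semantics. It relies only on the stated property that ids sort lexicographically in creation order.

```go
package main

import (
	"fmt"
	"sort"
)

type event struct{ ID, Type string }

// clampLimit applies the documented bounds: default 50, max 200.
func clampLimit(n int) int {
	if n <= 0 {
		return 50
	}
	if n > 200 {
		return 200
	}
	return n
}

// page returns up to limit events with ID < before (all when before is ""),
// newest first, plus the cursor for the next page ("" when none remain).
func page(events []event, before string, limit int) ([]event, string) {
	limit = clampLimit(limit)
	sorted := append([]event(nil), events...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID > sorted[j].ID })

	var out []event
	more := false
	for _, e := range sorted {
		if before != "" && e.ID >= before {
			continue // at or newer than the cursor: already served
		}
		if len(out) == limit {
			more = true // at least one older event exists
			break
		}
		out = append(out, e)
	}
	if !more || len(out) == 0 {
		return out, ""
	}
	return out, out[len(out)-1].ID
}

func main() {
	evs := []event{{"01", "app.created"}, {"02", "task.started"}, {"03", "task.completed"}}
	first, next := page(evs, "", 2)
	fmt.Println(first, next)
	rest, next := page(evs, next, 2)
	fmt.Println(rest, next)
}
```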
…ndexes
Follow-up to Phase 5 (3 fixes):
1. No raw output in app_events.payload_json. Task events now carry structured
flags/reasons only — never BuildErrorMessage/PreviewErrorMessage/ErrorMessage
text (which can echo secrets the app printed; the full text stays in the
task's result.json). New payloads:
task.completed -> {files_changed, duration_ms, build_ok}
task.failed -> {failure_reason, has_error}
task.build.failed -> {reason:'build_failed', has_build_error:true}
preview.health.failed-> {preview_status, has_preview_error}
Test now plants a fake secret in all three error fields and asserts it never
appears in any event's payload_json or message.
2. Monotonic ULID event ids (ulid.Monotonic under a mutex), so a same-millisecond
completion burst sorts in emission order by id (the page cursor). Added a
monotonic-ordering test.
3. Dropped the unused indexes (owner-only, sandbox-only, type-only) from
migration 0016 — no endpoint queries them and each is write amplification on
an append-only table. Kept idx_app_events_app + idx_app_events_task.
gofmt/vet/build/test + OpenAPI contract test green. Console untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se 7 core)
runtimed is no longer hardcoded to a single Vite dev server. An optional workspace sandbox.yaml declares how the app builds/runs/previews/reports health, plus background workers. No manifest = the built-in Vite defaults, so existing apps are unchanged.
- cmd/runtimed/manifest.go: parse sandbox.yaml (gopkg.in/yaml.v3) with full backward-compatible defaults sourced from the existing RUNTIMED_* env vars; resolution rules for web / worker-only / empty / invalid. Unit-tested.
- cmd/runtimed/process.go: generalized the single dev-server supervisor into a reusable 'process' (web OR worker) with the same backoff/fast-fail/stop logic.
- main.go: builds the web process (if declared) + workers from the manifest, supervises all, probes the web health_path (Vite asset deep-probe kept only for the default app), and reports per-process status.
- protocol.go: Status.Processes []ProcessState (name/kind/running/pid/restarts); Preview still carries the web health for compatibility.
- build check uses the manifest's build.command/timeout (empty => skipped).
- docs/sandbox-manifest.md: schema reference + examples + security note (the manifest grants no new privilege; no Docker socket/compose/k8s).
New dependency: gopkg.in/yaml.v3 (pure-Go, CGO-safe), the first YAML dep, needed to parse the manifest.
Deferred to follow-up slices: control-plane logs API + console process status/logs panel + live image-rebuild e2e.
gofmt/vet/build/test green; runtimed manifest tests added.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1. BuildSpec is never nil after LoadManifest (applyDefaults always sets it);
added a defensive a.build!=nil guard at the build-check call site so a
hand-built app{} can't panic. Test: absent/worker-only/invalid all yield a
non-nil Build.
2. Worker-name validation: only [A-Za-z0-9_-] (1-64 chars) — rejects empty,
path separators, '..', and duplicates (the name becomes ~/.runtimed/<name>.log,
so this is path-safety). Dropped the silent auto-naming. Invalid manifest is
rejected and falls back to the safe default (web app, no workers).
3. Port validation: an explicit web.port must be 1-65535 (0 = unset -> default).
4. Worker-only preview semantics: added PreviewNone ('none') instead of the
misleading 'down', and documented it (non-breaking — new enum value).
5. Tests: build-never-nil, invalid/duplicate worker names rejected safely,
invalid ports, worker-only status -> PreviewNone + no preview probe (no panic).
6. docs/sandbox-manifest.md: documents worker-only=none, the validation rules,
and that Phase 7 is NOT fully verified until the rebuilt base image runs the
default Vite, a custom web, and a worker-only manifest.
gofmt/vet/build/test green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
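The worker-name rule from item 2 above (only [A-Za-z0-9_-], 1-64 chars, no duplicates) is small enough to sketch. The function name `validateWorkerNames` and the error wording are hypothetical; only the character class, the length bounds, and the duplicate check come from the commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// workerNameRE encodes the documented rule. Since the name becomes a log
// file name, rejecting '/', '.', and the empty string is a path-safety check.
var workerNameRE = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// validateWorkerNames (hypothetical name) rejects invalid and duplicate names.
func validateWorkerNames(names []string) error {
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		if !workerNameRE.MatchString(n) {
			return fmt.Errorf("invalid worker name %q", n)
		}
		if seen[n] {
			return fmt.Errorf("duplicate worker name %q", n)
		}
		seen[n] = true
	}
	return nil
}

func main() {
	fmt.Println(validateWorkerNames([]string{"queue", "mailer_2"}))
	fmt.Println(validateWorkerNames([]string{"../etc"}))
}
```

Per the commit message, a manifest that fails this check is rejected as a whole and runtimed falls back to the safe default rather than renaming the worker.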
Rebuilt sandboxd-base with the new runtimed (yaml dep fetched fine in the image builder) and ran all three manifest shapes end-to-end on a disposable host stack:
- default Vite (no manifest): preview ready, web running, pnpm build exit 0
- custom web (python http.server :5000, health_path /healthz, build skipped): preview ready, serves /healthz and / with 200
- worker-only: preview 'none' (valid, not error), worker running + producing output
Test sandboxes were portless (no Traefik label) so prod routing was untouched. Marks the manifest verification status as live-verified. No runtime fixes needed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surfaces the runtime manifest process model through the public /v1 API and the
console (7A marked accepted/verified in docs).
- GET /v1/sandboxes/{id} now includes processes[] (name/kind/running/pid/
restarts), mapped via a pure, testable v1RuntimeView helper. A worker-only app
(preview status 'none') returns an empty endpoint URL — no fake preview.
- GET /v1/sandboxes/{id}/processes/{name}/logs: read-only tail of
~/.runtimed/<name>.log (which the files API refuses as a reserved subtree).
Sandbox-scoped by id like the rest of the v1 sandbox API; process name strictly
validated ([A-Za-z0-9_-], 1-64) so no path escapes; tail capped (default 200,
max 1000, reads <=256KiB); no write/delete. Unknown process/sandbox -> 404,
bad name -> 400.
- console: app detail shows a Processes panel (name/kind/status/pid/restarts +
per-process recent logs); preview pane relabeled 'Preview / endpoint';
worker-only renders 'No public endpoint — worker process running' (valid, not
a failure).
- OpenAPI: documents the logs route + Sandbox.processes/Process schema.
- tests: process-logs tail, bad-name->400 (incl. traversal), unknown->404;
v1RuntimeView worker-only shape (status none, no URL) + process mapping.
gofmt/vet/build/test + contract test green; console tsc + build green.
No presets, no manifest editor, no compose/kata/containerd (7B non-goals).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ caps)
Users can create a working app of a common type from a preset that actually
boots. Approach (A): runtimed applies the preset on first boot.
- internal/preset: shared registry (single source of truth) of 5 presets —
react-vite, nextjs, node-express, fastapi, worker — each with id/label/
description/optional template/generated sandbox.yaml/required capabilities.
- runtime_preset on POST /v1/apps (stored on the app, migration 0017),
POST /v1/apps/{id}/sandbox (explicit else app default; precedence over
template), and POST /v1/sandboxes. Unknown preset -> 400. GET /v1/presets
lists them (console picker source of truth).
- runtimed applies the preset on FIRST boot only: seed the preset template into
an empty workspace, write sandbox.yaml only when missing — never overwrites
existing app files or sandbox.yaml. Falls back to default template if unknown.
- minimal starter templates added (node-express-standard, fastapi-standard,
nextjs-standard) — reliable boot, not fancy. React uses existing react-standard.
- console: app-type preset dropdown on New App (data-driven from /v1/presets);
sandbox create inherits the app's preset.
- tests: every preset manifest validates with the loader; unknown->400 (app +
sandbox create); app stores runtime_preset; resolve precedence; presets list;
applyPreset/writeManifestIfMissing (write-once, never overwrite); seed skips
non-empty workspace. gofmt/vet/build/test + contract test + console tsc green.
Image capability check: base image already has python3-venv (pip via venv works,
confirmed) + node/pnpm — so all 5 presets boot WITHOUT a Dockerfile change.
Live boot of all 5 on a rebuilt image is the deferred e2e step.
No Postgres/Redis/managed services, no compose, no manifest editor, no advanced
override, no provider work (7C-1 non-goals).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records the live e2e on rebuilt image sandboxd-base:p7c1 — all five presets boot: react-vite ~31s, nextjs ~39s (warm; cold may be slower), node-express ~30s, fastapi ~37s (runtime venv/pip install works), worker ~28s (preview none + worker running). Confirms: presets seed expected files, runtimed writes sandbox.yaml, process status/logs endpoint works, API rejects unknown presets with 400, runtimed logs loudly + falls back to react-standard on a bad preset env. Notes still-unit-only items (console dropdown not browser-clicked; app-default preset resolution not live-tested) and future image optimizations (warm pnpm/npm cache, preinstalled FastAPI/uvicorn or uv, Next.js layer). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tance
The Next.js preset ran 'pnpm dev' (web) + 'pnpm build' (post-task build check) in the same workspace; next build writes production .next/ that the long-running next dev then serves from -> 500s on _next/static. Dev isn't restarted after the build, so it stays broken.
Fix (smallest reliable):
- nextjs preset build.command is now empty: the build check is the only thing that runs 'next build', so skipping it removes the poison source. Tradeoff: no post-task build verification for Next.js until an isolated build check exists.
- web command 'rm -rf .next' before 'pnpm dev' defends a clean start against a stale/production .next carried in by snapshot restore (alone insufficient: dev isn't restarted post-build, hence skipping the check too).
- nextjs template ships .gitignore (node_modules,.next,out,.env,.env.local) so the git-based workspace checkpoint doesn't treat them as app changes.
Re-tested live (image sandboxd-base:p7c1b, portless): fresh ready ~30s, /+asset 200; reproduced bug (pnpm build -> 500); recovery via restart ~10s -> 200; edit hot-reloads; checkpoint tracks only 6 real files. Real LLM agent task NOT run (no ANTHROPIC_API_KEY); verified the post-task build-check mechanism directly.
Tests: nextjs manifest has no 'pnpm build' + empty build + 'rm -rf .next'; template ships .gitignore; all preset manifests still validate.
Docs: 7C-1 NOT marked fully accepted; records the fix, the not-run agent path, and follow-ups (snapshot .next bloat unfiltered; split task build_ok/preview_ok/app_healthy; empty agent.log on timeout).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ist as deferred
Issue 2 (build/health semantics), implemented:
- TaskResult adds build_status (passed|failed|skipped), preview_ok (*bool; omitted for worker-only), app_healthy (build-not-failed AND web preview serving / worker-only a worker running). build_ok kept for back-compat but now true ONLY when build_status=passed; a skipped build is never build_ok=true.
- runtimed sets build_status from the build check (empty command => skipped, not a fake pass) and derives app_healthy/preview_ok via postTaskHealth().
- console shows 'build skipped|passed|failed' (+ 'unhealthy') instead of the old unconditional 'build ok'. OpenAPI TaskResult updated.
- tests: postTaskHealth web (passed/skipped/down/failed) + worker-only (running/stopped); preview_ok omitempty in JSON.
Issue 1 (snapshot ignore-list), deferred (not small), documented:
- capture is zstd of the raw loopback .img (block image), not a tree copy, so an ignore-list can't slot into capture without reflink-copy+loop-mount+prune or a filtered-tar redesign (mount risk for an RC). Correctness (stale .next in forks/restores) is already handled by 'rm -rf .next' on dev start; only size bloat remains. Recorded as a scoped follow-up.
gofmt/vet/build/test + contract test green; console tsc + build green. No merge to console/main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…copy Storage model (clarified): the OSS build stores each workspace as a plain bind-mounted DIRECTORY (loopback .img is legacy; internal/snapshot's zstd-.img path is dead in dir mode). The public /v1 snapshot/fork/restore subsystem captures by copying that directory (captureImage), so the ignore-list belongs in that copy — no image/zstd step involved. captureImage now uses copyTreeExcluding instead of 'cp -a': - skips node_modules/.next/out/.venv/__pycache__/.cache by base name at any depth (conservative; dist/build NOT ignored — templates don't treat them as generated); - copies symlinks verbatim and never follows them (no path-traversal/symlink escape during staging copy); - preserves mode + ownership (lchown) so the restore cp -a keeps files writable. Restore/fork unchanged (still cp -a the snapshot dir); restored workspaces re-create deps on first boot. Tests: exclusion incl. nested node_modules, source + sandbox.yaml preserved, symlink-escape safety; existing snapshot/fork/restore guard tests still pass. gofmt/vet/build/test + contract green. No merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se-blocker)
Bug: applyDefaults replaced an explicit empty build.command with the default
pnpm build, so presets could NOT disable the post-task build check. Next.js
therefore still ran 'next build' after tasks and poisoned the live 'next dev'
(.next/ -> 500 on _next/static); FastAPI/worker could get false/irrelevant
checks.
Fix: BuildSpec.Command is now *string so unset (nil) is distinct from explicit
empty (&""):
- no build block / build: {} -> default pnpm build (backward compatible)
- build.command: "" -> SKIP the build check
- build.command: "x" -> run x
task.go derefs the command; empty => skipped (build_status=skipped, build_ok
false). worker preset now sets build.command "" explicitly (was falling back to
the default).
Build checks are runtime verification (does the app still build/start after a
task), NOT production deployment builds — documented, with how to skip.
Acceptance tests (TestManifestBuildResolution): absent->default, {}->default,
""->skip, "x"->run; manifest assertions updated for *string.
Live retest (image sandboxd-base:p7c1d, portless): ran a coding task on a Next.js
sandbox — agent fails w/o creds but the post-task build check runs; result
build_status=skipped, NO pnpm/next build executed, / + four _next/static/chunks/*
return 200 (not poisoned). FastAPI: build_status=skipped, no pnpm build, /health
200. Cleaned up; prod untouched.
gofmt/vet/build/test + contract green. No merge to console/main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
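The nil-versus-empty distinction behind this fix can be shown without the YAML layer. In the sketch below, BuildSpec and its *string Command field follow the commit message; the helper `resolveBuild`, the constant name, and the `strPtr` convenience are hypothetical.

```go
package main

import "fmt"

// BuildSpec keeps Command as a pointer so "unset" (nil) is distinguishable
// from "explicitly empty" (pointer to "").
type BuildSpec struct {
	Command *string
}

const defaultBuildCommand = "pnpm build"

// resolveBuild (hypothetical name) returns the command to run and whether
// the post-task build check is skipped.
func resolveBuild(b *BuildSpec) (cmd string, skip bool) {
	if b == nil || b.Command == nil {
		return defaultBuildCommand, false // no build block, or build: {}
	}
	if *b.Command == "" {
		return "", true // build.command: "" disables the check
	}
	return *b.Command, false
}

func strPtr(s string) *string { return &s }

func main() {
	for _, b := range []*BuildSpec{nil, {}, {Command: strPtr("")}, {Command: strPtr("make check")}} {
		cmd, skip := resolveBuild(b)
		fmt.Printf("cmd=%q skip=%v\n", cmd, skip)
	}
}
```

With a plain string field the second and third cases collapse into one, which is the bug the commit describes: defaults could not tell "omitted" from "deliberately empty".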
…gent builds
New finding: even with the platform build check skipped, the coding AGENT can run 'next build' during a task, writing production .next/ that the live 'next dev' then serves -> 404/500 on _next/static. rm -rf .next at boot can't undo a mid-session poison because dev isn't restarted after the task.
Fix: add web.restart_after_task (manifest WebProc field, default false). When true, runtimed restarts the web process after every task (skipped on cancel) via the existing supervisor (stop() -> re-run start command), then waits up to 90s for a 200 on the health path before reporting preview/health. The web command re-runs, so its 'rm -rf .next' wipes the agent's production build and dev comes back clean. Next.js preset sets restart_after_task: true; other stacks leave it false.
Live retest (image sandboxd-base:p7c1e, portless): created Next.js sandbox, simulated the agent by running 'pnpm build' mid-session (.next became a production build w/ BUILD_ID; /_next/static/chunks/* -> 404), then ran a coding task. After the task: web restarts=1, / + four chunk assets -> 200, .next is dev again (no BUILD_ID). Recovery confirmed. Cleaned up; prod untouched.
Tests: manifest parses web.restart_after_task (default false); nextjs preset sets it true.
gofmt/vet/build/test green. Docs: 'Dev-mode resilience' section. No merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ProvisionFromTemplate (the single funnel for app fork, app restore, and direct from_snapshot creation) cloned the snapshot/template with 'cp -a' (preserves SOURCE ownership) and created the workspace $HOME dir as root — so the sandbox user (uid 1000) hit EACCES writing ~/.cache, pnpm/npm store, node_modules, .next, .venv, generated files. Fresh seed already chowns; the template path did not. Fix: after the clone, ProvisionFromTemplate recursively normalizes ownership of the whole workspace (incl. the $HOME dir itself) to the sandbox uid/gid (1000) — same result as the fresh seed's chown -R, never trusting the snapshot's captured ownership. Uses Lchown via a WalkDir lstat walk that never descends a symlink, so a symlink pointing outside the workspace can't redirect the chown. Logged via the loopback Manager logger. Host-side numeric chown is correct for the OSS --userns=host default. Tests (internal/loopback): chowns the whole tree incl. root dir; symlink-escape not followed; ProvisionFromTemplate normalizes (covers fork/restore/from_snapshot — all route through it). Live retest (image p7c1f, all portless -> no Traefik label -> no prod collision): - Next.js snapshot -> fork: $HOME 1000:1000, ready 31s, / + chunks 200, node_modules reinstalled, ~/.cache writable (no EACCES). - FastAPI fork: $HOME 1000:1000, ready 15s, /health 200, venv reinstalled. Cleaned up; prod untouched. gofmt/vet/build/test green. No merge to console/main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two FastAPI bugs:
1. Port mismatch: the public preview routes to 3000 but uvicorn ran on 8000, so external preview was 502 while the internal probe said ready.
2. Stale after task: uvicorn didn't reload, so an agent-added route 404'd until a manual restart.
Fix: FastAPI preset manifest now runs '.venv/bin/uvicorn main:app --host 0.0.0.0 --port 3000 --reload' with web.port: 3000 (health_path /health, build still skipped). Template requirements.txt adds watchfiles so --reload works. No restart_after_task; reload is reliable (kept for Next.js only).
Tests: fastapi preset validates, web.port 3000, command has --reload + --port 3000, no 8000, no pnpm build; existing preset tests pass.
Live retest (image p7c1g, portless; external check via container-IP:3000, the path Traefik connects on, to stay off the host Traefik):
- create -> public /health 200 (was 502)
- agent adds /hello -> /hello 200 with NO manual restart (watchfiles reload, 2 reload lines in web.log)
- stop -> snapshot -> fork -> fork $HOME 1000:1000, public /health 200, /hello preserved, venv reinstalled
Cleaned up; prod untouched. gofmt/vet/build/test green. No merge to console/main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two preset reload gaps (neither runtime has live reload):
- node-express: 'node server.js' didn't reload -> agent-added routes 404 until manual restart. Set web.restart_after_task: true.
- worker: a long-running worker kept old behavior after a task. Extended the restart_after_task mechanism from web to WORKER processes (per-process flag on the process struct; workers are bounced without a readiness wait, web keeps its health-wait). worker preset now ships an editable worker.sh (template worker-standard, command 'bash worker.sh') with the worker's restart_after_task: true, so edits to worker.sh take effect on restart. Not a generic process-policy framework, just the per-process flag.
Manifest: Worker.RestartAfterTask added; process.restartAfterTask field; main wires web + each worker from the manifest; runTask restarts flagged web (with health wait) and flagged workers (stop -> supervisor re-runs).
Tests: manifest parses worker.restart_after_task (default false); node-express + worker presets set restart_after_task; react-vite/fastapi do NOT (live reload), nextjs unchanged; all preset manifests validate. Full go test + console build/typecheck green.
Live retest (image p7c1h, portless): node-express agent adds /ping -> pong 200 after task, no manual restart (restarts 1); worker output heartbeat->PONG-BEAT in worker.log after task, no manual restart (restarts 1). React/FastAPI/Next unchanged. Cleaned up; prod untouched.
Docs: reload-mechanism table + documented (not implemented) follow-ups: v1 vs legacy DELETE workspace semantics, keepalive_until missing from v1 GET, warming interstitial returns 200.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Marks 7C-1 as accepted & live-verified. All five presets pass preview + agent-task reload + fork/restore:
- react-vite: task applies, preview 200, fork healthy
- nextjs: build-provoking task no longer poisons preview (restart_after_task), chunks 200, fork healthy
- fastapi: port 3000 + uvicorn --reload, task changes live, fork healthy
- node-express: restart_after_task, added route live after task, fork healthy
- worker: restart_after_task, code/output change reflected in logs, fork healthy
Cross-cutting verified: snapshot ignore-list, fork/restore ownership normalized to sandbox:sandbox, wake/idle/keepalive edge cases.
Remaining items recorded as non-blocking follow-ups: v1 DELETE purges vs legacy DELETE keeps workspace; keepalive_until missing from v1 GET; warming interstitial returns 200; per-task agent.log persistence on timeout.
Phase-status banner + section header updated. Not merged to console/main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Integration of all accepted v0.3 + v0.4 work (console/app UI, config & secrets, snapshots/fork/restore, observability/events, runtime manifest, process API/logs, five runtime presets, shared-host installer/preview port, OpenAPI updates).
- CHANGELOG: v0.4.0 entry (Added/Fixed/Known limitations).
- README: runtime presets + sandbox.yaml section; Known limitations (v0.4.0).
- ARCHITECTURE: runtimed manifest/process model + presets note.
- console/README: app detail (preview/endpoint, config, snapshots, activity, processes) + New-App preset picker.
Known limitations recorded (non-blocking): v1 DELETE purge vs legacy keep; keepalive_until not surfaced in v1 GET; warming interstitial returns 200; per-task agent.log persistence on timeout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sandboxd (the contract — primary):
- TestPublicAPISurfacePresent pins the required public /v1 surface in BOTH
api.go routes and openapi.yaml (parity alone can't catch both-sides deletion):
presets, apps, app sandbox, sandbox GET, process logs, snapshots + app
snapshot/fork/restore, app config, app events.
- v1_client_contract_test: pins the JSON SHAPES the console/SDK consume
(GET /v1/sandboxes/{id}: id/status/preview{url,status}/processes/template;
GET /v1/presets: id/label/description). If the server shape changes, this
fails first and the client fixtures must follow.
(Existing suites already cover config redaction, events, process logs,
snapshots/fork guards, runtime-view, build/health semantics.)
console (a client — lightweight smoke, fixtures mirror sandboxd responses):
- vitest + jsdom + @testing-library/react (lockfile regenerated for frozen install).
- src/test/fixtures.ts: apps, 5 presets, web + worker sandbox runtime/processes,
events, redacted config — shaped to match the Go contract tests.
- App.test: console loads, app list renders, preset dropdown populated from
/v1/presets. AppDetail.test: preview/processes/activity/config sections render;
config redaction (no plaintext secret); worker/no-preview renders as valid;
Delete control wording surfaced.
docs/release-checklist.md: manual VPS smoke (install, React/Next/FastAPI/express/
worker tasks, snapshot/fork, process logs, events, config redaction, idle/wake/
keepalive) + known limitations.
Gate: go test + OpenAPI contract + console tsc/build + console vitest all green.
No new product features; not merged to main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
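A shape-pinning check like the one v1_client_contract_test describes could look roughly like this. It is a sketch under assumptions: `missingKeys` and `checkSandboxShape` are hypothetical helpers, not the functions in the repository. Only the required field names (id, status, preview{url,status}, processes, template) are taken from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// missingKeys decodes raw as a JSON object and reports which of the
// required keys are absent. It also returns the decoded object so callers
// can descend into nested values.
func missingKeys(raw []byte, keys ...string) ([]string, map[string]json.RawMessage, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, nil, err
	}
	var missing []string
	for _, k := range keys {
		if _, ok := obj[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing, obj, nil
}

// checkSandboxShape pins the fields the commit message lists for
// GET /v1/sandboxes/{id}. Nested preview fields are reported as "preview.<key>".
func checkSandboxShape(body []byte) ([]string, error) {
	missing, obj, err := missingKeys(body, "id", "status", "preview", "processes", "template")
	if err != nil {
		return nil, err
	}
	if p, ok := obj["preview"]; ok {
		nested, _, err := missingKeys(p, "url", "status")
		if err != nil {
			return nil, fmt.Errorf("preview: %w", err)
		}
		for _, k := range nested {
			missing = append(missing, "preview."+k)
		}
	}
	return missing, nil
}

func main() {
	// Illustrative payload only; the values are made up.
	body := []byte(`{"id":"sb1","status":"running","preview":{"url":"http://example.invalid"},"processes":[]}`)
	missing, err := checkSandboxShape(body)
	fmt.Println(missing, err)
}
```

Checking for key presence rather than comparing whole payloads lets the server add fields without breaking the pin, while a removed or renamed field fails immediately.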
…reset test, unhealthy fixture)
1. CI: the console job now runs pnpm install --frozen-lockfile, tsc --noEmit, pnpm test (vitest), and pnpm build directly (clear output), keeping the docker image build as an extra check. Adds release/** to push triggers (PRs already run via pull_request).
2. Console delete wording: v1 DELETE purges the workspace, so the button now reads 'Delete sandbox and workspace' and requires a window.confirm that spells out that the workspace (code/deps/generated files) is permanently removed. Test updated to assert the wording.
3. Preset shape test: assert the REQUIRED preset ids exist with id/label/description instead of an exact count, so adding a preset later doesn't break it.
4. Unhealthy sandbox fixture + test: stopped sandbox, preview down, web process not running (restarts=3). Asserts the UI shows no live preview iframe, renders 'Sandbox not running', and the process row reads 'stopped' (never 'running').
Gate: gofmt/vet/build/go test + OpenAPI contract + console tsc + vitest (7/7) + console build, all green. No new features; not merged to main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sandboxd (the contract):
- GET /v1/settings: read-only, tokenless-safe instance summary: version/build, networking (preview domain/base/port/tls/entrypoint), auth.enabled, runtime (storage mode, base image), lifecycle (idle reap + threshold, keepalive max), egress mode, agent providers, runtime presets, and capability flags (snapshots/config_secrets/templates/forward_auth). Server.Instance holds the static safe metadata; populated in main (version, auth !Disabled, storage 'directory', egress mode, providers ['opencode'], idle settings). NEVER emits secrets/tokens/keys/env values.
- OpenAPI documents /v1/settings; added to required-public-surface test.
- Tests: TestV1SettingsShapeAndNoSecretLeak builds a real cipher from a known key and asserts the response shape is stable AND the key never appears; capability flags + preview_base + presets verified. Contract parity green.
console (a client):
- api.ts Settings type + getSettings(); Settings.tsx read-only page with sections System / Networking / Runtime & presets / Agents / Security-auth / Egress / Capabilities; topbar Settings nav.
- fixtures.settingsFixture mirrors the API shape; Settings.test.tsx renders all sections and asserts auth shows a mode (not a token) and no secret/password field appears.
Non-goals respected: read-only (no config editing), no RBAC, no tenant mgmt, no external secret managers, no daemon restart.
Gate green: gofmt/vet/build/go test + OpenAPI contract + console tsc/vitest(9/9)/build. Not merged to main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sted, hot-applied)
Only the lifecycle tunables (idle_reap_enabled, idle_threshold_seconds, keepalive_max_seconds) are editable from the console; everything else stays read-only / env-managed (auth, secrets, egress, networking are never editable).
sandboxd (contract):
- PATCH /v1/settings: STRICT allowlist via json DisallowUnknownFields: any other key (auth/egress/networking/secrets/version/unknown) -> 400, no mutation. Range-validated (idle 60..86400s, keepalive 0..7d). Persists to a singleton instance_settings row (migration 0018) and HOT-APPLIES via a shared instancecfg.Live that the idle reaper + keepalive path read on each use; audited (settings.update). 503 when Live/Store unwired.
- GET /v1/settings now reads lifecycle live + advertises an "editable" list.
- reaper.Idle gains ThresholdFn/EnabledFn (live overrides; nil = static). main builds instancecfg.Live from env defaults, overlays the persisted row at boot, wires it to the reaper + Server. keepalive handler reads s.keepaliveMax() live.
- Tests: reject protected/unknown keys (no mutation/persist), validation bounds, persist+hot-apply round-trip, 503 without Live, no-secret-leak (existing). OpenAPI + contract + required-surface updated.
console (client):
- Settings page: editable Lifecycle section (gated on the server "editable" list) with Save -> PATCH; all other sections read-only (no inputs). api.patchSettings.
- Tests: lifecycle inputs present + read-only sections have no inputs; Save sends a PATCH carrying only the lifecycle object. vitest 11/11.
Non-goals respected: no RBAC, no tenant mgmt, no external secret managers, no daemon restart, no editing of protected/restart-required settings.
Gate green: gofmt/vet/build/go test + OpenAPI contract + console tsc/vitest/build. Not merged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
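The strict-allowlist decoding described for PATCH /v1/settings might be sketched as below. The struct layout and the `decodeSettingsPatch` name are assumptions made for illustration. The commit message states only the three tunable names, the use of `DisallowUnknownFields`, the bounds (idle 60..86400 s, keepalive 0..7 d), and that the console sends a PATCH carrying only a lifecycle object. Mapping a returned error to HTTP 400 is left to the caller.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// lifecyclePatch holds the editable tunables. Pointer fields distinguish
// "absent" from "explicitly set to zero/false". Field layout is assumed.
type lifecyclePatch struct {
	IdleReapEnabled      *bool `json:"idle_reap_enabled"`
	IdleThresholdSeconds *int  `json:"idle_threshold_seconds"`
	KeepaliveMaxSeconds  *int  `json:"keepalive_max_seconds"`
}

// settingsPatch is the only top-level shape the decoder accepts.
type settingsPatch struct {
	Lifecycle *lifecyclePatch `json:"lifecycle"`
}

const sevenDaysSeconds = 7 * 24 * 60 * 60

// decodeSettingsPatch rejects unknown keys at any depth and validates ranges.
// It mutates nothing; the caller applies the patch only when err is nil.
func decodeSettingsPatch(body []byte) (*settingsPatch, error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields() // any key outside the structs above is an error
	var p settingsPatch
	if err := dec.Decode(&p); err != nil {
		return nil, fmt.Errorf("invalid patch: %w", err)
	}
	if p.Lifecycle == nil {
		return nil, errors.New("lifecycle object required")
	}
	if v := p.Lifecycle.IdleThresholdSeconds; v != nil && (*v < 60 || *v > 86400) {
		return nil, errors.New("idle_threshold_seconds must be within 60..86400")
	}
	if v := p.Lifecycle.KeepaliveMaxSeconds; v != nil && (*v < 0 || *v > sevenDaysSeconds) {
		return nil, errors.New("keepalive_max_seconds must be within 0..604800")
	}
	return &p, nil
}

func main() {
	_, err := decodeSettingsPatch([]byte(`{"auth":{"enabled":false}}`))
	fmt.Println("protected key rejected:", err != nil)
	_, err = decodeSettingsPatch([]byte(`{"lifecycle":{"idle_threshold_seconds":300}}`))
	fmt.Println("valid patch accepted:", err == nil)
}
```

Decoding into a struct that lists only the editable fields makes the allowlist structural: a new protected setting is rejected by default unless someone deliberately adds it to the patch type.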
22d2256 to
37f06c2
Compare
Defines the base image / runtime preset / starter-import layering and the contract a custom base image must meet to work with sandboxd:
- runtimed as the container main process
- sandbox user uid/gid 1000:1000
- workspace /home/sandbox, app /home/sandbox/workspace/app
- optional /opt/sandbox-skel seed + /opt/templates/<preset>
- unprivileged: no privileged, no Docker socket, no extra caps, no writable-rootfs assumption
- must provide the toolchains the presets in use require
- selected instance-wide via SANDBOXD_IMAGE (read-only in Settings)
Includes footguns (missing runtimed / wrong uid / wrong workspace / missing toolchain / requires Docker-privileged) and a roadmap (startup preflight, image profiles, browser image, app-level selection: later; no Dockerfile builder / Compose / registry creds).
No code, no Settings UI change, no image profiles, no app-level selection. On release/v0.4-apps-console (PR #35 stays draft); release docs untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…notes
- README: reposition as "Self-hosted control plane for AI-built apps"; add an above-the-fold "What you get" list (web console, runtime presets, live preview URLs, agent tasks, app config/secrets, snapshots/fork/restore, activity/events, process logs, settings/lifecycle); extend Known limitations (Docker-only; no Compose/local DB/Git import yet).
- CHANGELOG: cut [0.4.0] - 2026-06-25 (was Unreleased); add Settings (GET/PATCH lifecycle), base-image contract, and test-foundation entries; honest known limitations (v1 delete purge, keepalive_until not surfaced, warming page 200, Docker-only, no Compose/local DB/Git import); note it rolls up the v0.3.0 work.
- docs/release-notes-v0.4.0.md: draft GitHub release notes.
- installer: default base tag sandboxd-base:0.4.0-test -> :0.4.0 (drop confusing "test" language at launch; SANDBOXD_IMAGE override unchanged).
No feature/runtime changes. Gate green: gofmt/vet/build/go test + OpenAPI contract + installer syntax + console tsc/test(11)/build. Validated by the from-zero VPS RC QA pass. PR #35 to be marked ready (not merged).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RC QA: PASS — a from-zero VPS install + the |
Draft — do not merge until approved. Single release-candidate integrating all accepted v0.3 + v0.4 work.
Branch ancestry is linear:
main → console → feat/v0.4-snapshots-observability → feat/runtime-manifest. This branch == the tip of that stack (no conflicts; merges were fast-forwards). Source branches are preserved.

Included
sandbox.yaml, web + workers process model)

Test results (all green)
- gofmt -l clean · go vet · go build ./... · go test ./... (all packages)
- tsc --noEmit + production build
- bash -n syntax check (scripts/dev/install-v04-ubuntu.sh, scripts/e2e.sh)
- _next/static chunks 200 and a build-provoking task no longer poisons (healed to 200); FastAPI add-endpoint live via --reload; snapshot → fork → fork preview 200 + endpoint preserved ($HOME normalized to sandbox:sandbox); process logs endpoint; Activity events recorded; Config & Secrets redaction (sensitive value never returned). Real-LLM agent tasks were simulated (no API key in the test env) but the post-task pipeline ran; console verified via build/typecheck + its /v1 endpoints (not browser-clicked).

Known limitations (non-blocking, documented)
- DELETE /v1/sandboxes/{id} purges workspace vs legacy DELETE /sandbox/{id} keeps it
- keepalive_until not surfaced in GET /v1/sandboxes/{id}
- agent.log can be empty on timeout (persistence WIP)

See CHANGELOG.md (v0.4.0) and docs/sandbox-manifest.md.

🤖 Generated with Claude Code