draft: v0.4.0 snapshots, fork/restore, and observability #34
Draft
tastyeffectco wants to merge 10 commits into
v0.4.0 backend on the public /v1/snapshots subsystem only (internal
/sandbox/{id}/... stays unexposed):
- migration 0015 adds snapshot.source_app_id so per-app history survives the
ephemeral source sandbox; capture stamps it from the source's app_id.
- GET /v1/apps/{id}/snapshots — tenant+app-scoped history.
- POST /v1/apps/{id}/restore — REPLACE the app's current sandbox from a
snapshot (purge current, then clone). Destructive; console confirms.
- POST /v1/apps/{id}/fork — new app + its sandbox spun from a snapshot;
source app untouched.
Tenant scoping enforced on every path (cross-tenant app/snapshot -> 404).
Sandbox spin reuses the proven create path (template_path + .git reset).
Tests cover store scoping, history, and the restore/fork guard paths; the
Docker-dependent spin is verified on a real host, not CI. OpenAPI + contract
test updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
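A minimal sketch of the "cross-tenant -> 404" rule. It assumes a store that filters every lookup by owner token and a handler that maps a not-found error to 404. `memStore`, `GetSnapshot`, `ErrNotFound`, and `statusFor` are hypothetical names, not the actual sandboxd store API. The point is that a row owned by another tenant is indistinguishable from a missing row, so ids cannot be probed across tenants.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrNotFound covers both "no such row" and "row belongs to another tenant",
// so a caller cannot probe for other tenants' ids. (Hypothetical name.)
var ErrNotFound = errors.New("not found")

// Snapshot is a hypothetical, trimmed-down snapshot row.
type Snapshot struct {
	ID          string
	OwnerToken  string
	SourceAppID string
}

// memStore is a hypothetical in-memory stand-in for the SQLite store.
type memStore struct {
	snapshots map[string]Snapshot
}

// GetSnapshot scopes the lookup by owner token: a cross-tenant id is
// reported exactly like an unknown id.
func (s *memStore) GetSnapshot(owner, id string) (Snapshot, error) {
	snap, ok := s.snapshots[id]
	if !ok || snap.OwnerToken != owner {
		return Snapshot{}, ErrNotFound
	}
	return snap, nil
}

// statusFor maps a store error onto the HTTP status a handler would return.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	store := &memStore{snapshots: map[string]Snapshot{
		"snap-1": {ID: "snap-1", OwnerToken: "tenant-a", SourceAppID: "app-1"},
	}}
	_, err := store.GetSnapshot("tenant-b", "snap-1")
	fmt.Println(statusFor(err)) // 404: tenant-b cannot see tenant-a's snapshot
}
```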
…se 4)
Adds a Snapshots panel to the app detail screen backed by the new
/v1/apps/{id} endpoints:
- history list (name, captured time, size) via GET /v1/apps/{id}/snapshots
- Restore: confirms (replaces the current sandbox, discards un-snapshotted
work) then POST /v1/apps/{id}/restore and refreshes
- Fork: prompts for a name, POST /v1/apps/{id}/fork into a new app
The capture button now refreshes history and drops the 'coming soon' wording
(v0.4.0 ships these features). Actions are disabled unless the snapshot is ready.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…T verified
Records the release discipline and verification status for the Phase 4 preview branch: capture, history, tenant scoping, and backend orchestration are tested, but the live restore/fork sandbox spin and preview are deliberately deferred to a real isolated v0.4.0 deploy (on the shared host they would otherwise expose port-3000 sandboxes to prod Traefik). Not for merge to console/main; no non-draft PR.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts/dev/install-v04-ubuntu.sh stands up the Phase 4 stack on a fresh Ubuntu 22.04/24.04 server, reusing the repo's docker-compose (traefik + sandboxd + console profile) rather than a parallel deploy system. It:
- installs Docker if missing
- fails if 80/443 are taken
- detects the public IPv4 and uses sslip.io for the preview + console URLs (HTTP on :80)
- writes .env + a docker-compose.override.yml
- gates the public console with Traefik basic auth (demo creds)
- keeps the API on loopback (disables the edge api.yml router)
- builds images, starts the stack, and prints the URLs + teardown steps
docs/v0.4.0-test-runbook.md covers requirements, install, the 14-step create→preview→snapshot→restore→fork checklist, TLS as a follow-up, teardown, and release discipline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 80/443)
Default to shared-host mode so the installer is safe next to Coolify/nginx/another Traefik:
- HTTP_PORT=18080 (uncommon edge port; set HTTP_PORT=80 for dedicated-host mode)
- API_PORT=19090 on loopback only
- only the chosen HTTP_PORT must be free; if it is taken, fail clearly and tell the user to set HTTP_PORT; do NOT check or require 443 (TLS deferred)
- generated URLs include :<HTTP_PORT> unless it's 80
The runbook documents both modes plus 'behind an existing proxy': keep 18080 and have the front proxy forward console.<ip>.sslip.io + *.preview.<ip>.sslip.io to 127.0.0.1:18080 (Host preserved); TLS via the front proxy or a real wildcard domain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On a shared host with HTTP_PORT=18080, the API returned bare preview URLs
(…sslip.io/) that hit whatever owns :80 (Coolify/front proxy) instead of
sandboxd's Traefik on :18080. previewURL() now appends the host port unless it's
the scheme default (80 for http, 443 for https):
Server.PublicHTTPPort <- SANDBOXD_PUBLIC_HTTP_PORT (main.go)
docker-compose.yml <- SANDBOXD_PUBLIC_HTTP_PORT: ${HTTP_PORT:-80}
The console iframe + open-in-tab link consume sb.preview.url unchanged, and
restore/fork responses use the same previewURL(), so all preview surfaces get the
corrected port. Unit-tested across http/https x default/custom ports; verified
live that GET /v1/sandboxes/{id} returns ...sslip.io:18080. No installer change
needed — it already writes HTTP_PORT, which compose forwards.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
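A sketch of the port rule described in this commit, assuming the preview host is already known and `port` carries the value of `SANDBOXD_PUBLIC_HTTP_PORT` (0 meaning unset). The function signature and the example host are hypothetical; the actual `previewURL()` signature is not shown here.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// previewURL builds a preview URL for host and appends :port unless the port
// is unset (0) or is the scheme default (80 for http, 443 for https).
// (Hypothetical signature for illustration.)
func previewURL(scheme, host string, port int) string {
	hostport := host
	isDefault := (scheme == "http" && port == 80) || (scheme == "https" && port == 443)
	if port != 0 && !isDefault {
		hostport = net.JoinHostPort(host, strconv.Itoa(port))
	}
	u := url.URL{Scheme: scheme, Host: hostport, Path: "/"}
	return u.String()
}

func main() {
	host := "demo.preview.203.0.113.7.sslip.io" // hypothetical sslip.io preview host
	fmt.Println(previewURL("http", host, 18080)) // http://demo.preview.203.0.113.7.sslip.io:18080/
	fmt.Println(previewURL("http", host, 80))    // http://demo.preview.203.0.113.7.sslip.io/
}
```

Treating port 80 as non-default for https (and 443 for http) is what keeps the four http/https × default/custom cases distinct.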
One append-only app_events table in the existing control-plane SQLite DB (no
ClickHouse/OTEL/Loki/separate DB), a centralized best-effort recorder, a
tenant-scoped paginated read API, and a console Activity timeline.
- migration 0016: app_events(id ULID, owner_token, app/sandbox/task/snapshot ids,
type, severity, message, payload_json, created_at) + scoped indexes. ULID id
doubles as the newest-first page cursor.
- internal/events: Recorder.Record (mirrors audit: own Store interface, detached
ctx, never breaks the request); stable type/severity constants.
- store: InsertAppEvent + ListAppEvents/ListTaskEvents (owner_token-scoped,
cursor-paginated); owner-agnostic GetApp for the background task path.
- API: GET /v1/apps/{id}/events and /v1/tasks/{id}/events (newest-first,
default 50 / max 200, ?before cursor, next_before). Cross-tenant -> 404/empty.
- instrumented via the recorder (no scattered SQL): app.created/updated,
config.created/updated/deleted (key only, never the secret), snapshot
captured/capture.failed/restored/forked, sandbox create.started/failed/
started/stopped/deleted, task.started, and on the task terminal point
task.completed/failed/build.failed + preview.health.ok/failed.
- console: read-only Activity panel on app detail (time/severity/type/message +
related ids), durable across refresh/restart.
- docs/openapi.yaml + contract test; .env.example notes the future
SANDBOXD_EVENT_RETENTION_DAYS knob (retention deferred).
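The newest-first cursor works because ULIDs sort lexicographically by creation time, so "older than the cursor" is a plain `id < before` comparison. Below is a sketch over an in-memory slice, assuming `next_before` is the last id of a full page and is left empty once no older events remain (how the real API signals the end of the feed is an assumption). In SQL the same shape would be roughly `WHERE app_id = ? AND owner_token = ? AND id < ? ORDER BY id DESC LIMIT ?`, fetching one extra row to detect more. `listNewestFirst`, `clampLimit`, and `Page` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	defaultLimit = 50  // documented default page size
	maxLimit     = 200 // documented maximum page size
)

// clampLimit applies the default for missing/invalid limits and caps large ones.
func clampLimit(n int) int {
	if n <= 0 {
		return defaultLimit
	}
	if n > maxLimit {
		return maxLimit
	}
	return n
}

// Page is one newest-first page plus the cursor for the next request.
type Page struct {
	IDs        []string
	NextBefore string // empty when there are no older events (assumption)
}

// listNewestFirst returns up to limit ids strictly older than before.
// It collects limit+1 ids so it can tell whether another page exists.
func listNewestFirst(ids []string, before string, limit int) Page {
	limit = clampLimit(limit)
	sorted := append([]string(nil), ids...)
	sort.Sort(sort.Reverse(sort.StringSlice(sorted)))

	var out []string
	for _, id := range sorted {
		if before != "" && id >= before {
			continue // not older than the cursor
		}
		out = append(out, id)
		if len(out) > limit {
			break
		}
	}
	p := Page{IDs: out}
	if len(out) > limit {
		p.IDs = out[:limit]
		p.NextBefore = out[limit-1]
	}
	return p
}

func main() {
	ids := []string{"01J0A", "01J0C", "01J0B", "01J0E", "01J0D"} // stand-ins for ULIDs
	before := ""
	for {
		p := listNewestFirst(ids, before, 2)
		fmt.Println(p.IDs, "next_before:", p.NextBefore)
		if p.NextBefore == "" {
			break
		}
		before = p.NextBefore
	}
}
```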
Tests: recorder writes valid JSON events; tenant scoping; pagination/limit;
config event carries the key but never the secret; failed task emits
task.failed + task.build.failed on both feeds. gofmt/vet/build/test + contract
test green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ndexes
Follow-up to Phase 5 (3 fixes):
1. No raw output in app_events.payload_json. Task events now carry structured
flags/reasons only — never BuildErrorMessage/PreviewErrorMessage/ErrorMessage
text (which can echo secrets the app printed; the full text stays in the
task's result.json). New payloads:
task.completed -> {files_changed, duration_ms, build_ok}
task.failed -> {failure_reason, has_error}
task.build.failed -> {reason:'build_failed', has_build_error:true}
preview.health.failed -> {preview_status, has_preview_error}
Test now plants a fake secret in all three error fields and asserts it never
appears in any event's payload_json or message.
2. Monotonic ULID event ids (ulid.Monotonic under a mutex), so a same-millisecond
completion burst sorts in emission order by id (the page cursor). Added a
monotonic-ordering test.
3. Dropped the unused indexes (owner-only, sandbox-only, type-only) from
migration 0016 — no endpoint queries them and each is write amplification on
an append-only table. Kept idx_app_events_app + idx_app_events_task.
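A sketch of the structured-payload rule from fix 1: only booleans and fixed reason/status values are copied out of a task result, never the free-text error fields. `TaskResult`, its field set, `failedPayloads`, `leaks`, and the condition for emitting `preview.health.failed` are hypothetical; the payload keys match the ones listed in fix 1.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// TaskResult is a hypothetical subset of a task's result. The *Message fields
// may echo secrets the app printed, so they are never copied into events.
type TaskResult struct {
	FailureReason       string // assumed fixed code, e.g. "build_failed"
	ErrorMessage        string // free text: never copied
	BuildErrorMessage   string // free text: never copied
	PreviewErrorMessage string // free text: never copied
	PreviewStatus       string // assumed fixed value, e.g. "ok" or "failed"
}

// failedPayloads maps event type -> payload, built from flags and codes only.
func failedPayloads(r TaskResult) map[string]map[string]any {
	out := map[string]map[string]any{
		"task.failed": {
			"failure_reason": r.FailureReason,
			"has_error":      r.ErrorMessage != "",
		},
	}
	if r.BuildErrorMessage != "" {
		out["task.build.failed"] = map[string]any{"reason": "build_failed", "has_build_error": true}
	}
	if r.PreviewStatus != "" && r.PreviewStatus != "ok" {
		out["preview.health.failed"] = map[string]any{
			"preview_status":    r.PreviewStatus,
			"has_preview_error": r.PreviewErrorMessage != "",
		}
	}
	return out
}

// leaks reports whether any marshaled payload contains needle. A payload that
// fails to marshal is conservatively treated as a leak.
func leaks(payloads map[string]map[string]any, needle string) bool {
	for _, p := range payloads {
		b, err := json.Marshal(p)
		if err != nil || strings.Contains(string(b), needle) {
			return true
		}
	}
	return false
}

func main() {
	const planted = "sk-PLANTED-SECRET"
	p := failedPayloads(TaskResult{
		FailureReason:       "build_failed",
		ErrorMessage:        "agent said " + planted,
		BuildErrorMessage:   "npm ERR! " + planted,
		PreviewErrorMessage: planted,
		PreviewStatus:       "failed",
	})
	for _, typ := range []string{"task.failed", "task.build.failed", "preview.health.failed"} {
		b, _ := json.Marshal(p[typ])
		fmt.Printf("%s -> %s\n", typ, b)
	}
	fmt.Println("leaked:", leaks(p, planted))
}
```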
gofmt/vet/build/test + OpenAPI contract test green. Console untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What's included
Snapshots / fork / restore (Phase 4)
- `GET /v1/apps/{id}/snapshots` (tenant- and app-scoped); migration `0015` adds `snapshot.source_app_id` so history survives the ephemeral sandbox.
- `POST /v1/apps/{id}/restore` (replaces the app's current sandbox; destructive, console confirms).
- `POST /v1/apps/{id}/fork` (new app + its own sandbox; source untouched).
- Built only on the public `/v1/snapshots` subsystem; the internal `/sandbox/{id}/...` snapshot API stays unexposed.
Observability (Phase 5)
- An append-only `app_events` table in the existing control-plane SQLite DB (migration `0016`). No ClickHouse/OTEL/Loki, no separate logs DB.
- Centralized best-effort recorder (`internal/events`, mirrors `audit`): never breaks a request; monotonic-ULID ids that double as the page cursor.
- `GET /v1/apps/{id}/events` and `GET /v1/tasks/{id}/events` (newest-first, default 50 / max 200, `?before` cursor → `next_before`).
- Instrumented via the recorder: `app.created`/`updated`, `config.created`/`updated`/`deleted` (key only, never the secret), `snapshot.captured`/`capture.failed`/`restored`/`forked`, `sandbox.create.started`/`failed`, `sandbox.started`/`stopped`/`deleted`, `task.started`/`completed`/`failed`/`build.failed`, `preview.health.ok`/`failed`. Payloads carry structured flags/reasons only: no raw build/dev-server/agent output.
Shared-host preview port fixes
- `previewURL()` appends the public HTTP port (`SANDBOXD_PUBLIC_HTTP_PORT`) unless it's the scheme default, so a shared-host deploy (e.g. `:18080`) returns a reachable URL.
- Test installer and runbook (`scripts/dev/install-v04-ubuntu.sh`, `docs/v0.4.0-test-runbook.md`): shared-host-safe defaults (uncommon ports, sslip.io, console basic auth, API on loopback).
Test status
Unit / integration (CI-equivalent, green locally):
`gofmt`, `go vet`, `go build ./...`, `go test ./...`, the OpenAPI contract test (every `/v1` route documented), and the console `tsc --noEmit` + image build. Notable coverage: snapshot store scoping + restore/fork tenant guards; event recorder writes valid JSON; events tenant scoping; pagination/limit + cursor; a planted fake secret in build/preview/agent error text never reaches `payload_json`; monotonic event-id ordering; preview-URL port logic (http/https × default/custom).
Local shared-host (real Docker), verified: create sandbox; app-list badge reflects real status; start/stop; delete scoped to the instance's own container (prod's untouched); snapshot capture stamps `source_app_id`; per-app history returns it; running source rejected with `409`; API `preview_url` includes `:18080`. All done with portless test sandboxes so prod Traefik never discovered them.
Not verified (deferred):
- Live restore/fork sandbox spin and preview: a spun sandbox exposes port 3000 and carries the `sandboxd.managed` label that prod Traefik would discover on a shared host. Deferred to a real isolated v0.4.0 deploy (or an isolated Traefik). Backend orchestration + tenant scoping are unit-tested; the live sandbox spin/preview is not.
- `sandbox.stopped` and browser-wake `sandbox.started` are not evented yet (they fire outside a request).
Explicitly deferred / out of scope
- Access-policy enforcement (`access_policy` is metadata only; `agent_access`/`runtime_access`/`both` are reserved in the UI).
- Event retention (`SANDBOXD_EVENT_RETENTION_DAYS`).
🤖 Generated with Claude Code