feat: managed-mode management API (status, sync-now) and fail-closed startup by Zygimantass · Pull Request #190 · ironsh/iron-proxy

Zygimantass · 2026-06-12T12:09:33Z

Problem

Two related gaps for managed (control-plane-synced) proxies:

- No way to observe or accelerate config application. A control plane that reassigns a proxy's principal (e.g. binding a warm sandbox to a session at claim time) cannot tell when the proxy has actually applied the new config; the only option is to sleep longer than the 5s poll interval and hope. We hit exactly this in production: the first LLM call after a principal reassignment beat the proxy's next sync poll by ~350ms and went upstream with the placeholder credential still in the header (a 401 from the upstream API).
- Fail-open startup. A freshly started managed proxy serves requests with whatever pipeline it has, including the empty pre-first-sync pipeline, so requests made during startup pass through untransformed and leak placeholder credentials upstream.

Changes

- Managed-mode management server: management.listen is now allowed in managed mode and serves:
  - GET /v1/status → {config_hash, principal_id, principal_status, synced_once, last_sync_at}: the applied control-plane state, so an operator or control plane can verify which principal's config the proxy is enforcing before routing traffic through it
  - POST /v1/sync → a non-blocking request for an immediate sync (poller Poke()); callers poll /v1/status to observe the result
  - /v1/reload stays standalone-only (404 in managed mode, since there is no file to re-read)
- IRON_MANAGEMENT_LISTEN env override, consistent with the other IRON_* vars (managed proxies have no config file)
- SyncResponse now parses the control plane's status and principal_id fields; the poller records them in a snapshot (retained across hash-match responses and seeded from the startup sync)
- Fail-closed startup: a new proxy.Options.Ready option gates request handling with 503 until the first control-plane config is applied (wired in managed mode; standalone is unaffected)
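As a rough illustration of the managed-mode routing described above, the following Go sketch wires the two endpoints behind an API-key check. It is not iron-proxy's code: newManagedMux, Status, and code are hypothetical names, and the "Authorization: Bearer" header format and the 202/401/405 status codes are assumptions. The description only fixes the paths, the status fields, the non-blocking sync, and the 404 for /v1/reload.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Status mirrors the documented GET /v1/status fields; the Go field types
// are assumptions.
type Status struct {
	ConfigHash      string     `json:"config_hash"`
	PrincipalID     string     `json:"principal_id"`
	PrincipalStatus string     `json:"principal_status"`
	SyncedOnce      bool       `json:"synced_once"`
	LastSyncAt      *time.Time `json:"last_sync_at"` // nil until the first sync
}

// newManagedMux is hypothetical: it wires the managed-mode routes behind an
// API-key check. status reads the poller snapshot; poke must not block.
func newManagedMux(apiKey string, status func() Status, poke func()) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/status", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(status())
	})
	mux.HandleFunc("/v1/sync", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.Header().Set("Allow", http.MethodPost)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		poke()                             // request an out-of-band sync
		w.WriteHeader(http.StatusAccepted) // the result is observed via /v1/status
	})
	// /v1/reload is not registered in managed mode (no file to re-read),
	// so ServeMux answers 404 for it.

	want := []byte("Bearer " + apiKey) // assumed header format
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if subtle.ConstantTimeCompare([]byte(r.Header.Get("Authorization")), want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		mux.ServeHTTP(w, r)
	})
}

// code sends one request through h and returns the response status.
func code(h http.Handler, method, path, key string) int {
	req := httptest.NewRequest(method, path, nil)
	req.Header.Set("Authorization", "Bearer "+key)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newManagedMux("secret", func() Status { return Status{PrincipalID: "p1"} }, func() {})
	for _, c := range []struct{ method, path, key string }{
		{"GET", "/v1/status", "secret"},  // 200
		{"POST", "/v1/sync", "secret"},   // 202
		{"GET", "/v1/sync", "secret"},    // 405
		{"POST", "/v1/reload", "secret"}, // 404
		{"GET", "/v1/status", "wrong"},   // 401
	} {
		fmt.Println(c.method, c.path, code(h, c.method, c.path, c.key))
	}
}
```

In this sketch the key check runs before routing, so an unauthenticated caller gets 401 for every path and cannot probe which routes exist; leaving /v1/reload unregistered, rather than adding a handler that returns 404, keeps the managed and standalone route sets visibly different.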

Tests

- poller: poke triggers immediate sync; status snapshot updates and retains principal across hash-match responses; seed semantics
- management: auth/method/availability matrix for /v1/status and /v1/sync; /v1/reload 404s in managed mode
- proxy: requests 503 while not ready, flow once ready

go test ./... — 28/28 packages pass.
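The proxy case in the test list (503 while not ready, flow once ready) amounts to a readiness gate in front of the request pipeline. A minimal Go sketch, assuming the gate is a func() bool that the sync poller flips after its first successful apply; readyGate, okHandler, and serve are hypothetical names, and the Retry-After header is an assumption rather than something the PR specifies.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readyGate is a hypothetical stand-in for the proxy.Options.Ready gate:
// until ready reports true, every request is refused with 503 instead of
// reaching a pipeline that has not applied a control-plane config yet.
func readyGate(ready func() bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !ready() {
			w.Header().Set("Retry-After", "1") // assumption: not specified by the PR
			http.Error(w, "proxy not ready: awaiting first control-plane config", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the real request pipeline.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// serve sends one GET through h and returns the response status.
func serve(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "http://example.com/", nil))
	return rec.Code
}

func main() {
	var synced atomic.Bool // flipped by the sync poller after its first successful apply
	h := readyGate(synced.Load, okHandler)
	fmt.Println(serve(h)) // 503
	synced.Store(true)
	fmt.Println(serve(h)) // 200
}
```

Backing the gate with an atomic load keeps the per-request cost to a single read once the proxy is ready, and failing closed means a restart can only cost a short window of retryable 503s rather than a leaked placeholder credential.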

Managed proxies previously ran with no management server at all, and a proxy served requests with whatever pipeline it had (including the empty pre-first-sync pipeline), passing placeholder credentials upstream un-transformed.

- allow management.listen in managed mode: it now serves GET /v1/status (the applied control-plane state: config hash, principal, last sync) and POST /v1/sync (request an immediate out-of-band sync). /v1/reload stays standalone-only since there is no file to re-read.
- IRON_MANAGEMENT_LISTEN env override, matching the other IRON_* vars.
- the sync poller records a status snapshot (config_hash, principal_id, principal_status, synced_once, last_sync_at) and accepts a non-blocking Poke() that wakes the poll loop early.
- fail closed during startup: proxy.Options.Ready gates request handling with 503 until the first control-plane config has been applied, so a freshly started or restarted managed proxy can never leak placeholder credentials to upstream APIs.

The motivating incident: a sandbox control plane reassigned a proxy's principal and routed traffic immediately; the first LLM call beat the proxy's next 5s sync poll by ~350ms and went upstream with a placeholder key (401). With this change the control plane can POST /v1/sync and poll GET /v1/status until principal_id matches before routing traffic.

Amp-Thread-ID: https://ampcode.com/threads/T-019eb82e-5bc9-707f-9d5c-cdb6c9d16926
Co-authored-by: Amp <amp@ampcode.com>
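The poller side of the change can be sketched in Go under stated assumptions: Poller, Snapshot, syncResponse, and record are hypothetical names, fetch errors are not handled, "active" is a made-up principal status, and the retention rule assumes that a hash-match reply carries an empty principal_id. Poke uses a one-slot buffered channel with a select default, so it never blocks and repeated pokes coalesce into a single early sync.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// syncResponse holds the parts of a control-plane sync reply this sketch
// uses. The shape is assumed; only the field meanings come from the PR.
type syncResponse struct {
	ConfigHash  string
	Status      string // principal status
	PrincipalID string // assumed empty on hash-match replies
}

// Snapshot mirrors the documented /v1/status fields.
type Snapshot struct {
	ConfigHash, PrincipalID, PrincipalStatus string
	SyncedOnce                               bool
	LastSyncAt                               time.Time
}

// Poller is a hypothetical stand-in for the managed-mode sync poller.
type Poller struct {
	mu   sync.Mutex
	snap Snapshot
	poke chan struct{} // capacity 1: at most one pending early sync
}

// NewPoller seeds the snapshot, e.g. from the startup sync.
func NewPoller(seed Snapshot) *Poller {
	return &Poller{snap: seed, poke: make(chan struct{}, 1)}
}

// Poke requests an immediate sync without blocking; a poke that arrives
// while one is already pending is coalesced into it.
func (p *Poller) Poke() {
	select {
	case p.poke <- struct{}{}:
	default:
	}
}

// Snapshot returns a copy of the last applied state.
func (p *Poller) Snapshot() Snapshot {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.snap
}

// record folds a reply into the snapshot, retaining the previous principal
// when a hash-match reply omits it.
func (p *Poller) record(r syncResponse) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.snap.ConfigHash = r.ConfigHash
	if r.PrincipalID != "" {
		p.snap.PrincipalID, p.snap.PrincipalStatus = r.PrincipalID, r.Status
	}
	p.snap.SyncedOnce, p.snap.LastSyncAt = true, time.Now()
}

// Run syncs on every tick, or as soon as Poke is called.
func (p *Poller) Run(ctx context.Context, interval time.Duration, fetch func() syncResponse) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
		case <-p.poke:
		}
		p.record(fetch())
	}
}

func main() {
	p := NewPoller(Snapshot{})
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go p.Run(ctx, 5*time.Second, func() syncResponse {
		return syncResponse{ConfigHash: "h1", Status: "active", PrincipalID: "session-42"}
	})
	p.Poke() // do not wait for the 5s tick
	for !p.Snapshot().SyncedOnce {
		time.Sleep(10 * time.Millisecond)
	}
	p.record(syncResponse{ConfigHash: "h1"}) // hash match: principal omitted
	fmt.Println(p.Snapshot().PrincipalID)    // session-42
}
```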

…rier (#526)

Claiming a warm sandbox reassigns its iron-proxy principal in iron-control, but managed proxies only pick the change up on their next /proxy/sync poll (5s cadence). The claim path papered over this with a blind 6s sleep; the Rust harness fires its first LLM call within milliseconds of stdin, and any gap sends the placeholder credential upstream (observed as Anthropic 401s in prod when the first call beat the poll by ~350ms, exe_42484b5303b9401db9d5af4a9112fd91).

Replace the sleep with a barrier against the proxy's managed-mode management API (ironsh/iron-proxy#190): POST /v1/sync pokes an immediate out-of-band sync, then GET /v1/status is polled until the proxy reports the claimed principal's config applied (typically well under a second instead of 6s).

Wiring:
- proxy pods get IRON_MANAGEMENT_LISTEN=:9092 and a per-pod random IRON_MANAGEMENT_API_KEY; the barrier reads both back off the live pod so it survives api-rs restarts and env overrides
- the proxy NetworkPolicy gains an api-rs -> :9092 ingress rule (sandboxes still cannot reach the management port)
- proxy images without the management API never answer on the port; after a 2s probe window the barrier falls back to the previous fixed delay, so behavior is unchanged until the image is bumped
- the barrier never fails the claim: post-#190 proxies fail closed (503) until their first sync, so a timeout degrades to a brief retryable window instead of a failed execution

Amp-Thread-ID: https://ampcode.com/threads/T-019eb82e-5bc9-707f-9d5c-cdb6c9d16926
Co-authored-by: Amp <amp@ampcode.com>
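The claim-time barrier can be sketched in Go as follows. This is an illustrative rendering, not the centaur implementation: waitForPrincipal, fakeProxy, and barrierResult are hypothetical; the Bearer header, the 5s overall timeout, the 500ms per-request timeout, and the 50ms poll step are assumptions, while the 2s probe window and the 6s legacy delay come from the commit message above.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

const (
	probeWindow    = 2 * time.Second // from the commit message
	legacyDelay    = 6 * time.Second // previous fixed delay, from the commit message
	barrierTimeout = 5 * time.Second // assumption
)

type barrierResult int

const (
	confirmed barrierResult = iota // proxy reports the wanted principal
	timedOut                       // proxy answered but never matched in time
	noAPI                          // nothing answered within probeWindow
)

// waitForPrincipal pokes a sync, then polls /v1/status until principal_id
// equals want. Hypothetical helper; the Authorization header is assumed.
func waitForPrincipal(base, apiKey, want string) barrierResult {
	ctx, cancel := context.WithTimeout(context.Background(), barrierTimeout)
	defer cancel()
	client := &http.Client{Timeout: 500 * time.Millisecond}
	do := func(method, path string) (*http.Response, error) {
		req, err := http.NewRequestWithContext(ctx, method, base+path, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", "Bearer "+apiKey)
		return client.Do(req)
	}
	start, answered := time.Now(), false
	if resp, err := do(http.MethodPost, "/v1/sync"); err == nil {
		answered = true
		resp.Body.Close()
	}
	for ctx.Err() == nil {
		if resp, err := do(http.MethodGet, "/v1/status"); err == nil {
			answered = true
			var s struct {
				PrincipalID string `json:"principal_id"`
			}
			ok := resp.StatusCode == http.StatusOK &&
				json.NewDecoder(resp.Body).Decode(&s) == nil && s.PrincipalID == want
			resp.Body.Close()
			if ok {
				return confirmed
			}
		}
		if !answered && time.Since(start) >= probeWindow {
			return noAPI
		}
		time.Sleep(50 * time.Millisecond)
	}
	if !answered {
		return noAPI
	}
	return timedOut
}

// fakeProxy is a test double: warm-pool principal until POST /v1/sync arrives.
func fakeProxy() *httptest.Server {
	var poked atomic.Bool
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/v1/sync":
			poked.Store(true)
			w.WriteHeader(http.StatusAccepted)
		case "/v1/status":
			id := "warm-pool"
			if poked.Load() {
				id = "session-42"
			}
			_ = json.NewEncoder(w).Encode(map[string]string{"principal_id": id})
		}
	}))
}

func main() {
	srv := fakeProxy()
	defer srv.Close()
	switch waitForPrincipal(srv.URL, "key", "session-42") {
	case confirmed:
		fmt.Println("principal applied; routing traffic")
	case noAPI:
		time.Sleep(legacyDelay) // older image without the API: previous fixed delay
	default: // timedOut: proceed anyway, post-#190 proxies answer 503 until synced
	}
}
```

Separating noAPI from timedOut mirrors the commit's two fallbacks: a port that never answers indicates an older image, so the legacy delay still applies, while a proxy that answers but has not matched yet will fail closed with 503 on its own, so the claim can proceed.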

Zygimantass mentioned this pull request Jun 12, 2026

fix(sandbox): replace warm-claim proxy sync delay with a real ack barrier paradigmxyz/centaur#526

Merged

mslipper mentioned this pull request Jun 16, 2026

fix: harden managed readiness status #195

Merged

Zygimantass wants to merge 1 commit into ironsh:main from Zygimantass:feat/managed-mode-status-sync-failclosed
