feat: managed-mode management API (status, sync-now) and fail-closed startup#190
Open
Zygimantass wants to merge 1 commit into
Managed proxies previously ran with no management server at all, and a proxy served requests with whatever pipeline it had - including the empty pre-first-sync pipeline - passing placeholder credentials upstream un-transformed.

- allow management.listen in managed mode: it now serves GET /v1/status (the applied control-plane state: config hash, principal, last sync) and POST /v1/sync (request an immediate out-of-band sync). /v1/reload stays standalone-only since there is no file to re-read.
- IRON_MANAGEMENT_LISTEN env override, matching the other IRON_* vars.
- the sync poller records a status snapshot (config_hash, principal_id, principal_status, synced_once, last_sync_at) and accepts a non-blocking Poke() that wakes the poll loop early.
- fail closed during startup: proxy.Options.Ready gates request handling with 503 until the first control-plane config has been applied, so a freshly started or restarted managed proxy can never leak placeholder credentials to upstream APIs.

The motivating incident: a sandbox control plane reassigned a proxy's principal and routed traffic immediately; the first LLM call beat the proxy's next 5s sync poll by ~350ms and went upstream with a placeholder key (401). With this change the control plane can POST /v1/sync and poll GET /v1/status until principal_id matches before routing traffic.

Amp-Thread-ID: https://ampcode.com/threads/T-019eb82e-5bc9-707f-9d5c-cdb6c9d16926
Co-authored-by: Amp <amp@ampcode.com>
Zygimantass added a commit to paradigmxyz/centaur that referenced this pull request on Jun 12, 2026
…rier (#526)

Claiming a warm sandbox reassigns its iron-proxy principal in iron-control, but managed proxies only pick the change up on their next /proxy/sync poll (5s cadence). The claim path papered over this with a blind 6s sleep; the Rust harness fires its first LLM call within milliseconds of stdin, and any gap sends the placeholder credential upstream (observed as Anthropic 401s in prod when the first call beat the poll by ~350ms, exe_42484b5303b9401db9d5af4a9112fd91).

Replace the sleep with a barrier against the proxy's managed-mode management API (ironsh/iron-proxy#190): POST /v1/sync pokes an immediate out-of-band sync, then GET /v1/status is polled until the proxy reports the claimed principal's config applied (typically well under a second instead of 6s).

Wiring:
- proxy pods get IRON_MANAGEMENT_LISTEN=:9092 and a per-pod random IRON_MANAGEMENT_API_KEY; the barrier reads both back off the live pod so it survives api-rs restarts and env overrides
- the proxy NetworkPolicy gains an api-rs -> :9092 ingress rule (sandboxes still cannot reach the management port)
- proxy images without the management API never answer on the port; after a 2s probe window the barrier falls back to the previous fixed delay, so behavior is unchanged until the image is bumped
- the barrier never fails the claim: post-#190 proxies fail closed (503) until their first sync, so a timeout degrades to a brief retryable window instead of a failed execution

Amp-Thread-ID: https://ampcode.com/threads/T-019eb82e-5bc9-707f-9d5c-cdb6c9d16926
Co-authored-by: Amp <amp@ampcode.com>
Problem
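Two related gaps for managed (control-plane-synced) proxies:

- Managed proxies ran with no management server at all.
- A proxy served requests with whatever pipeline it had, including the empty pre-first-sync pipeline, passing placeholder credentials upstream untransformed.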
Changes
- `management.listen` is now allowed in managed mode, serving:
  - `GET /v1/status` → `{config_hash, principal_id, principal_status, synced_once, last_sync_at}`: the applied control-plane state, so an operator or control plane can verify which principal's config the proxy is enforcing before routing traffic through it
  - `POST /v1/sync` → non-blocking immediate sync request (poller `Poke()`); callers poll `/v1/status` to observe the result
  - `/v1/reload` stays standalone-only (404 in managed mode, since there is no file to re-read)
- `IRON_MANAGEMENT_LISTEN` env override, consistent with the other `IRON_*` vars (managed proxies have no config file)
- `SyncResponse` now parses the control plane's `status`/`principal_id` fields; the poller records them in a snapshot (retained across hash-match responses, seeded from the startup sync)
- `proxy.Options.Ready` gates request handling with 503 until the first control-plane config is applied (wired in managed mode; standalone unaffected)

Tests

- `/v1/status` and `/v1/sync`; `/v1/reload` 404s in managed mode
- `go test ./...`: 28/28 packages pass.