Retry rate-limited and transient runs-index fetches#558

Open
robinaugh wants to merge 2 commits into main from jason/rwx-1133-cli-rwx-runs-list-automatic-retries-on-rate-limit-transient

@robinaugh robinaugh commented Jun 25, 2026

Background

Problem

The runs index is rate limited, so paging a large result set with rwx runs list died mid-run on a 429 (or a transient 5xx / empty body), forcing callers to hand-roll their own backoff.

Solution

  • Retry in api.ListRuns, so every paged list and the internal ListSandboxRuns get it consistently.
  • Retries 429 (honoring Retry-After), transient 5xx, transient transport errors, and empty/non-JSON 200s; non-retryable 4xx surface immediately. Bounded backoff reuses the existing retry.Backoff.
  • Retries are visible, e.g. "rate limited by RWX, retrying in 60s (attempt 2/5)", and are routed to stderr under --json so stdout stays parseable. Sandbox run lookups (ListSandboxRuns) surface the same progress.
  • Client-facing messages intentionally do not restate the server's limit (e.g. "100 requests/min"): the limit is server-owned and being retuned in the paired PR, so the CLI relies on the Retry-After header rather than echoing a number that would drift.

Further confirmation needed

  • End-to-end behavior against a live 429 (unit tests use a stubbed transport).

@robinaugh robinaugh self-assigned this Jun 25, 2026
@robinaugh robinaugh marked this pull request as ready for review June 25, 2026 18:50
The runs index is rate limited (100 req/min), so paging a large result
set with `rwx runs list` would die mid-run on a 429 (or a transient 5xx /
empty body), forcing callers to hand-roll their own backoff loop.

ListRuns now retries 429s (honoring Retry-After), transient 5xx, transient
transport errors, and empty/non-JSON 200 bodies, with bounded exponential
backoff reusing the existing retry.Backoff. Retries are announced via a
RetryProgress writer so a multi-page list does not look hung; the command
routes it to stderr under --json so structured stdout stays clean.

ListSandboxRuns also goes through ListRuns, so the new retry/backoff
already applied to sandbox commands, but silently: a transient blip
looked like a multi-second hang on the hot `sandbox exec` discovery path.

Thread a retry-progress writer through ListSandboxRuns and pass s.Stderr
from the sandbox call sites so the wait is announced. The mock's method
takes the writer (matching the interface) but ignores it, keeping the
existing zero-arg MockListSandboxRuns test literals unchanged.
@robinaugh robinaugh force-pushed the jason/rwx-1133-cli-rwx-runs-list-automatic-retries-on-rate-limit-transient branch from 11bc0b5 to da01b7d Compare June 26, 2026 14:49