Skip to content

feat: KSM-881 throttle retry with exponential backoff#68

Open
mgallego-keeper wants to merge 2 commits into
masterfrom
KSM-881-golang-throttle-retry-backoff
Open

feat: KSM-881 throttle retry with exponential backoff#68
mgallego-keeper wants to merge 2 commits into
masterfrom
KSM-881-golang-throttle-retry-backoff

Conversation

@mgallego-keeper

@mgallego-keeper mgallego-keeper commented Jun 5, 2026

Copy link
Copy Markdown

Summary

Implements throttle retry with exponential backoff for the Go SDK (epic KSM-876, task KSM-881). The Keeper backend throttles with HTTP 403 {"error":"throttled"}; previously the SDK returned a *KeeperHTTPError on the first throttle. Now PostQuery retries transparently, so every SecretsManager operation (GetSecrets, CreateSecret, UpdateSecret, …) gains throttle resilience with no caller changes.

What changed

  • core/core.go — in the PostQuery loop, detect throttle responses before HandleHttpError and retry with exponential backoff + jitter; return ErrThrottled (wrapped) once retries are exhausted. Adds parseThrottle (detection / retry_after) and throttleDelay (backoff + jitter), plus a throttleJitter seam for deterministic tests.
  • core/errors.go — new exported sentinel ErrThrottled (check with errors.Is(err, ksm.ErrThrottled)).
  • core/keeper_globals.gomaxThrottleRetries = 5, baseThrottleDelaySec = 11 (1s margin over the backend's 10s memcached TTL). No new public config.
  • core/payload.go — optional Context.Sleep func(time.Duration) field (defaults to time.Sleep); a no-real-sleep test seam mirroring the existing Context.Transport, preserved across PrepareContext rebuilds.
  • CHANGELOG.md — entry under ## Unreleased. No version bump.

Algorithm

Delays 11s, 22s, 44s, 88s, 176s (each ±25% jitter) → ~341s worst case over 5 retries. A retry_after in the response body takes precedence when present. HandleHttpError is untouched, so key-rotation retry is unaffected.

Tests

  • core/throttle_test.go (unit): throttleDelay exponential sequence, retry_after precedence, jitter bounds (pinned via the throttleJitter seam); parseThrottle table (throttled / result_code / other error / non-JSON / empty / non-numeric / negative retry_after).
  • test/throttle_test.go (end-to-end via GetSecrets, sleeps recorded through Context.Sleep): retry-then-success, multi-throttle delay bounds, exhaustion → errors.Is(ErrThrottled), retry_after honored, throttle + key-rotation compose, non-throttle 403 / non-JSON 502 not retried.
  • TestHTTPErrorErrorsAs switched to a non-throttle error code (throttling is now retried).
$ cd core && go test ./... && go vet ./...   # ok
$ cd test && go test ./...                    # ok

Jira: KSM-881 · mirrors the Python implementation (Keeper-Security/secrets-manager#1031).

Detect HTTP 403 {"error":"throttled"} inside PostQuery and retry up to
maxThrottleRetries (5) with exponential backoff (baseThrottleDelaySec = 11s,
doubling: 11/22/44/88/176s) plus +/-25% jitter, honoring retry_after from the
response body when present. Returns the new sentinel ErrThrottled (checkable
via errors.Is) once retries are exhausted.

The throttle check sits before HandleHttpError in the PostQuery loop, so the
existing key-rotation retry path is unaffected. No new public config; the sleep
is injectable in tests via a new optional Context.Sleep field (mirrors the
existing Context.Transport seam).

No version bump; CHANGELOG entry added under Unreleased. Adds core/throttle_test.go
(unit) and test/throttle_test.go (end-to-end); TestHTTPErrorErrorsAs switched to a
non-throttle code since "throttled" is now retried. go test ./... green in both modules.
idimov-keeper
idimov-keeper previously approved these changes Jun 8, 2026
Mirrors the Python SDK fix (PR #1033, per @stas-schaller review): parseThrottle
only inspected the response body, so a non-403 response (e.g. 500/502) carrying
{"error":"throttled"} would be retried 5x before failing. Gate the throttle
parse/retry on ksmRs.StatusCode == 403 so non-403 responses fall straight through
to HandleHttpError. Adds a regression test (TestThrottleBodyNon403NotRetried).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants