fix: resolve stale connection resets and unretried DEADLINE_EXCEEDED errors #63
ravibits wants to merge 4 commits into
- Regenerated gRPC stubs from proto v0.1.121.2
- Implemented: UpdateOrganizationSessionPolicy, GetOrganizationSessionPolicy, GetApplicationSessionPolicy, SearchOrganization, GetOrganizationUserManagementSetting, AssignUserRoles, RemoveUserRole
- Breaking changes: none (removed SessionSettings RPCs were never wrapped)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- searchOrganizations, updateOrganizationSessionPolicy
- getOrganizationSessionPolicy, getApplicationSessionPolicy, getOrganizationUserManagementSetting
- assignUserRoles, removeUserRole

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…errors

Two bugs surfaced from Rocketlane production logs over 3 days:

1. ScalekitAuthClient: the HttpClient connection pool was reusing stale TCP connections that the server had already closed. This caused 'Connection reset' errors on token refresh, which only fires once per token TTL (~1h), giving the pool plenty of time to go stale. Fixed by enabling evictExpiredConnections() and evictIdleConnections(30s).
2. RetryExecuter: DEADLINE_EXCEEDED was not retried; it fell through to the else branch and threw immediately. UNAUTHENTICATED and UNAVAILABLE were already retried; DEADLINE_EXCEEDED is equally transient and now gets one retry on the same pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
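To make the first fix concrete, here is a minimal, dependency-free model of idle eviction. It is only a sketch of the idea, not Apache HttpClient's implementation and not the code in this PR: `IdleEvictingPool`, `release`, `evictIdle`, and `lease` are hypothetical names, and time is passed in explicitly so the behaviour is deterministic. The real change relies on the builder's evictExpiredConnections() and evictIdleConnections(30s) options, which, as far as I understand, run the eviction on a background thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Illustrative model of an idle-evicting pool (hypothetical, not the Apache class). */
final class IdleEvictingPool {
    /** A pooled connection plus the time it was last returned to the pool. */
    private static final class Entry {
        final String connectionId;
        final long releasedAtMillis;

        Entry(String connectionId, long releasedAtMillis) {
            this.connectionId = connectionId;
            this.releasedAtMillis = releasedAtMillis;
        }
    }

    private final Deque<Entry> idle = new ArrayDeque<>();
    private final long maxIdleMillis;

    IdleEvictingPool(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Return a connection to the pool, recording when it became idle. */
    void release(String connectionId, long nowMillis) {
        idle.addLast(new Entry(connectionId, nowMillis));
    }

    /** Drop every connection idle for longer than maxIdleMillis; returns the number dropped. */
    int evictIdle(long nowMillis) {
        int evicted = 0;
        for (Iterator<Entry> it = idle.iterator(); it.hasNext();) {
            Entry entry = it.next();
            if (nowMillis - entry.releasedAtMillis > maxIdleMillis) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    /** Lease the most recently released connection, or null if a fresh one must be opened. */
    String lease() {
        Entry entry = idle.pollLast();
        return entry == null ? null : entry.connectionId;
    }

    int idleCount() {
        return idle.size();
    }
}

final class IdleEvictionDemo {
    public static void main(String[] args) {
        IdleEvictingPool pool = new IdleEvictingPool(30_000L); // 30s idle limit, as in the fix
        pool.release("conn-1", 0L);                            // used for a token refresh at t=0
        long oneHourLater = 60L * 60L * 1000L;                 // next refresh, about one token TTL later
        int evicted = pool.evictIdle(oneHourLater);
        String leased = pool.lease();
        System.out.println("evicted=" + evicted + ", leased=" + leased);
    }
}
```

Running `IdleEvictionDemo` prints `evicted=1, leased=null`: the connection left over from the previous refresh is dropped, so the next refresh opens a fresh connection instead of writing to a socket the server has already closed.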
CodeRabbit: review skipped, too many files. This PR contains 288 files, which is 138 over the limit of 150.
Retrying DEADLINE_EXCEEDED is unsafe for non-idempotent write operations: if the call reached the server before the deadline fired, a retry duplicates the request. Reverting until server-side idempotency keys are in place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
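The duplication hazard described above can be reproduced with a small, dependency-free sketch. Everything here is a hypothetical stand-in rather than the SDK's actual code: `RpcStatus` mimics a few gRPC status codes, `OneRetryPolicy` approximates the one-retry pattern attributed to RetryExecuter, and `SlowWriteServer` fakes a server that applies a write but responds too late for the client's deadline.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for the gRPC status codes discussed in this PR. */
enum RpcStatus { UNAUTHENTICATED, UNAVAILABLE, DEADLINE_EXCEEDED }

/** Hypothetical exception carrying the status of a failed call. */
final class RpcFailure extends RuntimeException {
    final RpcStatus status;

    RpcFailure(RpcStatus status) {
        super(status.name());
        this.status = status;
    }
}

/** Retries a call exactly once when the first failure has a retryable status. */
final class OneRetryPolicy {
    private final Set<RpcStatus> retryable;

    OneRetryPolicy(Set<RpcStatus> retryable) {
        this.retryable = retryable;
    }

    <T> T execute(Supplier<T> call) {
        try {
            return call.get();
        } catch (RpcFailure first) {
            if (!retryable.contains(first.status)) {
                throw first; // non-retryable: surface the failure immediately
            }
            return call.get(); // single retry; a second failure propagates
        }
    }
}

/** Fake server: the first write is applied, but the client sees DEADLINE_EXCEEDED. */
final class SlowWriteServer {
    int rowsCreated = 0;
    private boolean firstCall = true;

    String createUser() {
        rowsCreated++; // the write lands server-side on every attempt
        if (firstCall) {
            firstCall = false;
            throw new RpcFailure(RpcStatus.DEADLINE_EXCEEDED);
        }
        return "created";
    }
}

final class RetryDuplicationDemo {
    public static void main(String[] args) {
        SlowWriteServer retried = new SlowWriteServer();
        new OneRetryPolicy(EnumSet.allOf(RpcStatus.class)).execute(retried::createUser);
        System.out.println("DEADLINE_EXCEEDED retried: rowsCreated=" + retried.rowsCreated);

        SlowWriteServer notRetried = new SlowWriteServer();
        OneRetryPolicy reverted =
                new OneRetryPolicy(EnumSet.of(RpcStatus.UNAUTHENTICATED, RpcStatus.UNAVAILABLE));
        try {
            reverted.execute(notRetried::createUser);
        } catch (RpcFailure e) {
            System.out.println("DEADLINE_EXCEEDED not retried: rowsCreated=" + notRetried.rowsCreated);
        }
    }
}
```

With DEADLINE_EXCEEDED in the retryable set the fake server ends up with `rowsCreated=2`, which is the duplicate write this commit is guarding against. With the reverted set the failure surfaces to the caller and `rowsCreated` stays at 1.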
Summary
Two bugs identified from Rocketlane production error logs (8 errors over 3 days):
- ScalekitAuthClient: the HTTP connection pool reused stale TCP connections the server had already closed, causing 'Connection reset' on token refresh. Token refresh only fires once per token TTL (~1h), giving the pool ample time to accumulate stale connections. Fixed by enabling evictExpiredConnections() and evictIdleConnections(30s) on the Apache HttpClient builder.
- RetryExecuter: the DEADLINE_EXCEEDED gRPC status was not retried; it fell through to the else branch and threw immediately. UNAUTHENTICATED and UNAVAILABLE were already retried on the same one-retry pattern. DEADLINE_EXCEEDED is equally transient (channel reconnecting after network blip) and now gets the same treatment.

Test plan
- […] make test) against a live environment: 137 tests, 0 new failures introduced
- […] main and this branch: not caused by these changes

🤖 Generated with Claude Code