fix: implement exponential backoff retry for transient network errors #400
Open
Bedirhan3428 wants to merge 1 commit into
Welcome, @Bedirhan3428! Thanks for your first contribution! Before we proceed with the review, please sign the Fiduciary License Agreement: Once signed, this PR will be automatically updated.
Thanks, @Bedirhan3428! 🎉 Your CLA has been signed and is now on file. We'll proceed with the review shortly.
Description
This PR improves the reliability of the `lago-python-client` network layer by adding an exponential backoff retry mechanism for transient network and server-side errors inside `services/request.py`.

Why is this necessary?
Currently, `_create_retry_wrapper` handles only `HTTP 429` (Rate Limit) errors. If a transient network fault occurs (such as a dropped socket, a DNS timeout, or an edge proxy returning `502 Bad Gateway`, `503 Service Unavailable`, or `504 Gateway Timeout`), the SDK fails immediately instead of retrying. For critical production telemetry and financial invoicing, these temporary failures should be retried with exponential backoff before giving up.

What changes were made?
- Intercepts `502`, `503`, and `504` responses, raising a properly formed `httpx.HTTPStatusError`.
- Catches `httpx.RequestError` and `httpx.TimeoutException` alongside server failures.
- Applies exponential backoff (`1s -> 2s -> 4s`) using `time.sleep()`.
- Tracks `network_retry_attempt` and the pre-existing SDK `retry_attempt` state to keep attempt counts accurate for subsequent observability steps (`emit_rate_limit_info`).
- Leaves the existing `HTTP 429` workflow completely intact.

Tested locally against the entire test suite: all 414 tests pass.
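
For reviewers, the retry behaviour described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `TransientError`, `raise_if_transient`, and `retry_with_backoff` are hypothetical names, and `TransientError` stands in for `httpx.RequestError` or an `httpx.HTTPStatusError` raised on `502`/`503`/`504`. The `sleep` parameter is an assumption added so the delays can be observed without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Status codes treated as transient in this sketch (taken from the PR description).
RETRYABLE_STATUS = frozenset({502, 503, 504})


class TransientError(Exception):
    """Hypothetical stand-in for httpx.RequestError / HTTPStatusError on 502-504."""


def raise_if_transient(status_code: int) -> None:
    """Raise TransientError when the status code is one we want to retry."""
    if status_code in RETRYABLE_STATUS:
        raise TransientError(f"server returned {status_code}")


def retry_with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (TransientError,),
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error wait 1s, 2s, 4s, ... and try again."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retryable:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # delay doubles on each retry
    raise AssertionError("unreachable")
```

With the defaults, a call that fails three times and then succeeds waits 1s, 2s, and 4s in total before returning. A call that keeps failing re-raises the last error after the fourth attempt. Note that `time.sleep()` blocks the calling thread for the duration of each delay.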