diff --git a/source/client-backpressure/client-backpressure.md b/source/client-backpressure/client-backpressure.md
index 6aea3d2aaa..b8c5b02f8d 100644
--- a/source/client-backpressure/client-backpressure.md
+++ b/source/client-backpressure/client-backpressure.md
@@ -129,12 +129,18 @@ rules:
    - To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and
      [retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See
      [Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand)
-3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply exponential backoff
-   according to the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
-   - `jitter` is a random jitter value between 0 and 1.
-   - `BASE_BACKOFF` is constant 100ms.
-   - `MAX_BACKOFF` is 10000ms.
-   - This results in delays of 100ms and 200ms before accounting for jitter.
+3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply backoff according to the
+   following rules:
+   1. If `retryAfterMS` is present on the error and has a positive value, apply backoff according to the following
+      formula: `backoff = retryAfterMS.value + (jitter * retryAfterMS.value)`
+      - `jitter` is a random jitter value between -0.5 and 0.5, so the backoff is 50% to 150% of `retryAfterMS.value`.
+      - `retryAfterMS.value` is the value of the error's `retryAfterMS` field.
+   2. Otherwise, apply exponential backoff according to the following formula:
+      `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
+      - `jitter` is a random jitter value between 0 and 1.
+      - `BASE_BACKOFF` is a constant 100ms.
+      - `MAX_BACKOFF` is a constant 10000ms.
+      - For the first two retries, this results in delays of 100ms and 200ms before accounting for jitter.
 4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
    client MUST add the previously used server's address to the list of deprioritized server addresses for
    [server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
@@ -213,9 +219,16 @@ def execute_command_retryable(command, ...):
                 deprioritized_servers.append(server.address)
 
             if is_overload:
-                jitter = random.random()  # Random float between [0.0, 1.0).
-                backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))
-
+                # If present on the error with a positive value, retryAfterMS sets the backoff.
+                retry_after_ms = exc.retry_after_ms
+                if retry_after_ms is not None and retry_after_ms > 0:
+                    retry_after = retry_after_ms / 1000  # Convert from milliseconds to seconds.
+                    jitter = random.uniform(-0.5, 0.5)  # Random float between [-0.5, 0.5].
+                    backoff = retry_after + (jitter * retry_after)
+                # Otherwise, fall back to exponential backoff.
+                else:
+                    jitter = random.random()  # Random float between [0.0, 1.0).
+                    backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))
                 # If the delay exceeds the deadline, bail early.
                 if _csot.get_timeout():
                     if time.monotonic() + backoff > _csot.get_deadline():
@@ -433,6 +446,8 @@ to understand and configure.
 
 ## Changelog
 
+- 2026-06-16: Add support for `retryAfterMS` backoff calculation.
+
 - 2026-04-14: Clarify correct retry behavior when a mix of overload and non-overload errors are encountered.
 
 - 2026-03-30: Introduce phase 1 support without token buckets.
diff --git a/source/client-backpressure/tests/README.md b/source/client-backpressure/tests/README.md
index 17becefd0a..8f25f21f0f 100644
--- a/source/client-backpressure/tests/README.md
+++ b/source/client-backpressure/tests/README.md
@@ -119,3 +119,65 @@ option.
 5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.
 
 6. Assert that the total number of started commands is `maxAdaptiveRetries` + 1 (2).
+
+#### Test 5: Overload Errors with retryAfterMS override exponential backoff
+
+Drivers should test that overload errors with `retryAfterMS` override the default exponential backoff policy. This test
+MUST be executed against a MongoDB 9.0+ server that has enabled the `configureFailPoint` command with the `errorLabels`
+option.
+
+1. Let `client` be a `MongoClient`.
+
+2. Let `coll` be a collection.
+
+3. Configure the random number generator used for exponential backoff jitter to always return a number as close as
+   possible to `1`.
+
+4. Configure the following failPoint:
+
+   ```javascript
+   {
+     configureFailPoint: 'failCommand',
+     mode: 'alwaysOn',
+     data: {
+       failCommands: ['insert'],
+       errorCode: 462,
+       errorLabels: ['SystemOverloadedError', 'RetryableError']
+     }
+   }
+   ```
+
+5. Insert the document `{ a: 1 }` and expect the insert to fail. Record its duration as `exponential_backoff_time`.
+
+   ```javascript
+   const start = performance.now();
+   expect(
+     await coll.insertOne({ a: 1 }).catch(e => e)
+   ).to.be.an.instanceof(MongoServerError);
+   const end = performance.now();
+   ```
+
+6. Configure the random number generator used for `retryAfterMS` jitter to always return `0`.
+
+7. Run the following command to enable `retryAfterMS` on overload errors.
+
+   ```python
+   client.admin.command("setParameter", 1, overloadRetryAfterMS=50)
+   ```
+
+8. Execute step 5 again and record the duration as `with_retry_after_ms_time`.
+
+9. Run the following command to disable `retryAfterMS` on overload errors.
+
+   ```python
+   client.admin.command("setParameter", 1, overloadRetryAfterMS=0)
+   ```
+
+10. Compare the durations of the two runs.
+
+    ```python
+    assertTrue(absolute_value(exponential_backoff_time - (with_retry_after_ms_time + 0.2 seconds)) < 0.2 seconds)
+    ```
+
+    With the jitter pinned as above, the expected difference is 0.2 seconds: exponential backoff waits about 100ms + 200ms,
+    while `retryAfterMS` waits 50ms + 50ms. The assertion allows a 0.2-second window for variance between the two runs.
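
The backoff rules in step 3 of the spec can be sketched in Python, the language of the spec's pseudocode. This is a minimal sketch under stated assumptions, not the spec's reference logic. `compute_backoff`, its `retry_after_ms` argument (a stand-in for the error's `retryAfterMS` field), and the injectable `jitter_source` are hypothetical names, and delays are in seconds. If you pin the jitter the way Test 5 does and assume two retries, the sketch reproduces the 0.2-second difference that test asserts.

```python
import random
from typing import Callable, Optional

# Constants from step 3, expressed in seconds to match the spec's pseudocode,
# which compares `time.monotonic() + backoff` against a deadline.
BASE_BACKOFF = 0.1  # 100ms
MAX_BACKOFF = 10.0  # 10000ms

# Hypothetical jitter source: takes (low, high) bounds, returns a float in that range.
JitterSource = Callable[[float, float], float]


def compute_backoff(
    attempt: int,
    retry_after_ms: Optional[float] = None,
    jitter_source: JitterSource = random.uniform,
) -> float:
    """Return the delay in seconds before retry number `attempt` (1-based).

    `retry_after_ms` stands in for the error's `retryAfterMS` field, and
    `jitter_source` is injectable so tests can pin the jitter. Both are
    illustrative assumptions, not a driver API.
    """
    if retry_after_ms is not None and retry_after_ms > 0:
        # Rule 1: center the delay on retryAfterMS, +/- 50%.
        retry_after = retry_after_ms / 1000  # milliseconds -> seconds
        jitter = jitter_source(-0.5, 0.5)
        return retry_after + jitter * retry_after
    # Rule 2: capped exponential backoff with full jitter.
    jitter = jitter_source(0.0, 1.0)
    return jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))


if __name__ == "__main__":
    # Pin the jitter as Test 5 does: exponential jitter at its upper bound,
    # retryAfterMS jitter at 0. Two retries are assumed.
    exponential_total = sum(
        compute_backoff(attempt, jitter_source=lambda low, high: high)
        for attempt in (1, 2)
    )
    retry_after_total = sum(
        compute_backoff(attempt, retry_after_ms=50, jitter_source=lambda low, high: 0.0)
        for attempt in (1, 2)
    )
    print(f"{exponential_total:.3f} {retry_after_total:.3f} "
          f"{exponential_total - retry_after_total:.3f}")
```

Running it prints `0.300 0.100 0.200`. Injecting the jitter source is one way to meet the test's requirement to pin jitter without patching a global random number generator. Drivers may choose a different mechanism.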