33 changes: 24 additions & 9 deletions source/client-backpressure/client-backpressure.md
@@ -129,12 +129,18 @@ rules:
- To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and
[retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See
[Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand)
3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply exponential backoff
according to the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
- `jitter` is a random jitter value between 0 and 1.
- `BASE_BACKOFF` is constant 100ms.
- `MAX_BACKOFF` is 10000ms.
- This results in delays of 100ms and 200ms before accounting for jitter.
3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply backoff according to the
following rules:
1. If `retryAfterMS` is present on the error and has a positive value, apply backoff according to the following
formula: `backoff = retryAfterMS.value + (jitter * retryAfterMS.value)`
- `jitter` is a random jitter value between -0.5 and 0.5.
- `retryAfterMS.value` is the value of the error's `retryAfterMS` field.
2. Otherwise, apply exponential backoff according to the following formula:
`backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
- `jitter` is a random jitter value between 0 and 1.
- `BASE_BACKOFF` is constant 100ms.
- `MAX_BACKOFF` is 10000ms.
- This results in delays of 100ms and 200ms before accounting for jitter.
4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
client MUST add the previously used server's address to the list of deprioritized server addresses for
[server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
@@ -213,9 +219,16 @@ def execute_command_retryable(command, ...):
deprioritized_servers.append(server.address)

if is_overload:
jitter = random.random() # Random float between [0.0, 1.0).
backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))

# If present on the error with a positive value, retryAfterMS sets the backoff.
retry_after_ms = exc.retry_after_ms
if retry_after_ms and retry_after_ms > 0:
    retry_after = retry_after_ms / 1000  # Convert from milliseconds to seconds.
    jitter = random.uniform(-0.5, 0.5)  # Random float in [-0.5, 0.5].
    backoff = retry_after + (jitter * retry_after)
# Otherwise, fall back to exponential backoff.
else:
    jitter = random.random()  # Random float in [0.0, 1.0).
    backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))
# If the delay exceeds the deadline, bail early.
if _csot.get_timeout():
if time.monotonic() + backoff > _csot.get_deadline():
@@ -433,6 +446,8 @@ to understand and configure.

## Changelog

- 2026-06-16: Add support for retryAfterMS backoff calculation.

- 2026-04-14: Clarify correct retry behavior when a mix of overload and non-overload errors are encountered.

- 2026-03-30: Introduce phase 1 support without token buckets.
62 changes: 62 additions & 0 deletions source/client-backpressure/tests/README.md
@@ -119,3 +119,65 @@ option.
5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.

6. Assert that the total number of started commands is `maxAdaptiveRetries` + 1 (2).

#### Test 5: Overload Errors with retryAfterMS override exponential backoff

Drivers should test that overload errors with `retryAfterMS` override the default exponential backoff policy. This test
MUST be executed against a MongoDB 9.0+ server that has enabled the `configureFailPoint` command with the `errorLabels`
option.

1. Let `client` be a `MongoClient`.

2. Let `coll` be a collection.

3. Configure the random number generator used for exponential backoff jitter to always return a number as close as
possible to `1`.

4. Configure the following failPoint:

```javascript
{
configureFailPoint: 'failCommand',
mode: 'alwaysOn',
data: {
failCommands: ['insert'],
errorCode: 462,
errorLabels: ['SystemOverloadedError', 'RetryableError']
}
}
```

5. Insert the document `{ a: 1 }`. Expect that the command errors. Measure the duration of the command execution.

```javascript
const start = performance.now();
expect(
await coll.insertOne({ a: 1 }).catch(e => e)
).to.be.an.instanceof(MongoServerError);
const end = performance.now();
```

6. Configure the random number generator used for `retryAfterMS` jitter to always return `0`.

7. Run the following command to set up `retryAfterMS` on overload errors.

```python
client.admin.command("setParameter", 1, overloadRetryAfterMS=50)
```

8. Execute step 5 again.

9. Run the following command to disable `retryAfterMS` on overload errors.

```python
client.admin.command("setParameter", 1, overloadRetryAfterMS=0)
```

10. Compare the time between the two runs.

```python
# Both durations are measured in seconds.
assert abs(exponential_backoff_time - (with_retry_after_ms_time + 0.2)) < 0.2
```

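The expected difference between the two runs is 0.2 seconds: with jitter pinned near `1`, the exponential backoffs
of 100ms and 200ms total about 0.3 seconds, while two backoffs of 50ms with zero jitter total 0.1 seconds. The
0.2-second window accounts for potential variance between the two runs.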
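
The 0.2-second figure can be checked with a minimal sketch of the backoff rules. This is an illustration rather than driver code: `compute_backoff_ms` and `FixedRng` are hypothetical names, and the sketch assumes two retries per run, as the 100ms and 200ms delays in the specification suggest.

```python
import random

BASE_BACKOFF_MS = 100
MAX_BACKOFF_MS = 10_000


class FixedRng:
    """Hypothetical stub that always returns the same jitter value."""

    def __init__(self, value):
        self.value = value

    def random(self):
        return self.value

    def uniform(self, low, high):
        return self.value


def compute_backoff_ms(attempt, retry_after_ms=None, rng=random):
    """Return the backoff in milliseconds for a 1-based retry attempt."""
    # A positive retryAfterMS overrides exponential backoff.
    if retry_after_ms is not None and retry_after_ms > 0:
        jitter = rng.uniform(-0.5, 0.5)
        return retry_after_ms + jitter * retry_after_ms
    # Otherwise, use exponential backoff scaled by jitter in [0, 1).
    jitter = rng.random()
    return jitter * min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * 2 ** (attempt - 1))


# Assumption: each run performs two retries (attempts 1 and 2).
exponential_total_ms = sum(
    compute_backoff_ms(attempt, rng=FixedRng(1.0)) for attempt in (1, 2)
)
retry_after_total_ms = sum(
    compute_backoff_ms(attempt, retry_after_ms=50, rng=FixedRng(0.0))
    for attempt in (1, 2)
)
difference_ms = exponential_total_ms - retry_after_total_ms
```

Under these assumptions the exponential run waits 300ms in total and the `retryAfterMS` run waits 100ms, which gives the 200ms offset used in the assertion in step 10.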