Commit ef3afb8

[PMM-287] OpenAPI: Retries (#26)
Co-authored-by: Chai Landau <112015853+chailandau@users.noreply.github.com>
1 parent e9f879a commit ef3afb8

2 files changed

Lines changed: 235 additions & 38 deletions

File tree

openapi/responses/rate-limiting.mdx

Lines changed: 3 additions & 38 deletions
@@ -103,6 +103,8 @@ Retry-After: 3600
103103
}
104104
```
105105

106+
*Learn more about retries [over here](/openapi/retries).*
107+
106108
## Rate limiting headers
107109

108110
Rate limiting headers are HTTP headers that provide information about the rate
@@ -153,43 +155,6 @@ Content-Type: application/json
153155
}
154156
```
155157

156-
## Examples or constants
157-
158-
When describing some of these headers it feels important to provide a bit of
159-
context about what the headers contain, either with an exact value, or if that's
160-
not possible then some examples of what values might appear.
161-
162-
Here is an example of how to do that with the `Retry-After` header, using the Header Object `example` field:
163-
164-
```yaml
165-
headers:
166-
Retry-After:
167-
description: The number of seconds to wait before making another request.
168-
schema:
169-
type: integer
170-
example: 3600
171-
```
172-
173-
Using an example here is important because the value of `Retry-After` can vary
174-
depending on the billing plan the user is on. For example, a free plan might
175-
have a `Retry-After` value of 60 seconds, while a paid plan might have a
176-
`Retry-After` value of 10 seconds.
177-
178-
If the value of `Retry-After` is always the same, then using `const` inside the
179-
`schema` object is a good way to indicate that.
180-
181-
```yaml
182-
headers:
183-
Retry-After:
184-
description: The number of seconds to wait before making another request.
185-
schema:
186-
type: integer
187-
const: 3600
188-
```
189-
190-
This indicates that the `Retry-After` value is always 3600 seconds, regardless of
191-
the billing plan the user is on.
192-
193158
## Rate limiting headers draft RFC
194159

195160
The `X-RateLimit-*` headers are not standard HTTP headers, and they are not
@@ -206,7 +171,7 @@ requests allowed.
206171
The following example shows a `RateLimit` header with a policy named "default",
207172
which has another 50 requests allowed in the next 30 seconds.
208173

209-
```
174+
```http
210175
RateLimit: "default";r=50;t=30
211176
```
212177

openapi/responses/retries.mdx

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
1+
---
2+
title: Retries
3+
description: Communicate to API consumers when they should retry requests, and how
4+
long to wait before retrying. This can help reduce the number of requests being made to an API
5+
when it's having a rough time, and can also help consumers avoid tripping over rate limits.
6+
---
7+
8+
Retries are a common pattern in API design, especially when dealing with
9+
transient errors or rate limits, or connection issues that could have just been
10+
a bit getting flipped somewhere in the ether, but a second attempt would likely
11+
work.
12+
13+
API consumers may or may not automatically retry failed requests, and if they do, they
14+
may not know how long to wait before retrying. This is especially true for rate
15+
limits, where the API may be able to handle a burst of requests, but then
16+
throttle the consumer for a period of time to avoid overwhelming the system.
17+
18+
To avoid confusing consumers who just want to work with an API, it's a good idea
19+
to communicate to them how retries should be used in the API, with that
20+
information clearly explained in API documentation and automatically handled by
21+
SDKs.
22+
23+
## Retries in OpenAPI
24+
25+
Whilst OpenAPI does not have any dedicated functionality for describing retries,
26+
it does not need to, as HTTP itself has retries covered.
27+
28+
The `Retry-After` header is defined in RFC 9110 as the standard way to
29+
communicate that a client or server failure has happened, but that a retry is
30+
possible after a certain amount of time.
31+
32+
```yaml
responses:
  "429":
    description: Too Many Requests
    headers:
      Retry-After:
        description: The number of seconds to wait before retrying the request.
        schema:
          type: integer
        example: 120
```
43+
44+
The most common usage of `Retry-After` is the number of seconds to wait
45+
before trying again, but it can also be used to communicate a date and time when
46+
the request can be retried.
47+
48+
```yaml
responses:
  "429":
    description: Too Many Requests
    headers:
      Retry-After:
        description: The date and time after which the request can be retried, as an HTTP-date (RFC 9110).
        schema:
          type: string
        example: "Tue, 01 Jul 2025 12:00:00 GMT"
```
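
On the client side, both forms of the header need handling. The following is a minimal sketch, assuming Python's standard library only; the function name `retry_after_seconds` and its `now` parameter are hypothetical and exist purely for illustration. It treats an all-digit value as a number of seconds and anything else as an HTTP-date, which is the date form RFC 9110 defines for this header.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return how many seconds to wait, or None if the value cannot be parsed."""
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Tue, 01 Jul 2025 12:00:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC (an assumption).
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means the client can retry immediately.
    return max(0.0, (retry_at - now).total_seconds())
```

Returning `None` for unparseable values lets the caller fall back to its own default wait rather than failing the request outright.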
60+
61+
This header is used in a number of different scenarios, including:
62+
63+
- 404 Not Found - The server MAY send a Retry-After header field to indicate that the resource is temporarily unavailable, or could exist in the near future.
64+
- 408 Request Timeout - The server MAY send a Retry-After header field to indicate that it is temporary and after what time the client MAY try again.
65+
- 409 Conflict - The server MAY send a Retry-After header field to indicate that the conflict could be temporary and suggest a time where it may have been resolved.
66+
- 413 Content Too Large - If the condition is temporary, the server could send a Retry-After header field to indicate that it is temporary and the client could try again.
67+
- 429 Too Many Requests - The client has sent too many requests in a given amount of time, and the server is asking the client to wait before sending more requests.
68+
- 503 Service Unavailable - A service may be unavailable temporarily, so a Retry-After header field could suggest an appropriate amount of time for the client to wait before checking if the service is back.
69+
70+
Some other 5XX errors could well be retryable, such as 504 Gateway Timeout, and
a 507 Insufficient Storage might resolve itself, but it's possible to go too
far. For example, 501 Not Implemented could be seen as temporary because it
could be implemented soon, but nobody wants API consumers retrying an API
forever, waiting for a feature that is never going to be deployed.
76+
77+
Describing every single instance of anything that could ever happen in OpenAPI
78+
is not the goal of any API documentation because that is impossible. Focusing on
79+
key areas where retries are likely to be needed is a good balance, and from
80+
there, API consumers can use their own judgement on whether to retry or not.
81+
82+
## Examples or constants
83+
84+
When describing some of these headers it feels important to provide a bit of
85+
context about what the headers contain, either with an exact value, or if that's
86+
not possible then some examples of what values might appear.
87+
88+
Here is an example of how to do that with the `Retry-After` header, using the
89+
Header Object `example` field:
90+
91+
```yaml
headers:
  Retry-After:
    description: The number of seconds to wait before making another request.
    schema:
      type: integer
    example: 3600
```
99+
100+
Using an example here is important because the value of `Retry-After` can vary
101+
depending on the billing plan the user is on. For example, a free plan might
102+
have a `Retry-After` value of 60 seconds, while a paid plan might have a
103+
`Retry-After` value of 10 seconds.
104+
105+
If the value of `Retry-After` is always the same, then using `const` inside the
106+
`schema` object is a good way to indicate that (available from OpenAPI v3.1).
107+
108+
```yaml
headers:
  Retry-After:
    description: The number of seconds to wait before making another request.
    schema:
      type: integer
      const: 3600
```
116+
117+
This indicates that the `Retry-After` value is always 3600 seconds, regardless of
118+
the billing plan the user is on.
119+
120+
## x-speakeasy-retries
121+
122+
Another way to communicate how retries should work is to use the Speakeasy
vendor extension `x-speakeasy-retries`. Using this extension, OpenAPI can
communicate retry behavior for a particular operation, or the API as a whole,
to API consumers.
126+
127+
This extension can be used to specify the number of retries, the delay between
128+
retries, and the maximum delay.
129+
130+
```yaml
/webhooks/subscribe:
  post:
    operationId: subscribeToWebhooks
    servers:
      - url: https://speakeasy.bar
    x-speakeasy-usage-example:
      tags:
        - server
        - retries
    x-speakeasy-retries:
      strategy: backoff
      backoff:
        initialInterval: 10
        maxInterval: 200
        maxElapsedTime: 1000
        exponent: 1.15
```
148+
149+
This example shows how to use the `x-speakeasy-retries` extension to specify a
150+
"backoff" strategy for retries, which is a common pattern for retrying failed
151+
requests with longer waits between retries. The `initialInterval` is the time to
152+
wait before the first retry, and the `maxInterval` is the maximum time to wait
153+
between retries. The `maxElapsedTime` is the maximum time to wait before giving
154+
up on retries, and the `exponent` is the exponent to use for the backoff
155+
calculation.
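
To make those four settings more concrete, here is a rough sketch of how a client could turn them into a series of waits. It assumes each wait is the initial interval multiplied by the exponent once per previous attempt, capped at the maximum interval, and that retries stop once the total wait would exceed the maximum elapsed time. The function `backoff_delays` is hypothetical, and this is not necessarily how Speakeasy SDKs compute their delays (real implementations often add random jitter too).

```python
def backoff_delays(initial_interval, max_interval, max_elapsed_time, exponent):
    """Yield successive waits until the total would exceed max_elapsed_time."""
    if initial_interval <= 0:
        raise ValueError("initial_interval must be positive")
    elapsed = 0.0
    attempt = 0
    while True:
        # Grow the wait geometrically, but never beyond max_interval.
        delay = min(initial_interval * exponent ** attempt, max_interval)
        if elapsed + delay > max_elapsed_time:
            return  # give up: the retry budget is spent
        yield delay
        elapsed += delay
        attempt += 1


# The values from the example above.
delays = list(backoff_delays(10, 200, 1000, 1.15))
```

With the values from the example, the first wait is 10, the second is 11.5, and every later wait keeps growing by 15% until the total reaches the 1000 budget.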
156+
157+
This approach could be used as well as, or instead of, the `Retry-After` header.
It is a good way to add automatic retries to SDKs, so that clients retry
requests automatically without needing to learn which status codes are
retryable or how long to wait before retrying.
161+
162+
## Rate Limits
163+
164+
Rate limits are a common use case for retries, and the `Retry-After` header is a
165+
great way to communicate to API consumers when they should retry requests, and
166+
how long to wait before retrying.
167+
168+
Learn more about [rate limiting with OpenAPI](/openapi/rate-limiting).
169+
170+
## Best practices
171+
172+
### Examples over constants
173+
174+
When describing the `Retry-After` header, it's a good idea to use examples instead of
175+
constants to allow for flexibility in the API. Some APIs will push up the
176+
`Retry-After` value during times of high load, and some APIs will have different
177+
`Retry-After` values depending on the billing plan the user is on. Using
178+
examples allows for this flexibility, whilst still letting API consumers know
179+
what sort of values they can expect (e.g. an integer vs. a date).
180+
181+
### Expand Retry-After in periods of high load
182+
183+
When the API is under heavy load, it's a good idea to expand the `Retry-After`
184+
header to allow for longer wait times between retries. This can help reduce the
185+
number of requests being made to the API, and can also help consumers avoid
186+
tripping over rate limits.
187+
188+
### Use exponential backoff
189+
190+
When retrying requests, it's a good idea to suggest an exponential backoff
191+
approach, as a server which is struggling to keep up with requests does not need
192+
to be hammered with even more requests. Sometimes it needs a moment to recover,
193+
and clients giving larger gaps between retries can help with that.
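
As an illustration, a client-side retry loop might look something like the sketch below. Everything here is an assumption made for the example: `send` is a hypothetical zero-argument callable returning a `(status, headers)` tuple, the `RETRYABLE` set is one possible choice rather than a standard list, and the random jitter is an addition that is not discussed above but is commonly paired with exponential backoff so that clients do not all retry at the same moment.

```python
import random
import time

RETRYABLE = {408, 429, 503, 504}  # illustrative choice, not a standard list


def call_with_retries(send, max_attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable returning (status, headers).
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, headers
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # The server's hint takes priority over the client's own schedule.
            delay = float(retry_after)
        else:
            # Exponential backoff: the upper bound doubles on every attempt,
            # and a random wait below that bound spreads clients out.
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
        sleep(delay)
```

Passing `sleep` in as a parameter keeps the loop easy to test, and returning the last response when attempts run out lets the caller decide what to show the user.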
194+
195+
### Document alternative solutions
196+
197+
Once the maximum number of retries has been reached, it is likely to be an
198+
actual outage instead of simply a blip in availability. The API client and their
199+
end-users should be informed of this, and given alternative solutions to the
200+
problem. This could be a link to a status page, or a support email address.
201+
202+
For example, at a coworking space, when the conference room booking system was
203+
down, the client would replace the "Submit Booking" button with a message saying
204+
"The booking system is down, please email somebody@example.org". OpenAPI is a
205+
great way to communicate this sort of information to API clients so they can use
206+
it to inform their users.
207+
208+
```yaml
responses:
  "503":
    description: Service Unavailable
    headers:
      Retry-After:
        description: The number of seconds to wait before making another request.
        schema:
          type: integer
        example: 3600
    content:
      application/problem+json:
        schema:
          type: object
          properties:
            type:
              type: string
              const: "https://example.com/probs/service-unavailable"
            title:
              type: string
              const: Service Unavailable
            detail:
              type: string
              const: The service is currently unavailable, please try again later, or contact support@example.org to report the issue.
```
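
A client could then use that response body to decide what to show its users once retries are exhausted. The following is only a sketch under a few assumptions: the function name `outage_message` and the `FALLBACK` text are hypothetical, and the client is assumed to already have the status code, the `Content-Type` header value, and the raw body to hand.

```python
import json

FALLBACK = "The service is unavailable. Please try again later."


def outage_message(status, content_type, body, fallback=FALLBACK):
    """Return text to show the user for a 503, or None for other statuses."""
    if status != 503:
        return None
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/problem+json":
        return fallback
    try:
        problem = json.loads(body)
    except ValueError:
        return fallback
    detail = problem.get("detail") if isinstance(problem, dict) else None
    # Only trust a non-empty string; anything else falls back to the default.
    return detail if isinstance(detail, str) and detail else fallback
```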
