Summary
On the openai provider (api.openai.com), Gini hard-pins prompt_cache_retention: "in_memory" on every request. OpenAI's gpt-5.x models reportedly accept only the 24h extended prompt-cache tier and reject in_memory with an HTTP 400:
This model is compatible only with 24h extended prompt caching
So selecting a gpt-5.x model on the standard openai provider fails.
(Surfaced by an agent investigating the codex caching story; the api.openai.com gpt-5.x behavior is from that report and not yet independently reproduced here. The codex path is a separate backend — see below.)
Not affected
- codex (chatgpt.com/backend-api/codex): the codex /responses builders deliberately omit prompt_cache_retention, because that backend returns a 400 on the parameter itself ({"detail":"Unsupported parameter: prompt_cache_retention"}, verified). So codex/gpt-5.5 is unaffected and works today.
- gpt-4o / gpt-4.1 / o-series on the openai provider accept in_memory, so they're fine.
Root cause
- The pin is hard-coded on the OpenAI-compatible builders in src/provider.ts (chat-completions + the openai /responses path).
- It's deliberately listed in RESERVED_EXTRA_BODY_KEYS (src/provider.ts), so a user cannot override it via extraBody: the sanitizer strips any prompt_cache_retention they pass. That was intentional, to keep the Zero-Data-Retention (ZDR) opt-out from being silently shadowed by user config. See ADR docs/adr/prompt-cache-in-memory-tier.md.
So fixing gpt-5.x on the openai path needs a code change, not a setting.
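The interaction described above can be sketched roughly as follows. This is a minimal illustration and not the actual contents of src/provider.ts: only RESERVED_EXTRA_BODY_KEYS, extraBody, and prompt_cache_retention come from this issue, while sanitizeExtraBody, buildRequestBody, and the request-body shape are hypothetical names assumed for the example.

```typescript
// Hypothetical sketch of the pin + sanitizer interaction; the real builders may differ.
type JsonObject = Record<string, unknown>;

// Only prompt_cache_retention is confirmed by the issue; other reserved keys are unknown.
const RESERVED_EXTRA_BODY_KEYS: ReadonlyArray<string> = ["prompt_cache_retention"];

// Drop every reserved key the user passed via extraBody.
function sanitizeExtraBody(extraBody: JsonObject): JsonObject {
  const clean: JsonObject = {};
  for (const key of Object.keys(extraBody)) {
    if (RESERVED_EXTRA_BODY_KEYS.indexOf(key) === -1) {
      clean[key] = extraBody[key];
    }
  }
  return clean;
}

// Assumed builder shape: sanitized user keys first, then the hard-coded pin.
function buildRequestBody(model: string, extraBody: JsonObject = {}): JsonObject {
  return {
    model,
    ...sanitizeExtraBody(extraBody),
    prompt_cache_retention: "in_memory",
  };
}

// "gpt-5.x" is the placeholder model id used in this issue, not a real model name.
const body = buildRequestBody("gpt-5.x", {
  prompt_cache_retention: "24h", // stripped by the sanitizer
  temperature: 0.2, // not reserved, passes through
});
console.log(body);
```

Under these assumptions the user-supplied 24h value never reaches the wire: the body still carries in_memory, which is the value gpt-5.x reportedly rejects.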
Why it's a tradeoff, not just a bug
in_memory is ZDR-eligible; 24h is not (the prompt is retained up to 24 hours). So gpt-5.x can't simply be flipped to 24h silently — that drops the privacy posture for those models.
Options
1. Special-case gpt-5.x to 24h on the openai path: makes the models work, but silently loses ZDR for them (silent privacy regression).
2. Make the retention tier configurable per provider (and drop it from the reserved-key blocklist on that path) so the user opts into 24h knowingly. Preferred, because it keeps ZDR a conscious choice.
3. Leave it: gpt-5.x stays unusable on the openai provider; document the limitation.
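One possible shape for the per-provider setting is sketched below. Every name here (ProviderConfig, promptCacheRetention, resolveRetention) is hypothetical and the real configuration schema may look quite different. The sketch assumes the default stays in_memory, so 24h is only ever sent after an explicit opt-in, and that the codex backend continues to receive no retention parameter at all.

```typescript
// Hypothetical config shape; not an existing Gini setting.
type RetentionTier = "in_memory" | "24h";

interface ProviderConfig {
  name: "openai" | "codex";
  // Absent means "keep the ZDR-eligible default".
  promptCacheRetention?: RetentionTier;
}

// Returns the tier to send, or undefined when the parameter must be omitted.
function resolveRetention(config: ProviderConfig): RetentionTier | undefined {
  if (config.name === "codex") {
    // The codex backend rejects the parameter itself, so never send it.
    return undefined;
  }
  // Default to in_memory so ZDR is only given up by an explicit choice.
  return config.promptCacheRetention ?? "in_memory";
}

console.log(resolveRetention({ name: "openai" }));
console.log(resolveRetention({ name: "openai", promptCacheRetention: "24h" }));
console.log(resolveRetention({ name: "codex", promptCacheRetention: "24h" }));
```

Keeping the default in one resolver means a missing or forgotten setting can never downgrade the privacy posture; only a value the user wrote themselves selects 24h.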
Suggested next step
Option 2, behind its own ADR (the ZDR opt-out becomes user-selectable for the openai provider), gated on a cached_tokens measurement confirming 24h actually engages on gpt-5.x before relying on it.
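The measurement gate could start from a small helper like the one below. It is a sketch under assumptions: it reads cached_tokens from the usage object of a parsed response, looking under prompt_tokens_details (chat-completions) and input_tokens_details (/responses), and treats a missing field as zero. UsageLike, cachedTokens, and cacheEngaged are hypothetical names, and the sample usage objects are made-up illustrations, not measured results.

```typescript
// Minimal view of the usage object; only the fields this check needs.
interface UsageLike {
  prompt_tokens_details?: { cached_tokens?: number }; // chat-completions shape
  input_tokens_details?: { cached_tokens?: number }; // /responses shape
}

// Read cached_tokens from either shape, treating a missing field as 0.
function cachedTokens(usage: UsageLike | undefined): number {
  return (
    usage?.prompt_tokens_details?.cached_tokens ??
    usage?.input_tokens_details?.cached_tokens ??
    0
  );
}

// A follow-up request that repeats the same prompt prefix should report a cache hit.
function cacheEngaged(followUpUsage: UsageLike | undefined): boolean {
  return cachedTokens(followUpUsage) > 0;
}

// Illustrative inputs only; real values must come from actual API responses.
const coldRequest: UsageLike = { input_tokens_details: { cached_tokens: 0 } };
const warmRequest: UsageLike = { input_tokens_details: { cached_tokens: 2048 } };
console.log(cacheEngaged(coldRequest), cacheEngaged(warmRequest));
```

A nonzero value on the follow-up request only shows that some caching engaged. Confirming the 24h tier specifically would also require spacing the two requests far enough apart to rule out short-lived caching; that interval is not stated in this issue.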
Context
Surfaced while removing the cache warmer (#265). That PR is orthogonal — it keeps the in_memory pin unchanged — so this is a pre-existing limitation, filed separately.