Skip to content

openai provider: pinned in_memory prompt cache rejects gpt-5.x models (require 24h tier) #279

Description

@PotatoParser

Summary

On the openai provider (api.openai.com), Gini hard-pins prompt_cache_retention: "in_memory" on every request. OpenAI's gpt-5.x models reportedly accept only the 24h extended prompt-cache tier and reject in_memory with an HTTP 400:

This model is compatible only with 24h extended prompt caching

So selecting a gpt-5.x model on the standard openai provider fails.

(Surfaced by an agent investigating the codex caching story; the api.openai.com gpt-5.x behavior is from that report and not yet independently reproduced here. The codex path is a separate backend — see below.)

Not affected

  • codex (chatgpt.com/backend-api/codex): the codex /responses builders deliberately omit prompt_cache_retention — that backend 400s on the parameter itself ({"detail":"Unsupported parameter: prompt_cache_retention"}, verified). So codex/gpt-5.5 is unaffected and works today.
  • gpt-4o / gpt-4.1 / o-series on the openai provider accept in_memory, so they're fine.

Root cause

  • The pin is hard-coded on the OpenAI-compatible builders in src/provider.ts (chat-completions + the openai /responses path).
  • It's deliberately listed in RESERVED_EXTRA_BODY_KEYS (src/provider.ts), so a user cannot override it via extraBody — the sanitizer strips any prompt_cache_retention they pass. That was intentional: keep the Zero-Data-Retention (ZDR) opt-out from being silently shadowed by user config. See ADR docs/adr/prompt-cache-in-memory-tier.md.

So fixing gpt-5.x on the openai path needs a code change, not a setting.

Why it's a tradeoff, not just a bug

in_memory is ZDR-eligible; 24h is not (the prompt is retained up to 24 hours). So gpt-5.x can't simply be flipped to 24h silently — that drops the privacy posture for those models.

Options

  1. Special-case gpt-5.x to 24h on the openai path — makes the models work, but silently loses ZDR for them (silent privacy regression).
  2. Make the retention tier configurable per provider (and drop it from the reserved-key blocklist on that path) so the user opts into 24h knowingly. Preferred — keeps ZDR a conscious choice.
  3. Leave it — gpt-5.x stays unusable on the openai provider; document the limitation.

Suggested next step

Option 2, behind its own ADR (the ZDR opt-out becomes user-selectable for the openai provider), gated on a cached_tokens measurement confirming 24h actually engages on gpt-5.x before relying on it.

Context

Surfaced while removing the cache warmer (#265). That PR is orthogonal — it keeps the in_memory pin unchanged — so this is a pre-existing limitation, filed separately.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingpriority:lowLow priority

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions