feat(openai): prompt-caching + reasoning continuity on the Claude Code→Responses path #393

Description

@Destynova2

Problem

On the Claude Code → grob → ChatGPT Codex (OAuth) path, grob translates Anthropic /v1/messages into an OpenAI Responses request but omits two fields that the real Codex CLI sends natively. Capturing both clients' raw requests shows the difference:

| Field | Codex CLI → /v1/responses | grob (Claude Code→Responses) |
| --- | --- | --- |
| store | false (stateless) | false |
| prompt_cache_key | <thread_id> | absent |
| include | ["reasoning.encrypted_content"] | absent |
| reasoning | {effort} | {effort} |

Result: Claude Code→gpt sessions lose prompt-caching (cost/latency on the growing prefix) and reasoning continuity across tool-use turns.

How Codex does it (ground-truth from openai/codex)

codex-rs/core/src/client.rs:

  • store: provider.is_azure_responses_endpoint() → false on ChatGPT → stateless, which is why it needs encrypted reasoning.
  • include = ["reasoning.encrypted_content"] — only when model_info.supports_reasoning_summaries.
  • prompt_cache_key = self.state.thread_id (stable per conversation; overridable).
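The three rules above can be sketched as a single function. This is a minimal illustration, not the actual codex-rs code: `Provider`, `ModelInfo`, `ResponsesFields`, and `build_fields` are hypothetical stand-ins, and only the gating logic is taken from the bullets.

```rust
/// Hypothetical stand-ins for the values Codex consults (not the real codex-rs types).
struct Provider {
    is_azure: bool,
}

struct ModelInfo {
    supports_reasoning_summaries: bool,
}

/// The three request fields discussed above, and nothing else.
#[derive(Debug, PartialEq)]
struct ResponsesFields {
    store: bool,
    include: Vec<String>,
    prompt_cache_key: String,
}

fn build_fields(provider: &Provider, model: &ModelInfo, thread_id: &str) -> ResponsesFields {
    // Ask for encrypted reasoning only when the model supports reasoning summaries.
    let include = if model.supports_reasoning_summaries {
        vec!["reasoning.encrypted_content".to_string()]
    } else {
        Vec::new()
    };

    ResponsesFields {
        // Server-side storage only on the Azure endpoint; false (stateless) on ChatGPT.
        store: provider.is_azure,
        include,
        // Stable for the lifetime of the conversation.
        prompt_cache_key: thread_id.to_string(),
    }
}
```

With `is_azure: false` and a reasoning-capable model, this yields `store = false`, the single `include` entry, and the thread id as cache key, which matches the Codex CLI column in the table.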

codex-rs/core/src/context_manager/history.rs:

  • Reasoning items are retained in history (is_api_message(Reasoning) => true), prioritised when encrypted_content: Some(_).
  • for_prompt() replays the full item list in recorded order as input each turn — reasoning items keep their position (right before the assistant turn / function_call they produced). The model decrypts and continues.
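A simplified model of that replay behaviour, for orientation only. The `Item` enum, the `LocalNote` variant, and the filtering-on-record choice are assumptions made for this sketch; the real history.rs is more involved and the prioritisation of items with `encrypted_content: Some(_)` is not modelled here.

```rust
/// Hypothetical, reduced item set; the real Responses item types carry more data.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    UserMessage(String),
    Reasoning { encrypted_content: Option<String> },
    FunctionCall { call_id: String },
    FunctionCallOutput { call_id: String, output: String },
    /// Hypothetical client-only item that is never sent to the API.
    LocalNote(String),
}

/// Reasoning items count as API messages, so they stay in history.
fn is_api_message(item: &Item) -> bool {
    !matches!(item, Item::LocalNote(_))
}

#[derive(Default)]
struct History {
    items: Vec<Item>,
}

impl History {
    /// Keep only items that belong in the API input, in arrival order.
    fn record(&mut self, item: Item) {
        if is_api_message(&item) {
            self.items.push(item);
        }
    }

    /// Replay the full list in recorded order; reasoning keeps its position
    /// directly before the function_call it produced.
    fn for_prompt(&self) -> Vec<Item> {
        self.items.clone()
    }
}
```

Because the client records items in the order the backend emitted them, no positional bookkeeping is needed: the reasoning item is already adjacent to its function_call when the list is replayed.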

This is clean because Codex owns the conversation state in native Responses format. grob does not — Claude Code owns it in Anthropic format.

Proposed work

Phase 1 — prompt_cache_key (low risk, ship first)

Add prompt_cache_key: Option<String> to OpenAIResponsesRequest and set it from the per-session id grob already resolves for tool-spike keying (metadata.session_id → user_id). The key is stable per Claude Code session ⇒ better prefix-cache hits ⇒ cheaper input tokens + faster TTFT on agentic loops. No behavioural risk (optional field).
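A rough sketch of Phase 1, under stated assumptions: the struct below is a reduced, hypothetical version of OpenAIResponsesRequest (the real one has more fields), `Metadata`, `resolve_session_id`, and `build_request` are illustrative names, and the resolution order (session_id first, then user_id) is an assumption about how grob resolves the id today.

```rust
/// Hypothetical subset of grob's request type.
#[derive(Debug, Default)]
struct OpenAIResponsesRequest {
    model: String,
    store: bool,
    /// New optional field; None leaves the request as it is today.
    prompt_cache_key: Option<String>,
}

/// Hypothetical view of the Anthropic request `metadata` object.
struct Metadata {
    session_id: Option<String>,
    user_id: Option<String>,
}

/// Assumed resolution order: explicit session id first, then user id.
fn resolve_session_id(meta: &Metadata) -> Option<String> {
    meta.session_id.clone().or_else(|| meta.user_id.clone())
}

fn build_request(model: &str, meta: &Metadata) -> OpenAIResponsesRequest {
    OpenAIResponsesRequest {
        model: model.to_string(),
        store: false,
        // Same id on every turn of a session, so the growing prefix can be cached.
        prompt_cache_key: resolve_session_id(meta),
    }
}
```

When neither id is present the field stays `None`, so requests without session metadata are unchanged. In the real type the field would also need to be skipped during serialization when it is `None`.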

Phase 2 — reasoning.encrypted_content round-trip (higher effort, design-first)

Why it's hard: grob is a translator, not the conversation owner. Each turn Claude Code resends its Anthropic history, which does not carry gpt's reasoning items (they were translated to thinking deltas / dropped on the way back). So grob cannot simply replay them like Codex does — it must hold its own per-session reasoning state.

Design sketch:

  1. Add include: ["reasoning.encrypted_content"] (gated on reasoning-capable models, like Codex).
  2. Capture each response's reasoning item encrypted_content from the Codex SSE transform (currently mapped to thinking_delta / dropped).
  3. Keep a session-keyed reasoning store (à la DlpSessionManager), keyed by the resolved session id, with eviction.
  4. On each request, reconstruct input from Claude Code's messages and inject the stored reasoning items at the correct positions — each reasoning item immediately before the function_call/assistant turn it produced (correlate by the following tool_use call_id).
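Steps 2 to 4 could look roughly like the following. All names (`InputItem`, `ReasoningStore`, `capture`, `inject`) are hypothetical, eviction is left out, and the sketch assumes each reasoning item is correlated with exactly one function_call via its call_id; reasoning that precedes a plain assistant message or several parallel calls would need a richer key.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced Responses input items.
#[derive(Debug, Clone, PartialEq)]
enum InputItem {
    Message { role: String, text: String },
    Reasoning { encrypted_content: String },
    FunctionCall { call_id: String },
    FunctionCallOutput { call_id: String },
}

/// session id -> (call_id of the function_call -> encrypted reasoning that produced it).
#[derive(Default)]
struct ReasoningStore {
    sessions: HashMap<String, HashMap<String, String>>,
}

impl ReasoningStore {
    /// Step 2: remember the encrypted reasoning seen in the SSE stream.
    fn capture(&mut self, session_id: &str, call_id: &str, encrypted_content: &str) {
        self.sessions
            .entry(session_id.to_string())
            .or_default()
            .insert(call_id.to_string(), encrypted_content.to_string());
    }

    /// Step 4: put each stored reasoning item immediately before its function_call.
    fn inject(&self, session_id: &str, input: Vec<InputItem>) -> Vec<InputItem> {
        let Some(by_call) = self.sessions.get(session_id) else {
            return input; // Unknown session: send the input untouched.
        };
        let mut out = Vec::with_capacity(input.len());
        for item in input {
            if let InputItem::FunctionCall { call_id } = &item {
                if let Some(enc) = by_call.get(call_id) {
                    out.push(InputItem::Reasoning {
                        encrypted_content: enc.clone(),
                    });
                }
            }
            out.push(item);
        }
        out
    }
}
```

Falling back to the untouched input when nothing is stored keeps the path safe after eviction or a restart: the request degrades to today's behaviour rather than sending a misplaced reasoning item.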

Risks / open questions: incorrect interleaving ⇒ backend 400 (reasoning item must precede …); statefulness + eviction in an otherwise per-request-stateless dispatch; and the marginal quality gain may be small since the visible conversation already carries most context. Recommend prototyping behind a codex config flag (e.g. reasoning_continuity = false by default) and measuring.

Scope

OpenAI Responses (Codex/OAuth) path only — no effect on Anthropic or other providers. Relates to the per-provider codex block (CodexOptions) added in #391.
