Problem
On the Claude Code → grob → ChatGPT Codex (OAuth) path, grob translates Anthropic /v1/messages requests into OpenAI Responses requests, but it omits two features that the real Codex CLI uses natively. Capturing both clients' raw requests shows the difference:
| Field | Codex CLI → /v1/responses | grob (Claude Code → Responses) |
|---|---|---|
| store | `false` (stateless) | `false` ✅ |
| prompt_cache_key | `<thread_id>` | absent ❌ |
| include | `["reasoning.encrypted_content"]` | absent ❌ |
| reasoning | `{effort}` | `{effort}` ✅ |
Result: Claude Code → gpt sessions lose prompt caching (paying full cost and latency for the ever-growing prefix on every turn) and lose reasoning continuity across tool-use turns.
How Codex does it (ground truth from openai/codex)
codex-rs/core/src/client.rs:
- store: provider.is_azure_responses_endpoint(), which is false on ChatGPT, so requests are stateless. This is why Codex needs encrypted reasoning.
- include = ["reasoning.encrypted_content"], sent only when model_info.supports_reasoning_summaries is true.
- prompt_cache_key = self.state.thread_id (stable per conversation; overridable).
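To make the field relationships concrete, here is a minimal Rust sketch that paraphrases the three client.rs rules above. It is not the Codex source: ModelInfo, RequestFields and request_fields are hypothetical names, and the override path for prompt_cache_key is left out.

```rust
/// Hypothetical, simplified input; the real Codex code reads this from its model metadata.
struct ModelInfo {
    supports_reasoning_summaries: bool,
}

/// The three request fields discussed above (names mirror the wire fields).
#[derive(Debug, PartialEq)]
struct RequestFields {
    store: bool,
    include: Vec<String>,
    prompt_cache_key: Option<String>,
}

fn request_fields(is_azure_responses_endpoint: bool, model: &ModelInfo, thread_id: &str) -> RequestFields {
    // ChatGPT is not an Azure Responses endpoint, so store is false (stateless).
    let store = is_azure_responses_endpoint;
    // Ask for encrypted reasoning only when the model supports reasoning summaries.
    let include = if model.supports_reasoning_summaries {
        vec!["reasoning.encrypted_content".to_string()]
    } else {
        Vec::new()
    };
    RequestFields {
        store,
        include,
        // A stable per-conversation key lets the backend reuse the cached prefix.
        prompt_cache_key: Some(thread_id.to_string()),
    }
}
```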
codex-rs/core/src/context_manager/history.rs:
- Reasoning items are retained in history (is_api_message(Reasoning) => true) and prioritised when encrypted_content is Some(_).
- for_prompt() replays the full item list, in recorded order, as input on every turn. Reasoning items keep their position, immediately before the assistant turn or function_call they produced, so the model can decrypt them and continue.
This works cleanly because Codex owns the conversation state in native Responses format. grob does not: Claude Code owns that state, in Anthropic format.
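The replay behaviour can be modelled in a few lines of Rust. This is a simplified illustration, not history.rs itself: the Item enum is a hypothetical stand-in for Codex's response items, LocalOnly is an invented example of an item kind that is never sent to the API, and the prioritisation of items with encrypted_content is not modelled. What it shows is that replay filters but never reorders, so a reasoning item stays directly ahead of the call it produced.

```rust
/// Hypothetical, simplified stand-in for Codex's recorded response items.
#[derive(Clone, Debug, PartialEq)]
enum Item {
    User(String),
    Assistant(String),
    Reasoning { encrypted_content: Option<String> },
    FunctionCall { call_id: String },
    FunctionCallOutput { call_id: String, output: String },
    /// Hypothetical item kind that is recorded locally but never sent.
    LocalOnly(String),
}

fn is_api_message(item: &Item) -> bool {
    // Reasoning items count as API messages, so they survive into the prompt.
    !matches!(item, Item::LocalOnly(_))
}

/// Sketch of for_prompt(): every retained item, in recorded order, becomes `input`.
fn for_prompt(history: &[Item]) -> Vec<Item> {
    // Filter, but never reorder: position is what keeps each reasoning item
    // directly ahead of the function_call or assistant message it produced.
    history.iter().filter(|i| is_api_message(i)).cloned().collect()
}
```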
Proposed work
Phase 1 — prompt_cache_key (low risk, ship first)
Add prompt_cache_key: Option<String> to OpenAIResponsesRequest and populate it from the per-session id that grob already resolves for tool-spike keying (metadata.session_id → user_id). Because that id is stable for the lifetime of a Claude Code session, it yields better prefix-cache hits, which means cheaper input tokens and faster TTFT on agentic loops. There is no behavioural risk: the field is optional.
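A minimal sketch of the Phase 1 change, assuming the arrow in (metadata.session_id → user_id) means metadata.session_id with user_id as the fallback. The struct is a hypothetical, trimmed-down stand-in for OpenAIResponsesRequest, and resolve_session_id and build_request are illustrative helpers, not existing grob functions. In real code the new field would presumably also carry serde's skip_serializing_if = "Option::is_none" so that None is omitted from the JSON; the sketch stays dependency-free and leaves that out.

```rust
/// Hypothetical, trimmed-down OpenAIResponsesRequest; only fields relevant
/// to Phase 1 are shown (the real struct also has input, reasoning, ...).
#[derive(Debug, Default)]
struct OpenAIResponsesRequest {
    model: String,
    store: bool,
    /// New optional field.
    prompt_cache_key: Option<String>,
}

/// Assumed resolution order: metadata.session_id first, then user_id.
fn resolve_session_id(metadata_session_id: Option<&str>, user_id: Option<&str>) -> Option<String> {
    metadata_session_id
        .filter(|s| !s.is_empty())
        .or(user_id.filter(|s| !s.is_empty()))
        .map(String::from)
}

fn build_request(model: &str, metadata_session_id: Option<&str>, user_id: Option<&str>) -> OpenAIResponsesRequest {
    OpenAIResponsesRequest {
        model: model.to_string(),
        store: false,
        // Same id on every turn of a Claude Code session => stable cache key.
        prompt_cache_key: resolve_session_id(metadata_session_id, user_id),
    }
}
```

Because the key depends only on the session id, every turn of one Claude Code session produces the same prompt_cache_key, which is what lets the backend match the shared prefix.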
Phase 2 — reasoning.encrypted_content round-trip (higher effort, design-first)
Why it's hard: grob is a translator, not the conversation owner. On each turn, Claude Code resends its Anthropic history, which does not carry gpt's reasoning items (on the way back they were translated to thinking deltas or dropped). grob therefore cannot simply replay them as Codex does; it must hold its own per-session reasoning state.
Design sketch:
- Add include: ["reasoning.encrypted_content"], gated on reasoning-capable models as in Codex.
- Capture each response's reasoning item encrypted_content in the Codex SSE transform (it is currently mapped to thinking_delta or dropped).
- Keep a session-keyed reasoning store (à la DlpSessionManager), keyed by the resolved session id, with eviction.
- On each request, reconstruct input from Claude Code's messages and inject the stored reasoning items at the correct positions: each reasoning item immediately before the function_call/assistant turn it produced (correlated by the call_id of the following tool_use).
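The store and the injection step could look roughly like the Rust sketch below. Everything in it is hypothetical: InputItem is a reduced model of Responses input items, ReasoningStore stands in for a DlpSessionManager-style component, eviction is a simple least-recently-used policy over whole sessions, and reasoning is assumed to be recorded under the call_id of the function_call(s) emitted in the same response. It places each stored reasoning item at the start of the model turn (assistant text plus function_calls) that contains the correlated call, and adds it only once when one reasoning item preceded several parallel calls.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, reduced model of Responses input items.
#[derive(Clone, Debug, PartialEq)]
enum InputItem {
    User(String),
    Assistant(String),
    FunctionCall { call_id: String },
    FunctionCallOutput { call_id: String },
    Reasoning { encrypted_content: String },
}

impl InputItem {
    /// Items the model itself produced within a turn.
    fn is_model_output(&self) -> bool {
        matches!(self, InputItem::Assistant(_) | InputItem::FunctionCall { .. })
    }
}

/// Hypothetical store: session id -> (call_id -> encrypted_content),
/// evicting the least recently used session beyond `max_sessions`.
struct ReasoningStore {
    max_sessions: usize,
    sessions: HashMap<String, HashMap<String, String>>,
    lru: VecDeque<String>, // front = least recently used
}

impl ReasoningStore {
    fn new(max_sessions: usize) -> Self {
        Self { max_sessions, sessions: HashMap::new(), lru: VecDeque::new() }
    }

    fn touch(&mut self, session: &str) {
        self.lru.retain(|s| s != session);
        self.lru.push_back(session.to_string());
    }

    /// Record the encrypted reasoning that preceded the tool call `call_id`.
    fn record(&mut self, session: &str, call_id: &str, encrypted_content: &str) {
        self.touch(session);
        self.sessions
            .entry(session.to_string())
            .or_default()
            .insert(call_id.to_string(), encrypted_content.to_string());
        while self.sessions.len() > self.max_sessions {
            match self.lru.pop_front() {
                Some(oldest) => {
                    self.sessions.remove(&oldest);
                }
                None => break,
            }
        }
    }

    /// Rebuild `input`, putting stored reasoning at the start of the model turn
    /// whose function_call it correlates with (by call_id).
    fn inject(&mut self, session: &str, items: &[InputItem]) -> Vec<InputItem> {
        if !self.sessions.contains_key(session) {
            return items.to_vec(); // unknown or evicted session: pass through
        }
        self.touch(session);
        let stored = &self.sessions[session];
        let mut out = Vec::with_capacity(items.len());
        let mut i = 0;
        while i < items.len() {
            if !items[i].is_model_output() {
                out.push(items[i].clone());
                i += 1;
                continue;
            }
            // End of this contiguous model turn.
            let end = items[i..].iter().position(|it| !it.is_model_output()).map_or(items.len(), |n| i + n);
            let turn_start = out.len();
            for it in &items[i..end] {
                if let InputItem::FunctionCall { call_id } = it {
                    if let Some(enc) = stored.get(call_id) {
                        let r = InputItem::Reasoning { encrypted_content: enc.clone() };
                        // One reasoning item may precede several parallel calls: add it once.
                        if !out[turn_start..].contains(&r) {
                            out.push(r);
                        }
                    }
                }
            }
            out.extend_from_slice(&items[i..end]);
            i = end;
        }
        out
    }
}
```

Turns without a tool call (plain assistant text) have no call_id to correlate on, so this sketch leaves them without reasoning, and the per-session call_id map is unbounded; both would need a decision in a real design.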
Risks / open questions:
- Incorrect interleaving leads to a backend 400 (reasoning item must precede …).
- Statefulness and eviction are introduced into an otherwise stateless per-request dispatch.
- The marginal quality gain may be small, since the visible conversation already carries most of the context.

Recommendation: prototype behind a codex config flag (e.g. reasoning_continuity = false by default) and measure.
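The suggested opt-in flag might gate the request as in this Rust sketch. CodexOptions here is a hypothetical slice of the struct from #391 containing only the proposed reasoning_continuity field (its real fields are not shown), and include_fields is an illustrative helper. With a derived Default the flag is false, so existing configurations would keep their current behaviour.

```rust
/// Hypothetical slice of the per-provider codex options; only the proposed flag.
#[derive(Debug, Default, Clone)]
struct CodexOptions {
    /// Experimental Phase 2 switch; Default yields false, i.e. opt-in.
    reasoning_continuity: bool,
}

/// Request encrypted reasoning only when the flag is on and the model reasons.
fn include_fields(opts: &CodexOptions, model_supports_reasoning: bool) -> Vec<&'static str> {
    if opts.reasoning_continuity && model_supports_reasoning {
        vec!["reasoning.encrypted_content"]
    } else {
        Vec::new()
    }
}
```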
Scope
OpenAI Responses (Codex/OAuth) path only — no effect on Anthropic or other providers. Relates to the per-provider codex block (CodexOptions) added in #391.