feat(openai): prompt-caching + reasoning continuity on the Claude Code→Responses path #393

## Problem

On the **Claude Code → grob → ChatGPT Codex (OAuth)** path, grob translates Anthropic `/v1/messages` requests into OpenAI Responses requests, but omits two request fields that the **real Codex CLI sends natively**. Capturing both clients' raw requests shows the gap:

| Field | Codex CLI → `/v1/responses` | grob (Claude Code→Responses) |
|---|---|---|
| `store` | `false` (stateless) | `false` ✅ |
| `prompt_cache_key` | `<thread_id>` | **absent** ❌ |
| `include` | `["reasoning.encrypted_content"]` | **absent** ❌ |
| `reasoning` | `{effort}` | `{effort}` ✅ |

Result: Claude Code→gpt sessions lose **prompt-caching** (higher cost and latency on the growing prefix) and **reasoning continuity** across tool-use turns.

## How Codex does it (ground-truth from `openai/codex`)

`codex-rs/core/src/client.rs`:
- `store: provider.is_azure_responses_endpoint()` → **`false`** on ChatGPT → stateless, which is *why* it needs encrypted reasoning.
- `include = ["reasoning.encrypted_content"]` — only when `model_info.supports_reasoning_summaries`.
- `prompt_cache_key = self.state.thread_id` (stable per conversation; overridable).

`codex-rs/core/src/context_manager/history.rs`:
- Reasoning items are **retained** in history (`is_api_message(Reasoning) => true`), prioritised when `encrypted_content: Some(_)`.
- `for_prompt()` replays the **full item list in recorded order** as `input` each turn — reasoning items keep their position (right before the assistant turn / `function_call` they produced). The model decrypts and continues.
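The replay behaviour described above can be pictured with a small model. This is a simplified sketch, not the Codex source: the `Item` enum and `History` struct are hypothetical stand-ins, and only the idea of replaying recorded items in order is taken from the description.

```rust
/// Hypothetical, simplified stand-ins for Responses API items.
/// These are NOT the real Codex types.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    UserMessage(String),
    Reasoning { encrypted_content: Option<String> },
    FunctionCall { call_id: String, name: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Conversation history kept in native Responses item form.
#[derive(Default)]
struct History {
    items: Vec<Item>,
}

impl History {
    /// Append an item in the order it was observed.
    fn record(&mut self, item: Item) {
        self.items.push(item);
    }

    /// Return the full item list, in recorded order, as the next `input`.
    /// Reasoning items keep their slot right before the call they produced.
    fn for_prompt(&self) -> Vec<Item> {
        self.items.clone()
    }
}
```

Because the owner of the history records items as they arrive, no position bookkeeping is needed: the order of `record` calls is the order of replay.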

This is clean because **Codex owns the conversation state in native Responses format**. grob does not — Claude Code owns it in Anthropic format.

## Proposed work

### Phase 1 — `prompt_cache_key` (low risk, ship first)
Add `prompt_cache_key: Option<String>` to `OpenAIResponsesRequest` and set it from the per-session id grob already resolves for tool-spike keying (`metadata.session_id` → `user_id`). A key that stays stable for a Claude Code session should improve prefix-cache hit rates, which means cheaper input tokens and a faster time to first token (TTFT) on agentic loops. The field is optional, so adding it carries no behavioural risk.
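A minimal sketch of Phase 1 follows, under stated assumptions: the request struct is trimmed to three fields, `SessionMeta`, `resolve_prompt_cache_key` and `build_request` are hypothetical names, and the arrow in `metadata.session_id` → `user_id` is read as a fallback order (session id first, then user id). grob's actual resolution logic may differ.

```rust
/// Trimmed-down, hypothetical view of the request type.
/// The real `OpenAIResponsesRequest` has more fields.
#[derive(Debug, Default)]
struct OpenAIResponsesRequest {
    model: String,
    store: bool,
    /// With serde this would typically carry
    /// `#[serde(skip_serializing_if = "Option::is_none")]`.
    prompt_cache_key: Option<String>,
}

/// Hypothetical container for the ids taken from the Anthropic request.
struct SessionMeta {
    session_id: Option<String>,
    user_id: Option<String>,
}

/// Pick the first non-blank id, assuming session id takes priority.
fn resolve_prompt_cache_key(meta: &SessionMeta) -> Option<String> {
    [meta.session_id.as_deref(), meta.user_id.as_deref()]
        .iter()
        .copied()
        .flatten()
        .map(str::trim)
        .find(|s| !s.is_empty())
        .map(str::to_owned)
}

/// Build a stateless request that carries the cache key when one exists.
fn build_request(model: &str, meta: &SessionMeta) -> OpenAIResponsesRequest {
    OpenAIResponsesRequest {
        model: model.to_owned(),
        store: false,
        prompt_cache_key: resolve_prompt_cache_key(meta),
    }
}
```

When no id can be resolved the field stays `None`, so the request is identical to what grob sends today.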

### Phase 2 — `reasoning.encrypted_content` round-trip (higher effort, design-first)
**Why it's hard:** grob is a translator, not the conversation owner. Each turn Claude Code resends its Anthropic history, which does **not** carry gpt's reasoning items (they were translated to thinking deltas / dropped on the way back). So grob cannot simply replay them like Codex does — it must hold its own per-session reasoning state.

**Design sketch:**
1. Add `include: ["reasoning.encrypted_content"]` (gated on reasoning-capable models, like Codex).
2. Capture each response's reasoning item `encrypted_content` from the Codex SSE transform (currently mapped to `thinking_delta` / dropped).
3. Keep a **session-keyed reasoning store** (à la `DlpSessionManager`), keyed by the resolved session id, with eviction.
4. On each request, reconstruct `input` from Claude Code's messages **and inject the stored reasoning items at the correct positions** — each reasoning item immediately before the `function_call`/assistant turn it produced (correlate by the following `tool_use` `call_id`).
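Steps 3 and 4 of the design sketch could look roughly like this. Everything here is hypothetical: `InputItem`, `ReasoningStore` and its methods are illustrative names, eviction is a simple idle timeout, and only reasoning items that precede a `function_call` are handled (reasoning that precedes a plain assistant message, or one reasoning item that fans out to several parallel calls, would need extra correlation).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical, simplified Responses `input` item.
#[derive(Debug, Clone, PartialEq)]
enum InputItem {
    Message { role: String, text: String },
    Reasoning { encrypted_content: String },
    FunctionCall { call_id: String, name: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Per-session state: encrypted reasoning keyed by the call it preceded.
struct SessionEntry {
    by_call_id: HashMap<String, String>,
    last_used: Instant,
}

/// Hypothetical session-keyed reasoning store with idle-time eviction.
struct ReasoningStore {
    sessions: HashMap<String, SessionEntry>,
    ttl: Duration,
}

impl ReasoningStore {
    fn new(ttl: Duration) -> Self {
        ReasoningStore { sessions: HashMap::new(), ttl }
    }

    /// Remember the encrypted reasoning that preceded `call_id`.
    fn record(&mut self, session: &str, call_id: &str, encrypted: String) {
        let entry = self
            .sessions
            .entry(session.to_owned())
            .or_insert_with(|| SessionEntry {
                by_call_id: HashMap::new(),
                last_used: Instant::now(),
            });
        entry.by_call_id.insert(call_id.to_owned(), encrypted);
        entry.last_used = Instant::now();
    }

    /// Drop sessions that have been idle for longer than `ttl`.
    fn evict_idle(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.sessions
            .retain(|_, e| now.saturating_duration_since(e.last_used) <= ttl);
    }

    /// Re-insert each stored reasoning item right before its function call.
    /// Unknown sessions and unknown call ids leave the input unchanged.
    fn inject(&mut self, session: &str, input: Vec<InputItem>) -> Vec<InputItem> {
        let entry = match self.sessions.get_mut(session) {
            Some(e) => e,
            None => return input,
        };
        entry.last_used = Instant::now();
        let mut out = Vec::with_capacity(input.len());
        for item in input {
            if let InputItem::FunctionCall { call_id, .. } = &item {
                if let Some(enc) = entry.by_call_id.get(call_id) {
                    out.push(InputItem::Reasoning { encrypted_content: enc.clone() });
                }
            }
            out.push(item);
        }
        out
    }
}
```

Falling back to the unmodified input when nothing is stored keeps the request valid after eviction or a grob restart, at the cost of losing continuity for that turn.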

**Risks / open questions:**
- Incorrect interleaving ⇒ backend 400 (`reasoning item must precede …`).
- Statefulness and eviction in an otherwise per-request-stateless dispatch.
- The marginal quality gain may be small, since the *visible* conversation already carries most context.

Recommend prototyping behind a `codex` config flag (e.g. `reasoning_continuity = false` by default) and measuring the effect.
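One way the suggested flag could gate the `include` field is sketched here. The struct shape is an assumption: the real `CodexOptions` from #391 may have different fields, and `include_fields` is a hypothetical helper.

```rust
/// Hypothetical shape; the real `CodexOptions` may differ.
#[derive(Debug, Clone, Default, PartialEq)]
struct CodexOptions {
    /// Opt-in switch for the Phase 2 round-trip; `false` by default.
    reasoning_continuity: bool,
}

/// Values for the request's `include` field.
/// Encrypted reasoning is requested only when the flag is on AND the
/// model is reasoning-capable, mirroring the gating described for Codex.
fn include_fields(opts: &CodexOptions, model_supports_reasoning: bool) -> Vec<&'static str> {
    if opts.reasoning_continuity && model_supports_reasoning {
        vec!["reasoning.encrypted_content"]
    } else {
        Vec::new()
    }
}
```

With the default options the helper returns an empty list, so existing deployments keep sending the same requests until the flag is turned on.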

## Scope
OpenAI Responses (Codex/OAuth) path only — no effect on Anthropic or other providers. Relates to the per-provider `codex` block (`CodexOptions`) added in #391.

