feat(sdk): upgrade Claude/Codex SDKs + 5 cross-provider capabilities#17
Conversation
…ities Claude Agent SDK 0.2.85 → 0.2.112, Codex SDK 0.117.0 → 0.121.0. Both providers now expose the same feature surface in Discord. Features (両端対等): 1. TerminalReason — unified enum across providers, surfaced in status line - Claude: map SDK terminal_reason - Codex: infer from turn.failed / abort 2. TodoList rendering — deep XState integration - New SessionContextFields (todoList, todoUpdatedAt, etc.) - 3 new XState events + subscribe-mirrored snapshot - Status card renders checkbox list for both providers 3. permission_denied — unified event + 5-item ring buffer - New PlatformEvent routed through XState - Recorded for both canUseTool and future SDK-hook denials 4. Batch approval / defer queue — /agent batch subcommand - batch-approval-store with per-session Promise resolvers - pendingApprovals capped at 100 to prevent unbounded growth - Status card shows queue + denial history 5. Monitor reasoning effort — MONITOR_REASONING_EFFORT config - Codex: modelReasoningEffort on monitor thread - Claude: upgrade to MONITOR_CLAUDE_MODEL when effort ≥ high Also: consolidate duplicated type declarations via SessionContextFields and StatusCardViewData; extract recordPermissionDenial helper to dedupe the two denial paths. 934 tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b75229afba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const batchAction = await new Promise<'approve' | 'reject'>((resolve) => { | ||
| enqueueBatchApproval(liveSession.id, { | ||
| gateId, | ||
| toolUseID: context.toolUseID, | ||
| toolName: displayName, |
There was a problem hiding this comment.
Cancel pending batch approvals on abort/end
This await new Promise(...) has no timeout or abort path, so a deferred tool check can remain unresolved forever if the session is stopped/ended before /agent batch approve-all|reject-all is run. Unlike the normal gate flow (which is settled by gateCoordinator), this branch never self-settles, so batch requests can leak and leave the run stuck waiting on canUseTool. Wire this promise to cancellation (e.g., context.signal) and/or ensure session cleanup drains batch-approval-store.
Useful? React with 👍 / 👎.
| await updateSessionState(ctx.sessionId, { | ||
| type: 'todo_updated', | ||
| sessionId: ctx.sessionId, | ||
| source: providerSource(ctx.sessionId), | ||
| confidence: 'high', |
There was a problem hiding this comment.
Force rerender for context-only state updates
These new todo_updated updates go through updateSessionState, but panel rendering currently only triggers when state/turn/phase changes (see panel-adapter.ts stateChanged check). todo_updated and permission_denied only mutate context fields, so the status card often stays stale until an unrelated lifecycle transition happens. This breaks the new “实时 Todo/拒绝记录” behavior; context changes from these events need to be treated as render-worthy.
Useful? React with 👍 / 👎.
… test
Discord 官方没有测试套件(已查证 docs.discord.com、discord-api-docs#3355、
discord.js#6179、npm discord.js.test),社区也没有活跃的成熟方案。业内公认
做法是自建测试 guild + 独立 bot token + 隔离数据层。本提交落地该路径并
再进一步:事件层注入触发 + 真实 Discord fetch 断言,兼顾速度与保真度。
架构:
- scripts/e2e/harness/ 共享基础设施
- harness.ts login + scratch category/channel + createSession +
tag-based cleanup sweeper + onFail snapshot
- assertions.ts waitForEmbedField / waitForMessageMatching 带轮询
- fake-interaction.ts 最小 ChatInputCommandInteraction / ButtonInteraction
shape,channel 字段注入真实 TextChannel → handler
内部 reply/send 的副作用落到真实 Discord
- cleanup-sweeper.ts 按 channel topic `wsc-e2e:<runId>:<ttl>` 回收僵尸
- artifacts.ts 失败时 dump embeds.json + state-machine.json
- scripts/e2e/features/ 4 个 *.e2e.ts,每个约 100-150 行
- terminal-reason Claude+Codex × 4 reason (max_turns / rate_limited
/ context_too_long / aborted)
- todo-list Claude+Codex,待办 embed field + 清空收回
- permission-denied Claude+Codex,digest ⛔ 消息 + 状态卡 field +
ring buffer cap 5
- batch-approval /agent batch on/approve-all/reject-all/off 4 分支
- scripts/e2e/index.ts 串行 runner,--only=<name> 单跑支持
- packages/engine/__tests__/provider-runtime-monitor.test.ts
Monitor effort 9 个单元测试(Discord 侧不可见,用 unit 覆盖 high/xhigh/
medium/empty × Claude/Codex 分支)
E2E 过程中真的抓到并修复 3 个产品 bug:
1. TerminalReason 在失败路径被 sendTurnFailure 吞掉,embed 从不展示结束
原因。修:panel-adapter 在 failure path 前置 `(原因)\n` 前缀。
2. panel-adapter 的 stateChanged 检查只看 state/turn/phase,遗漏新的
context 字段(todoList / denials / batchApprovalMode / pendingApprovals),
导致 XState context 变更不触发状态卡重绘。修:加 contextChanged 检查。
3. /agent batch 直接改 stateMachine 不通知 panel 重绘;XState
applyBatchApprovalSet 复用 event.pendingApprovals 的 ref 导致
contextChanged 引用比较失效。修:handleAgentBatch 改走
updateSessionState 统一路径;applyBatchApprovalSet 强制 copy array。
package.json 新增 test:discord:e2e;local-acceptance-suite 追加
discord-features-e2e 入口。
验证:typecheck + lint + 943 unit tests 全绿;e2e 两次连跑全绿,总时长
约 36s(boot 10s + 4 features 并发执行真实 Discord fetch 断言)。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E2E 测试 + 产品修复(commit dea9498)调研结论Discord 官方没有测试套件(核对:docs.discord.com/developers、discord-api-docs#3355、discord.js#6179、npm discord.js.test 5 年前废弃)。社区工具 Corde 维护平平,无 mock-gateway 方案。业内公认做法:自建测试 guild + 独立 bot token + 隔离数据层。本提交落地该路径 + 事件层注入触发 + 真实 Discord fetch 断言。 新增
E2E 抓到并修的 3 个产品 bug
验证
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dea9498a02
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| export function enqueueBatchApproval(sessionId: string, entry: BatchApprovalEntry): void { | ||
| if (!store.has(sessionId)) store.set(sessionId, []); | ||
| store.get(sessionId)!.push(entry); |
There was a problem hiding this comment.
Cap deferred approval store to visible queue size
pendingApprovals is capped at 100 in state (slice(-MAX_PENDING_APPROVALS)), but this backing store is unbounded, so once more than 100 tool requests are deferred, older entries disappear from the status card while still remaining actionable in approve-all/reject-all. In that scenario, users can approve unseen tool calls, which is a safety regression for batch approval mode.
Useful? React with 👍 / 👎.
| const MAX_LINES = 8; | ||
| const MAX_CHAR = 180; | ||
| const lines: string[] = []; | ||
| for (const item of items.slice(0, MAX_LINES)) { | ||
| const mark = item.completed ? '☑' : '☐'; |
There was a problem hiding this comment.
Keep todo field under Discord embed value limit
This renderer can generate up to ~8×180 characters (plus markers/newlines), which exceeds Discord’s 1024-character embed field value limit. When long todo items are returned, statusCard.update will fail on message edit/send and the status card stops reflecting new state. The todo rendering needs a total-length budget, not only per-line truncation.
Useful? React with 👍 / 👎.
| if (action === 'on') { | ||
| await updateSessionState(session.id, { | ||
| type: 'batch_approval_changed', | ||
| sessionId: session.id, | ||
| source, |
There was a problem hiding this comment.
Reject
/agent batch for Codex sessions
This command enables batch mode for any provider, but deferred approvals are only enqueued in the Claude permission handler path. Because Codex sessions do not use that path, action:on reports success yet no Codex approvals are actually deferred, leaving users in a misleading “enabled” state with no effect. Add a provider guard here (or wire an actual Codex implementation) so behavior matches the command response.
Useful? React with 👍 / 👎.
Addresses 4 actionable inline comments from chatgpt-codex-connector bot on PR #17 (1 comment on stateChanged check was already fixed in prior commit). P1 — Cap batch-approval-store at MAX_BATCH_APPROVAL_STORE_SIZE (100): state.pendingApprovals already caps at 100, but the backing store was unbounded. Over 100 deferred requests would disappear from the status card while remaining actionable via approve-all — silently approving tool calls the user never saw. enqueueBatchApproval now returns 'enqueued' | 'overflow'; permission-gate auto-rejects on overflow so the SDK turn unblocks instead of queuing invisibly. P1 — Cancel pending batch approvals on session abort/end: The `await new Promise(...)` in permission-gate's batch branch had no cancellation path. Session stop/shutdown would leak the promise and hang the SDK turn forever. Now: - `context.signal.addEventListener('abort', ...)` resolves 'reject' when the provider's AbortController fires - endSession() calls clearBatchApprovalStore() so any still-pending entries drain to reject during cleanup P2 — Keep todo field under Discord's 1024-char embed value limit: 8 lines × 180 chars + markers could exceed 1024 → PATCH message would 400. Added a 950-char TOTAL_BUDGET that stops adding lines (and the "… 还有 N 项" suffix) once the budget is exhausted. P2 — Reject /agent batch for Codex sessions: Batch mode only works in the Claude canUseTool path. Codex uses approvalPolicy on its own thread and never touches the batch store, so `action:on` previously reported success but deferred nothing. Now handleAgentBatch returns early with a clear message pointing users to `/agent permissions codex-approval` for Codex sessions. Also raised e2e assertion timeout default from 10s → 15s to absorb slow-network Discord REST latency (first-run boot hit 38s on weak link). Verification: - typecheck + lint + 944 unit tests green (added overflow test case) - pnpm test:discord:e2e — 4 features pass, 26s total Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Codex review 4 条反馈已修 — commit 2cf5f38。944 unit + 4 e2e 全绿。 |
|
@codex code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2cf5f3897e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (context.signal.aborted) settle('reject'); | ||
| else context.signal.addEventListener('abort', () => settle('reject'), { once: true }); |
There was a problem hiding this comment.
Remove aborted entries from batch approval queues
When context.signal aborts (e.g. /agent stop), this path only calls settle('reject') for the waiting promise, but it never removes that request from batch-approval-store or pendingApprovals state. Because /agent stop aborts without calling endSession, these stale entries can accumulate across runs, inflate approve-all/reject-all counts, and eventually hit MAX_BATCH_APPROVAL_STORE_SIZE, causing new approvals to be auto-rejected as overflow even though no real pending approvals remain.
Useful? React with 👍 / 👎.
Addresses Codex review P1 on commit 2cf5f38 (discussion_r3099408511): Previously the batch-approval abort path only called settle('reject') for the waiting canUseTool promise, but left the entry in `batch-approval-store` and in XState's `pendingApprovals`. Since `/agent stop` calls `abortSessionWithReason` (not `endSession`), these zombie entries persist across stop/restart cycles — inflating approve-all/reject-all counts and eventually hitting the 100-item overflow cap, at which point legitimate new approvals get auto-rejected because the queue is full of dead entries. Fix: both `abortSessionWithReason` and `endSession` now call `clearBatchApprovalStore(id)` and `stateMachine.clearPendingApprovals(id)`, wiping both the resolver store and the visible queue. The permission-gate abort handler is kept as a defensive fallback for provider-side aborts that don't go through the registry. Also added a unit test proving the store recovers cleanly after overflow + clear (no zombie retention). 944 → 945 unit tests, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2cf5f3897e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (context.signal.aborted) settle('reject'); | ||
| else context.signal.addEventListener('abort', () => settle('reject'), { once: true }); |
There was a problem hiding this comment.
Remove aborted requests from batch approval queue
When batch mode is active, this abort handler only calls settle('reject') for the waiting promise, but it never removes the corresponding entry from the batch store or pendingApprovals. If a user runs /agent stop (abort without ending the session), stale requests remain queued, so later queue counts are inflated and repeated aborts can fill the 100-item cap, causing new tool approvals to be auto-rejected even though nothing is truly pending. The rejection path needs to dequeue this specific gate (or clear queue on abort) instead of only settling the promise.
Useful? React with 👍 / 👎.
| if (action === 'on') { | ||
| await updateSessionState(session.id, { | ||
| type: 'batch_approval_changed', | ||
| sessionId: session.id, |
There was a problem hiding this comment.
Reject batch enable when Claude permission hook is inactive
This branch enables batch mode for any Claude session, but batch queuing only works when createClaudePermissionHandler is actually installed; shouldUseClaudePermissionHandler disables that hook in mode === 'auto' (and when effective Claude permission mode is bypass). In those configurations, /agent batch action:on reports success even though tool calls are not deferred at all, which can mislead users into believing executions are queued for approval when they are not. Add a guard here that refuses action:on unless the current session’s effective Claude permission mode is gate-enabled.
Useful? React with 👍 / 👎.
|
Codex review P1 (zombie entries on abort) 已修 — commit 960e436。abortSessionWithReason 与 endSession 现在都会 drainBatchApprovalStore + clearPendingApprovals。945 unit + 4 e2e 全绿。 |
Addresses 2 new P1 comments on commit 960e436: P1 — permission-gate abort handler must self-dequeue (discussion_r3099... , packages/engine/src/executor/permission-gate.ts:154) The abort listener previously only settled the waiting promise. If an abort came from a path that doesn't go through `abortSessionWithReason` (e.g. provider-side abort), the entry would remain in both the batch-approval-store and XState's pendingApprovals, eventually hitting the 100-item cap. Now the listener also calls: - removeBatchApproval(sessionId, gateId) — new single-gate removal - stateMachine.removePendingApproval(sessionId, gateId) — new XState action keeping the stores consistent even when cleanup comes from the provider rather than the registry. P1 — /agent batch action:on must verify the Claude permission hook is actually installed (packages/bot/src/command-handlers-modules/agent-handlers.ts) The Claude canUseTool handler (which is where batch queueing fires) is only installed when `shouldUseClaudePermissionHandler(session)` returns true — false for `mode === 'auto'` and for `claudePermissionMode === 'bypass'`. Enabling batch mode in those configs was misleading: the UI reported success but no tool calls were being deferred. Added a guard that rejects action:on with a mode-specific hint ("/agent mode mode:normal" or "/agent permissions claude-permissions:normal"). New tests: - removeBatchApproval drops one gate without touching others + cleans up the session key when the last entry is removed - removePendingApproval drops one gate in XState; missing gateId is a no-op - batch-approval e2e now creates the session in normal mode so the new guard lets action:on through 948 unit tests green; batch-approval e2e green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2 条新 P1 反馈已修 — commit 062396f。permission-gate abort handler 现在自 dequeue 单个 gate(防 provider 内部 abort 绕过 abortSession);/agent batch action:on 加 guard 检查 shouldUseClaudePermissionHandler(auto / bypass 模式拒绝并提示)。948 unit + 4 e2e 全绿。 |
Summary
0.2.85 → 0.2.112, Codex SDK0.117.0 → 0.121.0terminal_reason映射;Codex 从turn.failed错误文本 + abort 推断。SessionContextFields+ 3 个 XState 事件 + subscribe 单源同步 snapshot,状态卡渲染勾选列表。permission_denied统一事件:新 PlatformEvent 进 XState,ring buffer 保留最近 5 条拒绝,状态卡展示。/agent batch子命令(on/off/approve-all/reject-all)+batch-approval-store,pendingApprovals封顶 100。MONITOR_REASONING_EFFORTconfig。Codex 传modelReasoningEffort;Claude 在 high/xhigh 时切到MONITOR_CLAUDE_MODEL。SessionContextFields/StatusCardViewData,XState context / projection / transition metadata 共享同一 shape,消除 4 处字段重复。recordPermissionDenialhelper 合并两条拒绝路径,删除废话注释,内联过度抽象的 monitor helper。Test plan
pnpm typecheck通过pnpm lint通过pnpm test— 934 tests 全部通过(新增 2 个测试文件:batch-approval-store.test.ts、state-machine-context.test.ts)turn.failed含 rate limit 错误 → 验证状态行显示"触发速率限制"/agent batch action:on,触发工具调用 → 队列展示,/agent batch action:approve-all一次性放行🤖 Generated with Claude Code