Skip to content

feat(sdk): upgrade Claude/Codex SDKs + 5 cross-provider capabilities#17

Merged
xuhongbo merged 5 commits into
mainfrom
feat/sdk-upgrade-2026-04
Apr 17, 2026
Merged

feat(sdk): upgrade Claude/Codex SDKs + 5 cross-provider capabilities#17
xuhongbo merged 5 commits into
mainfrom
feat/sdk-upgrade-2026-04

Conversation

@xuhongbo

Copy link
Copy Markdown
Owner

Summary

  • SDK upgrade: Claude Agent SDK 0.2.85 → 0.2.112, Codex SDK 0.117.0 → 0.121.0
  • 5 new features,Claude 与 Codex 两端体验一致:
    1. TerminalReason 统一枚举(10 类别),状态行显示精准中文标签。Claude 从 SDK terminal_reason 映射;Codex 从 turn.failed 错误文本 + abort 推断。
    2. TodoList 深度集成进 XState:新 SessionContextFields + 3 个 XState 事件 + subscribe 单源同步 snapshot,状态卡渲染勾选列表。
    3. permission_denied 统一事件:新 PlatformEvent 进 XState,ring buffer 保留最近 5 条拒绝,状态卡展示。
    4. 批量审批 / 延后队列:新 /agent batch 子命令(on/off/approve-all/reject-all)+ batch-approval-storependingApprovals 封顶 100。
    5. Monitor 推理强度MONITOR_REASONING_EFFORT config。Codex 传 modelReasoningEffort;Claude 在 high/xhigh 时切到 MONITOR_CLAUDE_MODEL
  • 类型收敛:抽出 SessionContextFields / StatusCardViewData,XState context / projection / transition metadata 共享同一 shape,消除 4 处字段重复。
  • Simplify pass:提取 recordPermissionDenial helper 合并两条拒绝路径,删除废话注释,内联过度抽象的 monitor helper。

Test plan

  • pnpm typecheck 通过
  • pnpm lint 通过
  • pnpm test — 934 tests 全部通过(新增 2 个测试文件:batch-approval-store.test.tsstate-machine-context.test.ts
  • Claude 会话:触发 monitor pass,验证 xhigh 切模型 + 结束原因展示
  • Codex 会话:turn.failed 含 rate limit 错误 → 验证状态行显示"触发速率限制"
  • 两端都调用 TodoWrite / todo_list → 状态卡勾选框更新
  • 两端 /agent batch action:on,触发工具调用 → 队列展示,/agent batch action:approve-all 一次性放行
  • 两端拒绝工具后 → 状态卡"最近拒绝"区出现 ⛔ 条目

🤖 Generated with Claude Code

…ities

Claude Agent SDK 0.2.85 → 0.2.112, Codex SDK 0.117.0 → 0.121.0. Both
providers now expose the same feature surface in Discord.

Features (両端対等):

1. TerminalReason — unified enum across providers, surfaced in status line
   - Claude: map SDK terminal_reason
   - Codex: infer from turn.failed / abort

2. TodoList rendering — deep XState integration
   - New SessionContextFields (todoList, todoUpdatedAt, etc.)
   - 3 new XState events + subscribe-mirrored snapshot
   - Status card renders checkbox list for both providers

3. permission_denied — unified event + 5-item ring buffer
   - New PlatformEvent routed through XState
   - Recorded for both canUseTool and future SDK-hook denials

4. Batch approval / defer queue — /agent batch subcommand
   - batch-approval-store with per-session Promise resolvers
   - pendingApprovals capped at 100 to prevent unbounded growth
   - Status card shows queue + denial history

5. Monitor reasoning effort — MONITOR_REASONING_EFFORT config
   - Codex: modelReasoningEffort on monitor thread
   - Claude: upgrade to MONITOR_CLAUDE_MODEL when effort ≥ high

Also: consolidate duplicated type declarations via SessionContextFields
and StatusCardViewData; extract recordPermissionDenial helper to dedupe
the two denial paths.

934 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b75229afba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +106 to +110
const batchAction = await new Promise<'approve' | 'reject'>((resolve) => {
enqueueBatchApproval(liveSession.id, {
gateId,
toolUseID: context.toolUseID,
toolName: displayName,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Cancel pending batch approvals on abort/end

This await new Promise(...) has no timeout or abort path, so a deferred tool check can remain unresolved forever if the session is stopped/ended before /agent batch approve-all|reject-all is run. Unlike the normal gate flow (which is settled by gateCoordinator), this branch never self-settles, so batch requests can leak and leave the run stuck waiting on canUseTool. Wire this promise to cancellation (e.g., context.signal) and/or ensure session cleanup drains batch-approval-store.

Useful? React with 👍 / 👎.

Comment on lines +227 to +231
await updateSessionState(ctx.sessionId, {
type: 'todo_updated',
sessionId: ctx.sessionId,
source: providerSource(ctx.sessionId),
confidence: 'high',

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Force rerender for context-only state updates

These new todo_updated updates go through updateSessionState, but panel rendering currently only triggers when state/turn/phase changes (see panel-adapter.ts stateChanged check). todo_updated and permission_denied only mutate context fields, so the status card often stays stale until an unrelated lifecycle transition happens. This breaks the new “实时 Todo/拒绝记录” behavior; context changes from these events need to be treated as render-worthy.

Useful? React with 👍 / 👎.

… test

Discord 官方没有测试套件(已查证 docs.discord.com、discord-api-docs#3355、
discord.js#6179、npm discord.js.test),社区也没有活跃的成熟方案。业内公认
做法是自建测试 guild + 独立 bot token + 隔离数据层。本提交落地该路径并
再进一步:事件层注入触发 + 真实 Discord fetch 断言,兼顾速度与保真度。

架构:

- scripts/e2e/harness/  共享基础设施
  - harness.ts          login + scratch category/channel + createSession +
                        tag-based cleanup sweeper + onFail snapshot
  - assertions.ts       waitForEmbedField / waitForMessageMatching 带轮询
  - fake-interaction.ts 最小 ChatInputCommandInteraction / ButtonInteraction
                        shape,channel 字段注入真实 TextChannel → handler
                        内部 reply/send 的副作用落到真实 Discord
  - cleanup-sweeper.ts  按 channel topic `wsc-e2e:<runId>:<ttl>` 回收僵尸
  - artifacts.ts        失败时 dump embeds.json + state-machine.json

- scripts/e2e/features/  4 个 *.e2e.ts,每个约 100-150 行
  - terminal-reason    Claude+Codex × 4 reason (max_turns / rate_limited
                       / context_too_long / aborted)
  - todo-list          Claude+Codex,待办 embed field + 清空收回
  - permission-denied  Claude+Codex,digest ⛔ 消息 + 状态卡 field +
                       ring buffer cap 5
  - batch-approval     /agent batch on/approve-all/reject-all/off 4 分支

- scripts/e2e/index.ts   串行 runner,--only=<name> 单跑支持

- packages/engine/__tests__/provider-runtime-monitor.test.ts
  Monitor effort 9 个单元测试(Discord 侧不可见,用 unit 覆盖 high/xhigh/
  medium/empty × Claude/Codex 分支)

E2E 过程中真的抓到并修复 3 个产品 bug:

1. TerminalReason 在失败路径被 sendTurnFailure 吞掉,embed 从不展示结束
   原因。修:panel-adapter 在 failure path 前置 `(原因)\n` 前缀。

2. panel-adapter 的 stateChanged 检查只看 state/turn/phase,遗漏新的
   context 字段(todoList / denials / batchApprovalMode / pendingApprovals),
   导致 XState context 变更不触发状态卡重绘。修:加 contextChanged 检查。

3. /agent batch 直接改 stateMachine 不通知 panel 重绘;XState
   applyBatchApprovalSet 复用 event.pendingApprovals 的 ref 导致
   contextChanged 引用比较失效。修:handleAgentBatch 改走
   updateSessionState 统一路径;applyBatchApprovalSet 强制 copy array。

package.json 新增 test:discord:e2e;local-acceptance-suite 追加
discord-features-e2e 入口。

验证:typecheck + lint + 943 unit tests 全绿;e2e 两次连跑全绿,总时长
约 36s(boot 10s + 4 features 并发执行真实 Discord fetch 断言)。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xuhongbo

Copy link
Copy Markdown
Owner Author

E2E 测试 + 产品修复(commit dea9498

调研结论

Discord 官方没有测试套件(核对:docs.discord.com/developers、discord-api-docs#3355、discord.js#6179、npm discord.js.test 5 年前废弃)。社区工具 Corde 维护平平,无 mock-gateway 方案。业内公认做法:自建测试 guild + 独立 bot token + 隔离数据层。本提交落地该路径 + 事件层注入触发 + 真实 Discord fetch 断言。

新增

  • scripts/e2e/harness/ — 5 个共享基础设施文件(harness / assertions / fake-interaction / cleanup-sweeper / artifacts)
  • scripts/e2e/features/ — 4 个 *.e2e.ts(terminal-reason / todo-list / permission-denied / batch-approval)
  • scripts/e2e/index.ts — 串行 runner,支持 --only=<name>
  • packages/engine/__tests__/provider-runtime-monitor.test.ts — Monitor effort 9 个单元测试(Discord 侧不可见)
  • pnpm test:discord:e2e + local-acceptance-suite.ts 入口接入

E2E 抓到并修的 3 个产品 bug

  1. TerminalReason 在失败路径被 sendTurnFailure 吞掉,embed 从不显示结束原因
  2. panel-adapterstateChanged 检查漏掉新 context 字段,XState context 变更不触发状态卡重绘
  3. /agent batch 直接改 stateMachine 不通知 panel;applyBatchApprovalSet 复用 event ref 导致 contextChanged 引用比较失效

验证

  • pnpm typecheck + pnpm lint + 943 unit tests 全绿
  • pnpm test:discord:e2e 两次连跑全绿,总时长 ~36s
  • Claude + Codex 两端都跑通

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dea9498a02

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +25 to +27
export function enqueueBatchApproval(sessionId: string, entry: BatchApprovalEntry): void {
if (!store.has(sessionId)) store.set(sessionId, []);
store.get(sessionId)!.push(entry);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Cap deferred approval store to visible queue size

pendingApprovals is capped at 100 in state (slice(-MAX_PENDING_APPROVALS)), but this backing store is unbounded, so once more than 100 tool requests are deferred, older entries disappear from the status card while still remaining actionable in approve-all/reject-all. In that scenario, users can approve unseen tool calls, which is a safety regression for batch approval mode.

Useful? React with 👍 / 👎.

Comment on lines +200 to +204
const MAX_LINES = 8;
const MAX_CHAR = 180;
const lines: string[] = [];
for (const item of items.slice(0, MAX_LINES)) {
const mark = item.completed ? '☑' : '☐';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep todo field under Discord embed value limit

This renderer can generate up to ~8×180 characters (plus markers/newlines), which exceeds Discord’s 1024-character embed field value limit. When long todo items are returned, statusCard.update will fail on message edit/send and the status card stops reflecting new state. The todo rendering needs a total-length budget, not only per-line truncation.

Useful? React with 👍 / 👎.

Comment on lines +514 to +518
if (action === 'on') {
await updateSessionState(session.id, {
type: 'batch_approval_changed',
sessionId: session.id,
source,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject /agent batch for Codex sessions

This command enables batch mode for any provider, but deferred approvals are only enqueued in the Claude permission handler path. Because Codex sessions do not use that path, action:on reports success yet no Codex approvals are actually deferred, leaving users in a misleading “enabled” state with no effect. Add a provider guard here (or wire an actual Codex implementation) so behavior matches the command response.

Useful? React with 👍 / 👎.

Addresses 4 actionable inline comments from chatgpt-codex-connector bot on
PR #17 (1 comment on stateChanged check was already fixed in prior commit).

P1 — Cap batch-approval-store at MAX_BATCH_APPROVAL_STORE_SIZE (100):
  state.pendingApprovals already caps at 100, but the backing store was
  unbounded. Over 100 deferred requests would disappear from the status
  card while remaining actionable via approve-all — silently approving
  tool calls the user never saw. enqueueBatchApproval now returns
  'enqueued' | 'overflow'; permission-gate auto-rejects on overflow so
  the SDK turn unblocks instead of queuing invisibly.

P1 — Cancel pending batch approvals on session abort/end:
  The `await new Promise(...)` in permission-gate's batch branch had no
  cancellation path. Session stop/shutdown would leak the promise and
  hang the SDK turn forever. Now:
  - `context.signal.addEventListener('abort', ...)` resolves 'reject'
    when the provider's AbortController fires
  - endSession() calls clearBatchApprovalStore() so any still-pending
    entries drain to reject during cleanup

P2 — Keep todo field under Discord's 1024-char embed value limit:
  8 lines × 180 chars + markers could exceed 1024 → PATCH message would
  400. Added a 950-char TOTAL_BUDGET that stops adding lines (and the
  "… 还有 N 项" suffix) once the budget is exhausted.

P2 — Reject /agent batch for Codex sessions:
  Batch mode only works in the Claude canUseTool path. Codex uses
  approvalPolicy on its own thread and never touches the batch store,
  so `action:on` previously reported success but deferred nothing. Now
  handleAgentBatch returns early with a clear message pointing users
  to `/agent permissions codex-approval` for Codex sessions.

Also raised e2e assertion timeout default from 10s → 15s to absorb
slow-network Discord REST latency (first-run boot hit 38s on weak link).

Verification:
- typecheck + lint + 944 unit tests green (added overflow test case)
- pnpm test:discord:e2e — 4 features pass, 26s total

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xuhongbo

Copy link
Copy Markdown
Owner Author

Codex review 4 条反馈已修 — commit 2cf5f38。944 unit + 4 e2e 全绿。

@xuhongbo

Copy link
Copy Markdown
Owner Author

@codex code review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2cf5f3897e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +153 to +154
if (context.signal.aborted) settle('reject');
else context.signal.addEventListener('abort', () => settle('reject'), { once: true });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove aborted entries from batch approval queues

When context.signal aborts (e.g. /agent stop), this path only calls settle('reject') for the waiting promise, but it never removes that request from batch-approval-store or pendingApprovals state. Because /agent stop aborts without calling endSession, these stale entries can accumulate across runs, inflate approve-all/reject-all counts, and eventually hit MAX_BATCH_APPROVAL_STORE_SIZE, causing new approvals to be auto-rejected as overflow even though no real pending approvals remain.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a13c91fda.

Addresses Codex review P1 on commit 2cf5f38 (discussion_r3099408511):

Previously the batch-approval abort path only called settle('reject') for
the waiting canUseTool promise, but left the entry in
`batch-approval-store` and in XState's `pendingApprovals`. Since
`/agent stop` calls `abortSessionWithReason` (not `endSession`), these
zombie entries persist across stop/restart cycles — inflating
approve-all/reject-all counts and eventually hitting the 100-item
overflow cap, at which point legitimate new approvals get auto-rejected
because the queue is full of dead entries.

Fix: both `abortSessionWithReason` and `endSession` now call
`clearBatchApprovalStore(id)` and `stateMachine.clearPendingApprovals(id)`,
wiping both the resolver store and the visible queue. The
permission-gate abort handler is kept as a defensive fallback for
provider-side aborts that don't go through the registry.

Also added a unit test proving the store recovers cleanly after
overflow + clear (no zombie retention).

944 → 945 unit tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2cf5f3897e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +153 to +154
if (context.signal.aborted) settle('reject');
else context.signal.addEventListener('abort', () => settle('reject'), { once: true });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove aborted requests from batch approval queue

When batch mode is active, this abort handler only calls settle('reject') for the waiting promise, but it never removes the corresponding entry from the batch store or pendingApprovals. If a user runs /agent stop (abort without ending the session), stale requests remain queued, so later queue counts are inflated and repeated aborts can fill the 100-item cap, causing new tool approvals to be auto-rejected even though nothing is truly pending. The rejection path needs to dequeue this specific gate (or clear queue on abort) instead of only settling the promise.

Useful? React with 👍 / 👎.

Comment on lines +524 to +527
if (action === 'on') {
await updateSessionState(session.id, {
type: 'batch_approval_changed',
sessionId: session.id,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject batch enable when Claude permission hook is inactive

This branch enables batch mode for any Claude session, but batch queuing only works when createClaudePermissionHandler is actually installed; shouldUseClaudePermissionHandler disables that hook in mode === 'auto' (and when effective Claude permission mode is bypass). In those configurations, /agent batch action:on reports success even though tool calls are not deferred at all, which can mislead users into believing executions are queued for approval when they are not. Add a guard here that refuses action:on unless the current session’s effective Claude permission mode is gate-enabled.

Useful? React with 👍 / 👎.

@xuhongbo

Copy link
Copy Markdown
Owner Author

Codex review P1 (zombie entries on abort) 已修 — commit 960e436。abortSessionWithReason 与 endSession 现在都会 drainBatchApprovalStore + clearPendingApprovals。945 unit + 4 e2e 全绿。

Addresses 2 new P1 comments on commit 960e436:

P1 — permission-gate abort handler must self-dequeue
(discussion_r3099... , packages/engine/src/executor/permission-gate.ts:154)

The abort listener previously only settled the waiting promise. If an
abort came from a path that doesn't go through `abortSessionWithReason`
(e.g. provider-side abort), the entry would remain in both the
batch-approval-store and XState's pendingApprovals, eventually hitting
the 100-item cap. Now the listener also calls:
  - removeBatchApproval(sessionId, gateId)   — new single-gate removal
  - stateMachine.removePendingApproval(sessionId, gateId) — new XState action
keeping the stores consistent even when cleanup comes from the provider
rather than the registry.

P1 — /agent batch action:on must verify the Claude permission hook is
actually installed (packages/bot/src/command-handlers-modules/agent-handlers.ts)

The Claude canUseTool handler (which is where batch queueing fires) is
only installed when `shouldUseClaudePermissionHandler(session)` returns
true — false for `mode === 'auto'` and for `claudePermissionMode ===
'bypass'`. Enabling batch mode in those configs was misleading: the UI
reported success but no tool calls were being deferred. Added a guard
that rejects action:on with a mode-specific hint ("/agent mode
mode:normal" or "/agent permissions claude-permissions:normal").

New tests:
  - removeBatchApproval drops one gate without touching others + cleans
    up the session key when the last entry is removed
  - removePendingApproval drops one gate in XState; missing gateId is
    a no-op
  - batch-approval e2e now creates the session in normal mode so the
    new guard lets action:on through

948 unit tests green; batch-approval e2e green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xuhongbo

Copy link
Copy Markdown
Owner Author

2 条新 P1 反馈已修 — commit 062396f。permission-gate abort handler 现在自 dequeue 单个 gate(防 provider 内部 abort 绕过 abortSession);/agent batch action:on 加 guard 检查 shouldUseClaudePermissionHandler(auto / bypass 模式拒绝并提示)。948 unit + 4 e2e 全绿。

@xuhongbo xuhongbo merged commit 70401ad into main Apr 17, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants