Skip to content

fix(codex): rebuild traces from transcript#69

Open
rangemer333-cell wants to merge 3 commits into
mainfrom
yunshen/codex-aborted-turn-recovery
Open

fix(codex): rebuild traces from transcript#69
rangemer333-cell wants to merge 3 commits into
mainfrom
yunshen/codex-aborted-turn-recovery

Conversation

@rangemer333-cell

Copy link
Copy Markdown
Collaborator

Summary

  • make rollout transcripts the single Codex telemetry source, with Stop retained only as a wakeup signal
  • reconstruct completed and interrupted turns from original transcript ordering, including LLM, tool, token, and message fields
  • emit standard full request context in user -> assistant.tool_call -> tool_call_response order

Verification

  • npm test -- tests/unit/inputs/codex-transcript/codex-transcript-input.test.ts tests/unit/inputs/codex-aborted-turn/codex-aborted-turn-input.test.ts tests/unit/hooks/codex/hook-processor.test.mjs tests/unit/hooks/codex/transcript-parser.test.mjs tests/unit/flushers/otlp-trace-flusher/turn-boundary.test.ts tests/integration/codex-aborted-turn-converter.test.ts
  • npm run typecheck
  • npm run build
  • git diff --check

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Major architectural improvement: replaces the old codex-log input (which relied on OTel plugin JSONL and accumulated hook events) with a new codex-transcript input that reads directly from Codex rollout transcripts as the single source of truth. The codex-hook-processor.mjs is greatly simplified from 657 lines to 65 lines — Stop is now just a best-effort wakeup marker, eliminating fragile transcript-flush timing dependencies.

Overall verdict: solid design, well-executed. The new architecture is cleaner, safer (hooks can no longer disrupt agent operation by failing mid-transcript-parse), and more robust (the transcript tailer picks up changes via polling even if a wakeup marker is missed). Comprehensive test coverage across hook processor, transcript parser, both input modules, integration converter, and trace flusher turn-boundary logic. Good documentation including the aborted-turn recovery design doc.

CLA status could not be verified (no CLA context found on the head commit). Posting as COMMENT — please ensure CLA is signed before merge.

Findings

See inline comments below. Summary of severity:

  • [Warning] 1 — silent error swallowing in wakeup marker
  • [Info] 3 — sort stability, config migration, no-op hook documentation

Suggestions

  • Add stderr logging to the writeWakeupMarker catch block for debuggability.
  • Add a backward-compatible config fallback for codex-log -> codex-transcript migration.
  • Consider a secondary sort key in compareTimelineEvents for deterministic ordering.

Automated review by github-manager-bot

Comment thread assets/hooks/codex-hook-processor.mjs Outdated
fs.mkdirSync(directory, { recursive: true });
fs.writeFileSync(temporary, JSON.stringify(payload), 'utf8');
fs.renameSync(temporary, marker);
} catch {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Warning] writeWakeupMarker silently swallows all errors with catch {}. While best-effort is appropriate for a hook (it must never disrupt Codex), consider logging to stderr for debuggability — a failed wakeup marker delays telemetry export until the next poll interval, and without any log output this failure mode would be invisible.

Comment thread assets/hooks/codex-hook-processor.mjs Outdated
process.stdout.write('{}\n');
function main() {
const subcommand = (process.argv[2] || '').trim();
if (subcommand !== 'stop') return;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] main() silently returns for all subcommands except stop. The old processor handled session-start, user-prompt-submit, pre-tool-use, post-tool-use. If agents.d/codex.json still dispatches these subcommands, they are now no-ops. This is intentional (transcript is the source of truth), but a brief comment here or in codex.json would help future maintainers understand why these hooks fire without doing anything.

...(turn.cwd ? { 'agent.codex.cwd': turn.cwd } : {}),
};
const records: AgentActivityEntry[] = [];
const timeline = [...turn.timeline].sort(compareTimelineEvents);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] [...turn.timeline].sort(compareTimelineEvents) — JavaScript's Array.prototype.sort is not guaranteed stable across all engines (though V8/Node ≥ 11 is stable). If two timeline events share the same timestamp, their relative order could vary on non-V8 engines. Consider adding a secondary sort key (e.g., original array index) for deterministic ordering.

Comment thread src/core/orchestrator.ts
'cursor-hook': 'cursor',
'claude-code-log': 'claude-code',
'codex-log': 'codex',
'codex-transcript': 'codex',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] Config key changes from codex-log to codex-transcript. Existing users with codex-log listener config entries (pollInterval, enabled) will silently lose their customization. Consider adding a backward-compatible fallback that reads codex-log config if codex-transcript is absent, or a one-time deprecation warning.

@ralf0131

Copy link
Copy Markdown
Collaborator

⚠️ Merge conflict detected

This PR has conflicts with the main branch and cannot be merged. Please rebase or merge main into your branch and resolve the conflicts:

git fetch origin
git checkout yunshen/codex-aborted-turn-recovery
git rebase origin/main
# resolve conflicts, then:
git push --force-with-lease

This is a one-time reminder. Feel free to @mention me for a re-review after conflicts are resolved.


Automated notification by github-manager-bot

@linrunqi08 linrunqi08 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #69 Codex Telemetry: Hook → Transcript Migration

Reviewed the full 5700-line diff across 38 changed files. Overall this is a well-structured migration from hook-based to transcript-based telemetry collection. Solid architecture with the wakeup-marker pattern, deterministic IDs via SHA-256, and the new BaseInput.cyclePromise serialization. Below are findings ranked by severity.


1. 🔴 Massive code duplication between codex-transcript and codex-aborted-turn modules

Files: src/inputs/codex-transcript/ and src/inputs/codex-aborted-turn/

The following functions are duplicated near-identically across both modules:

  • asRecord, stringValue, timestampMs — both modules define local copies
  • toJsonValue, sameUsage — extractor files both define these
  • readJsonLines, readJsonLineAt — both input files have independent implementations
  • sessionIdFromTranscriptPath, hashId, buildEntry — builders and extractors duplicate these
  • collectRolloutFiles / discoverSessionFiles — both inputs walk the same ~/.codex/sessions directory

That's 400+ lines of duplicated utility code that must be kept in sync manually. The codex-aborted-turn-utils.ts file exists as a shared utilities module but only exports 3 of these functions — and even those aren't consumed by the transcript module.

Suggestion: Extract shared utilities into a codex-shared/ module (e.g. codex-shared/transcript-utils.ts and codex-shared/io-utils.ts).


2. 🟠 timestampMs inconsistency: returns 0 vs undefined on parse failure

Files:

  • src/inputs/codex-transcript/codex-transcript-input.ts line 432 — returns Date.now() on failure
  • src/inputs/codex-transcript/codex-transcript-extractor.ts line 45 — initializes startedAtMs = 0 (implicit "no timestamp")
  • src/inputs/codex-aborted-turn/codex-aborted-turn-utils.ts line 11 — returns undefined on failure

Three different failure modes for the same semantic operation:

  • Transcript input: falls back to Date.now() (never produces a sentinel)
  • Transcript extractor: uses 0 as a sentinel → timestampToUnixNanos(0) produces '0' (Unix epoch 1970-01-01)
  • Aborted-turn utils: returns undefined → callers do explicit null-checks

A transcript record with a missing or malformed timestamp silently gets epoch-0 timestamps in the output telemetry, which may confuse downstream consumers or create silent data quality issues.


3. 🟠 Double cycle-serialization guard — collecting promise redundant with BaseInput.cyclePromise

Files:

  • src/inputs/codex-transcript/codex-transcript-input.ts lines 95-100
  • src/inputs/codex-aborted-turn/codex-aborted-turn-input.ts lines 73-78

Both inputs override collect() with:

if (this.collecting) return this.collecting;
this.collecting = this.collectOnce().finally(() => {
  this.collecting = null;
});
return this.collecting;

This is identical in purpose to BaseInput.runCycle() which now has:

if (this.cyclePromise) return this.cyclePromise;
this.cyclePromise = this.runCycleOnce().finally(() => {
  this.cyclePromise = null;
});

The outer cyclePromise guard already prevents concurrent cycles, making the inner collecting guard dead code. It's not harmful but adds confusion about which guard is load-bearing.


4. 🟠 readJsonLines allocates full-range buffer for entire unscanned range

Files:

  • src/inputs/codex-transcript/codex-transcript-input.ts line 356
  • src/inputs/codex-aborted-turn/codex-aborted-turn-input.ts (same pattern)
const buffer = Buffer.alloc(endOffset - startOffset);

During baselineFile() on a transcript that hasn't been scanned yet, startOffset = 0 and endOffset = stat.size. A Codex session with extensive tool use can produce transcripts of tens of MB. This allocates the entire file as a single buffer every 30 seconds per-transcript until fully consumed.

Consider chunked reading (e.g. 1MB at a time) to bound memory usage, especially since multiple transcripts may be processed in the same cycle.


5. 🟡 Hook processor silently swallows all errors and drops stdout response

File: assets/hooks/codex-hook-processor.mjs lines 46-51

try {
  fs.mkdirSync(directory, { recursive: true });
  fs.writeFileSync(temporary, JSON.stringify(payload), 'utf8');
  fs.renameSync(temporary, marker);
} catch {
  try { fs.unlinkSync(temporary); } catch {}
}

Two issues:

  1. Silent error swallowing: The old code called logHookError() on failures; the new code catches all errors (disk full, permissions, path too long) without any logging. If the wakeup directory is not writable, the Stop hook silently fails and the transcript tailer must wait for the next poll cycle (up to 30s delay) instead of waking immediately.
  2. Dropped stdout response: The old code wrote process.stdout.write('{}\n') on every exit path. The new code writes nothing to stdout. If Codex expects a JSON response on stdout (some hook protocols do), this could cause the hook to be treated as failed by the caller.

6. 🟡 Both inputs independently scan ~/.codex/sessions if both are registered

Files:

  • src/inputs/codex-transcript/codex-transcript-input.ts — registered in orchestrator
  • src/inputs/codex-aborted-turn/codex-aborted-turn-input.ts — exported from src/index.ts

Both inputs:

  • Call discoverSessionFiles() to walk the same ~/.codex/sessions directory
  • Both handle turn_aborted events from the same transcripts
  • CodexTranscriptInput is registered in the orchestrator; CodexAbortedTurnInput is exported but not registered

If an external consumer registers both inputs (possible since both are exported), every aborted turn would produce duplicate telemetry entries. Consider documenting the mutual-exclusion constraint, or adding a runtime guard.

@rangemer333-cell rangemer333-cell force-pushed the yunshen/codex-aborted-turn-recovery branch from aad8877 to a4dc50d Compare June 25, 2026 02:27
@rangemer333-cell rangemer333-cell force-pushed the yunshen/codex-aborted-turn-recovery branch from a4dc50d to 09d4e21 Compare June 25, 2026 02:37

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Re-review after new commits. The latest commit (fix(codex): harden transcript retry and hook cleanup) is clean and well-tested:

  • The scan-offset advancement was fixed so a terminal turn that must be retried is no longer skipped on the next poll — it now uses line.endOffset of the last processed turn instead of lines.nextOffset (correctness fix).
  • Hook uninstall now removes legacy replaceHookCommands and deletes the array key when it becomes empty (good cleanup + migration support).

The earlier catch {} in writeWakeupMarker now attempts temp-file cleanup before swallowing, which addresses the resource-leak aspect of the previous warning. CI is green. Good hardening pass.

Notes

  • The catch {} still does not log (silent swallow). For a hook that must never disrupt Codex this is a defensible design choice; if you want debuggability, a single logger.debug line would help triage wakeups that silently no-op.
  • The codex-transcript refactor (commit 09d4e21) is substantial (~1200 lines across builder/extractor/input). CI passes and the parts reviewed look correct; a maintainer may want a deeper pass on the transcript ordering/recovery logic before merge.

Automated review by github-manager-bot

@linrunqi08 linrunqi08 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Update — 第二次检查(commit a0641f6 之后)

感谢第二个提交的修复工作,将 codex-aborted-turn 模块合并进了 codex-transcript,消除了大部分重复代码和重复扫描问题。以下是 6 个原始发现的当前状态:


✅ 已修复

Finding #1 (部分) — 代码大量重复
codex-aborted-turn 的 builder/extractor/input/types/utils 5 个独立文件已从 PR 移除,aborted turn 处理合并到 codex-transcript 模块中。大幅消除了重复代码。

Finding #6 — 两个 input 独立扫描同一目录导致重复遥测
CodexAbortedTurnInput 已从 orchestrator 移除,仅 CodexTranscriptInput 处理所有场景(含 turn_aborted),重复遥测风险消除。


⚠️ 仍未修复

Finding #2timestampMs 不一致:返回 0 vs Date.now()

仍然有两个 timestampMs 实现,失败行为不同:

  • codex-transcript-extractor.ts — 解析失败返回 0
    return Number.isFinite(parsed) ? parsed : 0;
  • codex-transcript-input.ts — 解析失败返回 Date.now()
    return Number.isFinite(parsed) ? parsed : Date.now();

同一模块内两个同名函数对 parse 失败的语义不同:0 会导致 timestampToUnixNanos(0) 输出 Unix 纪元(1970-01-01),Date.now() 则用当前时间兜底。建议统一策略,或者至少合并为一个带参数的函数(如 timestampMs(record, fallback))。


Finding #3 — 双重 cycle 序列化守卫

CodexTranscriptInput.collect() 仍保留 this.collecting 守卫:

// codex-transcript-input.ts
protected override async collect(): Promise<AgentActivityEntry[]> {
  if (this.collecting) return this.collecting;
  this.collecting = this.collectOnce().finally(() => { this.collecting = null; });
  return this.collecting;
}

这与 BaseInput.runCycle() 中新增的 cyclePromise 守卫功能完全重复:

// base-input.ts
private runCycle(): Promise<void> {
  if (this.cyclePromise) return this.cyclePromise;
  this.cyclePromise = this.runCycleOnce().finally(() => { this.cyclePromise = null; });
  return this.cyclePromise;
}

两层守卫不会导致 bug,但会让读者困惑哪层是真正生效的。建议移除子类的 collecting 守卫,依赖 BaseInput.cyclePromise


Finding #4readJsonLines 一次性分配全文件大小的 Buffer

// codex-transcript-input.ts
const buffer = Buffer.alloc(endOffset - startOffset);

baselineFile() 首次扫描时 startOffset = 0endOffset = stat.size,整个 transcript 文件被一次性读入内存。长时间运行的 Codex session 可能产生几十 MB 的 transcript,每个 poll 周期(30s)对未消费完的文件都会触发这种分配。建议分块读取(如每次 1MB),或在 baseline 时直接 seek 到文件末尾只记录 offset 而不解析历史内容。


Finding #5 — Hook processor 静默吞错 + 丢弃 stdout 响应

writeWakeupMarker 仍然无日志地吞掉所有错误:

} catch {
  try { fs.unlinkSync(temporary); } catch {}
}

旧代码在每个 catch 分支都调用 logHookError() 记录错误,新代码完全静默。如果 wakeup 目录不可写(权限/磁盘满),唯一的症状是遥测延迟从即时变为 30 秒,难以排查。

同时 main() 不再向 stdout 写入 {}\n。如果 Codex hook 协议期望 stdout JSON 响应,这可能导致 hook 被认为执行失败。

@rangemer333-cell

Copy link
Copy Markdown
Collaborator Author

timestampMs 统一
新增 codex-transcript-utils.ts,input/extractor 共用 timestampMs(record, fallback),解析失败统一显式传 fallback,避免 extractor 走 0 导致 1970 时间。

移除子类重复 cycle 守卫
删除 CodexTranscriptInput.collecting,只依赖 BaseInput.cyclePromise 串行化采集周期。

readJsonLines 改为分块读取
不再 Buffer.alloc(endOffset - startOffset) 一次性分配整个 transcript,而是 1MB chunk 流式扫描完整 JSONL 行,保留未完成行不消费。

Codex hook 恢复 fail-open 诊断与 stdout ack
codex-hook-processor.mjs 现在:
stdin parse / wakeup write / cleanup 失败会写 logs/codex/errors
stop 和非 stop 都输出 {},避免 Codex hook 协议误判失败

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Re-review after commit 2bcf21eb ("fix(codex): harden transcript parsing and wakeup hook"). This commit directly resolves the two findings from the previous review round and adds solid hardening:

  • [Resolved] Silent catch {} in writeWakeupMarker — now uses logHookError() for stdin-parse, wakeup-write, and cleanup errors, providing debuggability without risking hook stability.
  • [Resolved] Non-stop subcommands produce no stdoutmain() now wraps in try/finally and always writes {} to stdout, ensuring the agent receives acknowledgment regardless of subcommand.
  • Chunked file readingreadJsonLines now reads in 1 MB chunks with a pending buffer for cross-chunk line boundaries, replacing the previous single-buffer allocation that could cause memory pressure on large transcripts.
  • Unified timestampMs — moved to a shared codex-transcript-utils.ts with an explicit fallback parameter, eliminating the two divergent implementations (0 vs Date.now()).
  • Removed redundant collecting dedup — safe because BaseInput.runCycle() already serializes via cyclePromise, making the subclass-level guard unnecessary.
  • New tests cover stdout acknowledgment and error-logging paths.

CI is green across all build/test checks. CLA signed. No new findings on this commit.

Notes

  • The overall PR is substantial (~2000 additions across 20 files). The transcript ordering/recovery logic was reviewed in earlier rounds and looks correct; maintainers should do a final confidence pass before merge.

Automated review by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants