
[otel] Add OpenTelemetry phase usage events and usage analysis #785

Draft
j5ik2o wants to merge 10 commits into main from codex/otel-usage-events-exporter-docs

Conversation

@j5ik2o

@j5ik2o j5ik2o commented Jun 6, 2026

Collaborator

Overview

Tests

  • npm run build
  • npm run lint
  • npm test
  • npm test -- --run src/__tests__/analyze-usage-command.test.ts src/__tests__/usageEventsSpanProcessor.test.ts src/__tests__/phaseUsageEvent.test.ts src/__tests__/otelFoundation.test.ts src/__tests__/workflowSpans.test.ts
  • git diff --check

Closes #704
Closes #705

Summary by CodeRabbit

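  • New features

    • Adds opt-in observability. Phase-level usage events are written to a log, and provider usage information (including tokens and cache) can be collected and analyzed.
    • A CLI command aggregates phase usage events and outputs Markdown/CSV.
  • Documentation

    • Added and updated, in Japanese and English, the steps for enabling observability, the output locations, and the aggregation/analysis procedure.
  • Tests

    • Added and extended E2E tests and many unit tests to verify output formats and aggregation behavior.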

@coderabbitai

coderabbitai Bot commented Jun 6, 2026


Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: fad988f8-527b-4f01-9d45-96eec545fd28

📥 Commits

Reviewing files that changed from the base of the PR and between 811b3a7 and 4cf87e7.

📒 Files selected for processing (5)
  • e2e/specs/observability.e2e.ts
  • src/__tests__/otelFoundation.test.ts
  • src/__tests__/usageEventsSpanProcessor.test.ts
  • src/core/workflow/engine/ArpeggioRunner.ts
  • src/infra/observability/usageEventsSpanProcessor.ts

📝 Walkthrough

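This PR adds opt-in, phase-granularity usage event output. It includes a SpanProcessor that generates phase usage records from OpenTelemetry spans and appends them to a JSONL file, a CLI that aggregates the collected logs, the runtime wiring, documentation, and E2E/unit tests.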

Changes

Phase usage events observability feature

| Layer | File(s) | Summary |
|---|---|---|
| Documentation and configuration updates | `README.md`, `docs/README.ja.md`, `docs/configuration.*`, `docs/observability.*`, `docs/testing/e2e.md`, `e2e/specs/observability.e2e.ts` | Adds an observability guide and documents the configuration flags (`observability.enabled` / `usage_events_phase`), the output location (`.takt/runs/.../logs/<session>-usage-events.phase.jsonl`), coexistence with the existing `logging.usage_events`, and the analysis procedure. |
| Phase usage event types and mapping | `src/core/logging/contracts.ts`, `src/core/logging/phaseUsageEvent.ts` | Defines `PHASE_USAGE_EVENTS_LOG_FILE_SUFFIX`. `mapSpanEndToPhaseUsageEvent` extracts phase/judge information and usage from span attributes and assembles a record that includes missing-usage normalization and cache-related tokens. |
| OpenTelemetry span processor and NDJSON append helper | `src/infra/fs/jsonl.ts`, `src/infra/fs/index.ts`, `src/infra/observability/usageEventsSpanProcessor.ts` | `appendJsonLine` appends NDJSON lines. The `onEnd` handler of `UsageEventsSpanProcessor` generates a phase usage record and appends it to the file. Append failures are logged only once per run and suppressed afterwards. |
| OtelFoundation integration and run-scoped exporter safety | `src/infra/observability/otelFoundation.ts` | Implements runId consistency validation, cleanup when exporter registration throws, an extended span processor state for the shared SDK, and a new `usageEventsExporter` option. |
| Usage payload refactoring and cache token fields | `src/core/logging/providerEvent.ts` | Extracts `buildUsageEventPayload` and adds `cache_creation_input_tokens` / `cache_read_input_tokens` to `UsageEventLogRecord.usage`. |
| ProviderUsage threading through workflow execution | `src/core/workflow/arpeggio/types.ts`, `src/core/workflow/engine/*`, `src/core/workflow/observability/workflowSpans.ts`, `src/core/workflow/types.ts` | Propagates `providerUsage` through `BatchResult` and `AgentResponse`. Reflects usage information in the phase/judge span attributes and in the error-path outcome. `usageAttributes()` generates `gen_ai.usage.*` or `takt.usage.*` attributes. |
| Judge stage logging refactor with providerUsage | `src/agents/judge-status-usecase.ts`, `src/agents/structured-caller/prompt-based-structured-caller.ts`, `src/core/workflow/status-judgment-phase.ts` | Introduces `JudgeStageLogEntry`, manages stage 3 collection statefully via `createJudgeStageRecorder()`, and passes `providerUsage` to `onJudgeStage` / `onJudgeResponse`. |
| Claude provider usage extraction and integration | `src/infra/claude/usage.ts`, `src/infra/claude/executor.ts`, `src/infra/claude-headless/*` | Implements `extractClaudeProviderUsage`, which extracts a `ProviderUsageSnapshot` from Claude's raw usage, and adds `providerUsage` to the return values of the executor and the stream aggregation. |
| Usage analysis CLI command and scripts | `src/commands/analyze-usage.ts`, `package.json` | Adds the `analyze:usage` script and CLI. It resolves and reads the phase usage JSONL, aggregates by step/phase/provider/model, computes runs/calls/missing counts and token statistics (sum/mean/median/standard deviation), and outputs Markdown or CSV. Parse errors include the line number. |
| Provider type safety refactoring | `src/shared/types/provider.ts` | Changes the type to one derived from the `PROVIDER_TYPES` array; `isProviderType` implements its type guard with a Set check. |
| E2E test suite for observability outputs | `e2e/specs/observability.e2e.ts`, `vitest.config.e2e.mock.ts` | Adds an E2E test that, in an isolated environment with observability enabled, runs takt and then verifies the existence and fields of `*-usage-events.phase.jsonl` / `*-otel-session-shadow.jsonl` / `monitor.json`. |
| Unit test coverage for phase events and analysis | `src/__tests__/*` | Adds and updates tests covering phase mapping, usage-missing normalization, span processor registration and write behavior, OtelFoundation exporter validation, analyze-usage aggregation/formatting/error handling, cache token serialization, and more. |
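
The aggregation described for the analysis CLI (group by step/phase/provider/model, count calls and missing usage, compute sum/mean/median/standard deviation, and report the line number on parse errors) could be sketched roughly as follows. This is a minimal illustration, not the code in `src/commands/analyze-usage.ts`: the record shape, the function names `parseJsonl` and `aggregate`, and the choice of population standard deviation are all assumptions.

```typescript
// Hypothetical record shape; the real phase usage JSONL fields may differ.
interface PhaseUsageRecord {
  step: string;
  phase: string;
  provider: string;
  model: string;
  usage_missing: boolean;
  total_tokens?: number;
}

interface TokenStats {
  calls: number;
  missing: number;
  sum: number;
  mean: number;
  median: number;
  stddev: number;
}

// Parse NDJSON text, reporting the 1-based line number of any malformed line.
function parseJsonl(text: string): PhaseUsageRecord[] {
  const records: PhaseUsageRecord[] = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines and the trailing newline
    try {
      records.push(JSON.parse(line) as PhaseUsageRecord);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      throw new Error(`Invalid JSON on line ${index + 1}: ${reason}`);
    }
  });
  return records;
}

// Statistics over records that actually carry usage; missing ones are only counted.
function tokenStats(records: PhaseUsageRecord[]): TokenStats {
  const values = records
    .filter((r) => !r.usage_missing && typeof r.total_tokens === "number")
    .map((r) => r.total_tokens as number)
    .sort((a, b) => a - b);
  const calls = records.length;
  const missing = calls - values.length;
  if (values.length === 0) {
    return { calls, missing, sum: 0, mean: 0, median: 0, stddev: 0 };
  }
  const sum = values.reduce((acc, v) => acc + v, 0);
  const mean = sum / values.length;
  const mid = Math.floor(values.length / 2);
  const median =
    values.length % 2 === 1 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
  // Population standard deviation (an assumption; the CLI may use the sample form).
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  return { calls, missing, sum, mean, median, stddev: Math.sqrt(variance) };
}

// Group records by step/phase/provider/model and compute stats per group.
function aggregate(records: PhaseUsageRecord[]): Map<string, TokenStats> {
  const groups = new Map<string, PhaseUsageRecord[]>();
  records.forEach((record) => {
    const key = [record.step, record.phase, record.provider, record.model].join("|");
    const bucket = groups.get(key) ?? [];
    bucket.push(record);
    groups.set(key, bucket);
  });
  const result = new Map<string, TokenStats>();
  groups.forEach((bucket, key) => result.set(key, tokenStats(bucket)));
  return result;
}
```

Keeping missing-usage records in the call count but out of the token statistics avoids treating an absent measurement as zero tokens.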

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • nrslib/takt#753: Both PRs modify the same observability initialization/wiring path in initializeOtelFoundation to register new OpenTelemetry SpanProcessors (main: UsageEventsSpanProcessor; retrieved: SessionLogSpanProcessor), so the changes are code-level related.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 2.38% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title, "[otel] Add OpenTelemetry phase usage events and usage analysis", clearly summarizes the main content of the change set (phase-level usage events exporter, usage analysis command, documentation) and is concise and specific. |
| Linked Issues check | ✅ Passed | The PR addresses #704 (opt-in phase-level UsageEventsExporter) and #705 (aggregation script + docs) and satisfies the requirements of both. UsageEventsSpanProcessor, the phaseUsageEvent mapping, the analyze-usage command, and the documentation (observability.md / observability.ja.md) are implemented, and the existing logging.usage_events is kept rather than replaced. |
| Out of Scope Changes check | ✅ Passed | Almost all changes fall within the requirements of #704 and #705. The ProviderType definition change is surrounding cleanup that introduces a type-safe guard function, adding the providerUsage field to BatchResult is required for tracking phase execution, and introducing JudgeStageLogEntry is a well-justified extension of judge stage logging. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

e2e/specs/observability.e2e.ts

Parsing error: /e2e/specs/observability.e2e.ts was not found by the project service. Consider either including it in the tsconfig.json or including it in allowDefaultProject.



@coderabbitai

coderabbitai Bot commented Jun 6, 2026


Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

@coderabbitai

coderabbitai Bot commented Jun 6, 2026


✅ Created PR with unit tests: #786

@j5ik2o j5ik2o changed the title from "Add OpenTelemetry phase usage events and usage analysis" to "[otel] Add OpenTelemetry phase usage events and usage analysis" Jun 6, 2026

j5ik2o commented Jun 8, 2026

Collaborator Author

Verification notes.

I checked the phase usage events from PR #785 locally using the codex provider.

What I checked:

  • npm run build succeeds
  • Enabled observability.enabled: true / observability.usage_events_phase: true / observability.monitor: true / observability.session_log_exporter: true
  • Ran a minimal report-judge style workflow with --provider codex
  • Confirmed that .takt/runs/<run>/logs/<session>-usage-events.phase.jsonl is generated
  • Confirmed that the shadow session log and monitor.json are also generated
  • Confirmed that aggregation succeeds with both npm run analyze:usage -- <run-dir> and --format csv

Unlike the mock provider, the Codex provider returned usage.

Phase usage records observed:

phase1_execute       usage_missing=false total_tokens=155117
phase2_report        usage_missing=false total_tokens=186968
phase3_structured    usage_missing=false total_tokens=29869

This run completed via structured output judgment, so phase3_tag did not occur. I consider this the expected behavior.

Within the scope of this local check, phase usage event output and analyze:usage aggregation work as expected.

@j5ik2o j5ik2o marked this pull request as ready for review June 8, 2026 05:48

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/core/workflow/engine/ArpeggioRunner.ts (1)

200-205: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Improve the consistency of usage information on exceptions.

The catch block in runWithPhaseSpan (Line 187-194) sets providerUsage with usageMissing: true, but this catch block does not. For consistency, setting providerUsage here as well is recommended.

♻️ Suggested fix
     return {
       batchIndex: batch.batchIndex,
       content: '',
       success: false,
       error: lastError,
+      providerUsage: {
+        usageMissing: true,
+        reason: USAGE_MISSING_REASONS.NOT_AVAILABLE,
+      },
     };

Add the import at the top of the file:

+import { USAGE_MISSING_REASONS } from '../../logging/contracts.js';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/workflow/engine/ArpeggioRunner.ts` around lines 200 - 205, The catch
block in runWithPhaseSpan in ArpeggioRunner.ts returns a result without setting
providerUsage (unlike the earlier catch that sets usageMissing: true), causing
inconsistent usage metadata; update the returned object in this catch to include
providerUsage with usageMissing: true (and any minimal structure expected by the
caller) alongside batchIndex, content, success, and error so both failure paths
produce consistent providerUsage information.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/specs/observability.e2e.ts`:
- Around line 125-127: The test reads monitor.json into the variable monitor and
currently checks for '"takt.run.id"' via JSON.stringify, which is brittle;
instead assert the parsed object contains that key directly. Update the
assertions around monitor (from observability.e2e.ts) to treat monitor as an
object and use a direct property existence check (e.g.,
Object.prototype.hasOwnProperty.call(monitor, 'takt.run.id') or a testing helper
like expect(monitor).toHaveProperty('takt.run.id')) and optionally validate the
property's value/type rather than relying on JSON.stringify.

In `@src/__tests__/otelFoundation.test.ts`:
- Line 218: The test currently asserts spanProcessors length equals 2, but
createSpanProcessorState() always returns [sessionLogSpanProcessor,
usageEventsSpanProcessor], so instead change the test to assert the behavior of
registration: when usageEventsPhase is false, verify that the
usageEventsSpanProcessor (or its exporter/register method) is not called; when
true, verify it is called — locate assertions around
foundation.constructedOptions and replace the length check with spies/mocks on
usageEventsSpanProcessor.register (or the exporter.register function used by
createSpanProcessorState) and assert call/non-call accordingly.

In `@src/infra/observability/usageEventsSpanProcessor.ts`:
- Around line 18-85: The instance-level hasReportedWriteFailure hides write
errors for different runId files; change hasReportedWriteFailure into a per-run
map (e.g. hasReportedWriteFailureByRun: Map<string, boolean>) and update
safeAppend to use options.runId (or options.phaseUsageLogPath) as the key when
checking/setting the flag; also update register's returned unregister function
to remove the flag for that runId and clear the map in shutdown. Ensure you
modify the UsageEventsSpanProcessor constructor/fields, safeAppend, register
(returned deleter), and shutdown to operate on the per-run map instead of a
single boolean.

---

Outside diff comments:
In `@src/core/workflow/engine/ArpeggioRunner.ts`:
- Around line 200-205: The catch block in runWithPhaseSpan in ArpeggioRunner.ts
returns a result without setting providerUsage (unlike the earlier catch that
sets usageMissing: true), causing inconsistent usage metadata; update the
returned object in this catch to include providerUsage with usageMissing: true
(and any minimal structure expected by the caller) alongside batchIndex,
content, success, and error so both failure paths produce consistent
providerUsage information.
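
For the `monitor.json` finding above, a key-based check that does not depend on `JSON.stringify` could look roughly like the sketch below. The helper `hasKeyDeep` and the sample payload are hypothetical, and where `takt.run.id` actually sits inside `monitor.json` is an assumption. Note also that, depending on the test framework, a dotted string passed to `toHaveProperty` may be interpreted as a nested path; the array form (`toHaveProperty(['takt.run.id'])`) avoids that ambiguity.

```typescript
// Recursively report whether any object inside a parsed JSON value owns `key`.
// The key is matched literally, so dots in "takt.run.id" are not treated as a path.
function hasKeyDeep(value: unknown, key: string): boolean {
  if (Array.isArray(value)) {
    return value.some((item) => hasKeyDeep(item, key));
  }
  if (value === null || typeof value !== "object") {
    return false;
  }
  const record = value as Record<string, unknown>;
  if (Object.prototype.hasOwnProperty.call(record, key)) {
    return true;
  }
  return Object.keys(record).some((k) => hasKeyDeep(record[k], key));
}

// Hypothetical monitor.json contents, for illustration only.
const sampleMonitor: unknown = JSON.parse(
  '{"resource":{"takt.run.id":"run-123"},"spans":[]}',
);
const foundRunId = hasKeyDeep(sampleMonitor, "takt.run.id");
```

If the key is known to live at a fixed location, a direct own-property check on that object is simpler than a recursive search.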

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 87d7c8ed-a551-42ab-b14d-f7526f47732f

📥 Commits

Reviewing files that changed from the base of the PR and between c226cca and b6a5532.

📒 Files selected for processing (37)
  • README.md
  • docs/README.ja.md
  • docs/configuration.ja.md
  • docs/configuration.md
  • docs/observability.ja.md
  • docs/observability.md
  • docs/testing/e2e.md
  • e2e/specs/observability.e2e.ts
  • package.json
  • src/__tests__/analyze-usage-command.test.ts
  • src/__tests__/logging-contracts.test.ts
  • src/__tests__/otelFoundation.test.ts
  • src/__tests__/phaseUsageEvent.test.ts
  • src/__tests__/usageEventsSpanProcessor.test.ts
  • src/__tests__/workflowSpans.test.ts
  • src/agents/judge-status-usecase.ts
  • src/agents/structured-caller/prompt-based-structured-caller.ts
  • src/commands/analyze-usage.ts
  • src/core/logging/contracts.ts
  • src/core/logging/phaseUsageEvent.ts
  • src/core/logging/providerEvent.ts
  • src/core/workflow/arpeggio/types.ts
  • src/core/workflow/engine/ArpeggioRunner.ts
  • src/core/workflow/engine/ParallelRunner.ts
  • src/core/workflow/engine/StepExecutor.ts
  • src/core/workflow/engine/team-leader-part-runner.ts
  • src/core/workflow/observability/workflowSpans.ts
  • src/core/workflow/report-phase-runner.ts
  • src/core/workflow/status-judgment-phase.ts
  • src/core/workflow/types.ts
  • src/features/tasks/execute/workflowExecutionBootstrap.ts
  • src/infra/fs/index.ts
  • src/infra/fs/jsonl.ts
  • src/infra/observability/otelFoundation.ts
  • src/infra/observability/usageEventsSpanProcessor.ts
  • src/shared/types/provider.ts
  • vitest.config.e2e.mock.ts

Comment thread e2e/specs/observability.e2e.ts Outdated
Comment thread src/__tests__/otelFoundation.test.ts Outdated
Comment thread src/infra/observability/usageEventsSpanProcessor.ts
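
The per-run failure flag suggested for `usageEventsSpanProcessor.ts` could be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the class name `WriteFailureTracker`, its method signatures, and the injected `append` / `log` functions are hypothetical and stand in for the real processor, `appendJsonLine`, and logger.

```typescript
type AppendFn = (path: string, record: unknown) => void;
type LogFn = (message: string) => void;

// Hypothetical helper: reports the first write failure per run and suppresses repeats.
class WriteFailureTracker {
  private readonly reportedByRun = new Map<string, boolean>();
  private readonly append: AppendFn;
  private readonly log: LogFn;

  constructor(append: AppendFn, log: LogFn) {
    this.append = append;
    this.log = log;
  }

  // Append a record; on failure, log once for this runId and then stay quiet.
  safeAppend(runId: string, path: string, record: unknown): void {
    try {
      this.append(path, record);
    } catch (error) {
      if (this.reportedByRun.get(runId) === true) return;
      this.reportedByRun.set(runId, true);
      const reason = error instanceof Error ? error.message : String(error);
      this.log(`usage event write failed for run ${runId} (${path}): ${reason}`);
    }
  }

  // Forget the flag when a run unregisters, so a later run with the same id reports again.
  unregister(runId: string): void {
    this.reportedByRun.delete(runId);
  }

  // Drop all per-run state on shutdown.
  shutdown(): void {
    this.reportedByRun.clear();
  }
}
```

Keying the flag by run means a failure in one run's file no longer hides failures in another run's file, while each run still logs at most once.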

j5ik2o commented Jun 8, 2026

Collaborator Author

Verification notes.

I checked the phase usage events from PR #785 locally with both the claude provider (via the Claude Code CLI in headless mode, claude -p --output-format stream-json) and the claude-sdk provider.

What I checked:

  • npm run build succeeds
  • Ran in a temporary repo with a temporary TAKT_CONFIG_DIR
  • Enabled observability.enabled: true / observability.usage_events_phase: true / observability.monitor: true / observability.session_log_exporter: true
  • Ran a minimal report-judge style workflow with --provider claude and with --provider claude-sdk
  • Confirmed that .takt/runs/<run>/logs/<session>-usage-events.phase.jsonl is generated
  • Confirmed that the shadow session log and monitor.json are also generated
  • Confirmed that aggregation succeeds with both npm run analyze:usage -- <run-dir> and --format csv

Phase usage records observed with the claude provider:

phase1_execute       usage_missing=false total_tokens=3009
phase2_report        usage_missing=false total_tokens=718
phase3_structured    usage_missing=false total_tokens=2688

Phase usage records observed with the claude-sdk provider:

phase1_execute       usage_missing=false total_tokens=752
phase2_report        usage_missing=false total_tokens=214
phase3_structured    usage_missing=false total_tokens=301

With both providers, usage was obtained with usage_missing=false, and phase usage events, the shadow session log, monitor.json, and analyze:usage aggregation all worked as expected.

This run completed via structured output judgment, so phase3_tag did not occur. I consider this the expected behavior.

@j5ik2o j5ik2o marked this pull request as draft June 8, 2026 06:39

j5ik2o commented Jun 8, 2026

Collaborator Author

A follow-up note.

In the verification above, token counts differ between the claude provider (the Claude Code CLI headless claude -p --output-format stream-json path) and the claude-sdk provider, but this does not mean that TAKT's phase usage events instrumentation records under different conditions for each provider.

The instrumentation path on the TAKT side is shared.

provider raw usage
  -> extractClaudeProviderUsage()
  -> ProviderUsageSnapshot
  -> span の gen_ai.usage.* attributes
  -> *-usage-events.phase.jsonl

Both the claude provider and the claude-sdk provider normalize input_tokens / output_tokens / cache tokens with the same extractClaudeProviderUsage(). total_tokens is likewise treated as input_tokens + output_tokens in both cases, and cache tokens are kept as separate fields.
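
As a rough illustration of that normalization rule (total is input plus output, cache tokens kept separate), consider the sketch below. It is not the actual `extractClaudeProviderUsage()`: the function name `normalizeUsage`, the type shapes, and the `"not_available"` reason string are assumptions made for this example.

```typescript
// Hypothetical shapes for illustration; the real ProviderUsageSnapshot may differ.
interface RawUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

type UsageSnapshot =
  | { usageMissing: true; reason: string }
  | {
      usageMissing: false;
      inputTokens: number;
      outputTokens: number;
      totalTokens: number;
      cacheCreationInputTokens: number | undefined;
      cacheReadInputTokens: number | undefined;
    };

function normalizeUsage(raw: RawUsage | undefined): UsageSnapshot {
  // Without both input and output counts, the usage is treated as missing.
  if (
    raw === undefined ||
    typeof raw.input_tokens !== "number" ||
    typeof raw.output_tokens !== "number"
  ) {
    return { usageMissing: true, reason: "not_available" };
  }
  return {
    usageMissing: false,
    inputTokens: raw.input_tokens,
    outputTokens: raw.output_tokens,
    // total_tokens is input + output only; cache tokens are not folded in.
    totalTokens: raw.input_tokens + raw.output_tokens,
    cacheCreationInputTokens: raw.cache_creation_input_tokens,
    cacheReadInputTokens: raw.cache_read_input_tokens,
  };
}
```

Because both providers would pass through the same function, any difference in recorded totals has to come from the raw usage each path returns, not from the normalization step.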

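On the other hand, the two providers differ in their provider-side execution path.

  • claude: the Claude Code CLI headless claude -p --output-format stream-json path
  • claude-sdk: the @anthropic-ai/claude-agent-sdk path

As a result, the granularity of the raw usage returned by Claude, the internal prompts, the way tool calls are wrapped, the handling of structured output, and the way session/context is carried are not necessarily identical. In addition, these checks were separate executions, so the generated content and the judge's intermediate output do not match exactly either.

Therefore, the token count difference here does not show that TAKT's instrumentation records unfairly per provider; it can stem from differences in the provider raw usage and the execution content returned by the claude -p path and the claude-sdk path.

The main purpose of this verification is to confirm that, with both providers, phase usage events flow without loss as usage_missing=false and can be aggregated with analyze:usage. To compare cost across providers, you need to specify the same model explicitly, run the same task/workflow multiple times, and look at input_tokens / output_tokens / cache_creation_input_tokens / cache_read_input_tokens separately.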
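
A provider comparison along those lines could be sketched as below: filter to one explicit model, group repeated runs by provider, and take the median of each token field separately. The `RunUsage` shape and the function `medianByProvider` are hypothetical; they assume per-run totals have already been extracted from the phase usage JSONL.

```typescript
// Hypothetical per-run totals; field names follow the token fields named above.
interface RunUsage {
  provider: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

type TokenMedians = Omit<RunUsage, "provider" | "model">;

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  if (sorted.length === 0) return NaN;
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median of each token field per provider, restricted to one explicit model.
function medianByProvider(runs: RunUsage[], model: string): Map<string, TokenMedians> {
  const groups = new Map<string, RunUsage[]>();
  runs
    .filter((run) => run.model === model) // compare like with like
    .forEach((run) => {
      const group = groups.get(run.provider) ?? [];
      group.push(run);
      groups.set(run.provider, group);
    });
  const result = new Map<string, TokenMedians>();
  groups.forEach((group, provider) => {
    result.set(provider, {
      input_tokens: median(group.map((r) => r.input_tokens)),
      output_tokens: median(group.map((r) => r.output_tokens)),
      cache_creation_input_tokens: median(group.map((r) => r.cache_creation_input_tokens)),
      cache_read_input_tokens: median(group.map((r) => r.cache_read_input_tokens)),
    });
  });
  return result;
}
```

Using the median over several runs dampens the run-to-run variation in generated content, and keeping the four fields apart prevents cache reads from being mistaken for fresh input.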


Development

Successfully merging this pull request may close these issues.

[otel] Aggregation script + docs
[otel] Opt-in phase-level UsageEventsExporter (does not replace the existing usage_events)

1 participant