[otel] Add OpenTelemetry phase usage events and usage analysis by j5ik2o · Pull Request #785 · nrslib/takt

j5ik2o · 2026-06-06T01:42:40Z

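Overview

[otel] opt-in UsageEventsExporter with phase (does not replace the existing usage_events) #704: adds an opt-in, phase-level UsageEventsExporter and keeps the existing logging.usage_events
[otel] aggregation script + docs #705: adds a command that aggregates phase usage events into Markdown/CSV, plus docs
Reworked the rollback of exporter registration and the type boundaries in response to review feedback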
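
The rollback of exporter registration mentioned in the last item can be pictured roughly as follows. This is a minimal sketch, not the code in `otelFoundation.ts`: the `Registrable` interface and the `registerAllOrRollback` helper are hypothetical names, and the only assumption is that each registration returns a function that undoes it.

```typescript
// Hypothetical shape: registering returns a function that undoes the registration.
type Unregister = () => void;

interface Registrable {
  name: string;
  register(runId: string): Unregister;
}

// Register every exporter for a run. If one throws, undo the registrations
// that already succeeded (newest first) and rethrow the original error.
function registerAllOrRollback(runId: string, exporters: Registrable[]): Unregister {
  const undo: Unregister[] = [];
  try {
    for (const exporter of exporters) {
      undo.push(exporter.register(runId));
    }
  } catch (error) {
    for (const rollback of undo.slice().reverse()) {
      rollback();
    }
    throw error;
  }
  // On success, return a single function that unregisters everything.
  return () => {
    for (const rollback of undo.slice().reverse()) {
      rollback();
    }
  };
}
```

The point of the pattern is that a failed registration leaves no half-registered exporters behind for the run.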

Tests

npm run build
npm run lint
npm test
npm test -- --run src/__tests__/analyze-usage-command.test.ts src/__tests__/usageEventsSpanProcessor.test.ts src/__tests__/phaseUsageEvent.test.ts src/__tests__/otelFoundation.test.ts src/__tests__/workflowSpans.test.ts
git diff --check

Closes #704
Closes #705

Summary by CodeRabbit

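New features
- Adds opt-in observability. Phase-level usage events are written to a log, and provider usage information (including token and cache counts) can be collected and analyzed.
- A CLI command aggregates phase usage events and outputs Markdown or CSV.
Documentation
- Added and updated the steps for enabling observability, the output locations, and the aggregation/analysis procedure, in both Japanese and English.
Tests
- Added and extended an E2E test and many unit tests that verify the output format and aggregation behavior.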

coderabbitai · 2026-06-06T01:42:47Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: fad988f8-527b-4f01-9d45-96eec545fd28

📥 Commits

Reviewing files that changed from the base of the PR and between 811b3a7 and 4cf87e7.

📒 Files selected for processing (5)

e2e/specs/observability.e2e.ts
src/__tests__/otelFoundation.test.ts
src/__tests__/usageEventsSpanProcessor.test.ts
src/core/workflow/engine/ArpeggioRunner.ts
src/infra/observability/usageEventsSpanProcessor.ts

📝 Walkthrough

Walkthrough

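This PR adds opt-in, phase-granularity usage event output. It includes a SpanProcessor that builds phase usage records from OpenTelemetry spans and appends them to a JSONL file, a CLI that aggregates the collected logs, the runtime wiring, documentation, and E2E/unit tests.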
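
The span-to-JSONL flow described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's `UsageEventsSpanProcessor`: `EndedSpan` is a stand-in for the OpenTelemetry span type so the block has no dependencies, the `takt.phase` attribute key and the record field names are hypothetical, and the file append is injected as a function instead of touching the file system. Failure reporting is simplified to once per processor instance.

```typescript
// Minimal stand-in for an ended OpenTelemetry span (assumption, not the SDK type).
interface EndedSpan {
  name: string;
  attributes: Record<string, unknown>;
}

type AppendLine = (path: string, line: string) => void;

class PhaseUsageProcessorSketch {
  private readonly logPath: string;
  private readonly appendLine: AppendLine;
  private readonly warn: (message: string) => void;
  private reportedFailure = false;

  constructor(logPath: string, appendLine: AppendLine, warn: (message: string) => void) {
    this.logPath = logPath;
    this.appendLine = appendLine;
    this.warn = warn;
  }

  onEnd(span: EndedSpan): void {
    const phase = span.attributes["takt.phase"]; // hypothetical attribute key
    if (typeof phase !== "string") return; // not a phase span, nothing to record

    const input = span.attributes["gen_ai.usage.input_tokens"];
    const output = span.attributes["gen_ai.usage.output_tokens"];
    const hasUsage = typeof input === "number" && typeof output === "number";

    // Missing usage is recorded explicitly instead of dropping the event.
    const record = hasUsage
      ? { phase, usage_missing: false, input_tokens: input, output_tokens: output }
      : { phase, usage_missing: true };

    try {
      this.appendLine(this.logPath, JSON.stringify(record) + "\n");
    } catch (error) {
      // Report the first write failure only, so a broken disk does not flood the log.
      if (!this.reportedFailure) {
        this.reportedFailure = true;
        this.warn(`failed to append phase usage event: ${String(error)}`);
      }
    }
  }
}
```

In Node.js, `appendLine` could be a thin wrapper around `fs.appendFileSync`; injecting it keeps the sketch testable without a real file.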

Changes

Phase usage events observability feature

Layer / File(s)	Summary
Documentation and configuration updates `README.md`, `docs/README.ja.md`, `docs/configuration.`, `docs/observability.`, `docs/testing/e2e.md`, `e2e/specs/observability.e2e.ts`	Adds an observability guide and documents the configuration flags (`observability.enabled`/`usage_events_phase`), the output location (`.takt/runs/.../logs/<session>-usage-events.phase.jsonl`), coexistence with the existing `logging.usage_events`, and the analysis procedure.
Phase usage event types and mapping `src/core/logging/contracts.ts`, `src/core/logging/phaseUsageEvent.ts`	Defines `PHASE_USAGE_EVENTS_LOG_FILE_SUFFIX`. `mapSpanEndToPhaseUsageEvent` extracts phase/judge information and usage from span attributes and assembles a record that includes missing-usage normalization and cache-related tokens.
OpenTelemetry span processor and NDJSON append helper `src/infra/fs/jsonl.ts`, `src/infra/fs/index.ts`, `src/infra/observability/usageEventsSpanProcessor.ts`	`appendJsonLine` appends NDJSON. `UsageEventsSpanProcessor` builds a phase usage record in `onEnd` and appends it to the file. An append failure is logged only once per run and suppressed afterwards.
OtelFoundation integration and run-scoped exporter safety `src/infra/observability/otelFoundation.ts`	Implements runId consistency validation, cleanup when exporter registration throws, extended span processor state in the shared SDK, and a new `usageEventsExporter` option.
Usage payload refactoring and cache token fields `src/core/logging/providerEvent.ts`	Splits out `buildUsageEventPayload` and adds `cache_creation_input_tokens`/`cache_read_input_tokens` to `UsageEventLogRecord.usage`.
ProviderUsage threading through workflow execution `src/core/workflow/arpeggio/types.ts`, `src/core/workflow/engine/*`, `src/core/workflow/observability/workflowSpans.ts`, `src/core/workflow/types.ts`	Propagates `providerUsage` to `BatchResult` and `AgentResponse`. Usage information is reflected in the phase/judge span attributes and in the outcome on error. `usageAttributes()` generates `gen_ai.usage.` or `takt.usage.` attributes.
Judge stage logging refactor with providerUsage `src/agents/judge-status-usecase.ts`, `src/agents/structured-caller/prompt-based-structured-caller.ts`, `src/core/workflow/status-judgment-phase.ts`	Introduces `JudgeStageLogEntry`, manages stage 3 collection statefully with `createJudgeStageRecorder()`, and passes `providerUsage` to `onJudgeStage`/`onJudgeResponse`.
Claude provider usage extraction and integration `src/infra/claude/usage.ts`, `src/infra/claude/executor.ts`, `src/infra/claude-headless/*`	Implements `extractClaudeProviderUsage`, which extracts a `ProviderUsageSnapshot` from Claude's raw usage, and adds `providerUsage` to the return values of the executor and the stream aggregation.
Usage analysis CLI command and scripts `src/commands/analyze-usage.ts`, `package.json`	Adds the `analyze:usage` script and CLI. It resolves and reads the phase usage JSONL, aggregates by `step/phase/provider/model`, computes runs/calls/missing counts and token statistics (total/mean/median/standard deviation), and outputs Markdown or CSV. Parse errors include the line number.
Provider type safety refactoring `src/shared/types/provider.ts`	Changes the type to one derived from the `PROVIDER_TYPES` array; `isProviderType` implements its type guard with a Set check.
E2E test suite for observability outputs `e2e/specs/observability.e2e.ts`, `vitest.config.e2e.mock.ts`	Adds an E2E test that, in an isolated environment with observability enabled, runs `takt` and then verifies the existence and fields of `-usage-events.phase.jsonl`/`-otel-session-shadow.jsonl`/`monitor.json`.
Unit test coverage for phase events and analysis `src/__tests__/*`	Adds and updates tests covering phase mapping, missing-usage normalization, span processor registration and write behavior, OtelFoundation exporter validation, analyze-usage aggregation/formatting/error handling, and serialization of cache tokens.
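
The aggregation described in the "Usage analysis CLI command" row can be illustrated with a small sketch. The row and field names below (`PhaseUsageRow`, `usageMissing`, `totalTokens`) are assumptions for illustration and not the actual types in `src/commands/analyze-usage.ts`. The sketch uses the population standard deviation, since the page does not say which variant the command computes, and it leaves out the per-run count.

```typescript
// Hypothetical input row; the real JSONL schema is not shown on this page.
interface PhaseUsageRow {
  step: string;
  phase: string;
  provider: string;
  model: string;
  usageMissing: boolean;
  totalTokens?: number;
}

interface GroupStats {
  key: string;
  calls: number;
  missing: number;
  total: number;
  mean: number;
  median: number;
  stddev: number;
}

function summarize(values: number[]): { total: number; mean: number; median: number; stddev: number } {
  if (values.length === 0) return { total: 0, mean: 0, median: 0, stddev: 0 };
  const sorted = values.slice().sort((a, b) => a - b);
  const total = sorted.reduce((sum, v) => sum + v, 0);
  const mean = total / sorted.length;
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Population standard deviation (divide by n), an assumption of this sketch.
  const variance = sorted.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / sorted.length;
  return { total, mean, median, stddev: Math.sqrt(variance) };
}

// Group rows by step/phase/provider/model and compute token statistics per group.
function aggregate(rows: PhaseUsageRow[]): GroupStats[] {
  const groups = new Map<string, { calls: number; missing: number; tokens: number[] }>();
  for (const row of rows) {
    const key = [row.step, row.phase, row.provider, row.model].join("/");
    const group = groups.get(key) ?? { calls: 0, missing: 0, tokens: [] };
    group.calls += 1;
    if (row.usageMissing || row.totalTokens === undefined) {
      group.missing += 1; // counted as a call, excluded from token statistics
    } else {
      group.tokens.push(row.totalTokens);
    }
    groups.set(key, group);
  }
  const result: GroupStats[] = [];
  groups.forEach((group, key) => {
    result.push({ key, calls: group.calls, missing: group.missing, ...summarize(group.tokens) });
  });
  return result;
}
```

Keeping records with missing usage out of the statistics while still counting them as calls avoids treating "no usage reported" as zero tokens.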

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

nrslib/takt#753: Both PRs modify the same observability initialization/wiring path in initializeOtelFoundation to register new OpenTelemetry SpanProcessors (main: UsageEventsSpanProcessor; retrieved: SessionLogSpanProcessor), so the changes are code-level related.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
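Title check	✅ Passed	The PR title, "[otel] Add OpenTelemetry phase usage events and usage analysis", clearly summarizes the main content of the change set (phase-level usage events exporter, usage analysis command, added documentation) and is concise and specific.
Linked Issues check	✅ Passed	The PR addresses `#704` (opt-in UsageEventsExporter with phase) and `#705` (aggregation script + docs) and meets the requirements of both. UsageEventsSpanProcessor, the phaseUsageEvent mapping, the analyze-usage command, and the documentation (observability.md/observability.ja.md) are implemented, and the existing logging.usage_events is kept rather than replaced.
Out of Scope Changes check	✅ Passed	Almost all changes fall within the requirements of `#704` and `#705`. The ProviderType definition change is peripheral cleanup that introduces a type-safe guard function, adding the providerUsage field to BatchResult is required for tracking phase execution, and introducing JudgeStageLogEntry is a reasonable extension of judge stage logging.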
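
The `ProviderType` change mentioned in the last check follows a common TypeScript pattern: derive the union type from a constant array and back the type guard with a `Set`. A minimal sketch is shown here; the provider names are illustrative values taken from the comments on this page, and the PR's actual list in `src/shared/types/provider.ts` may differ.

```typescript
// Illustrative values only; the real list is not shown on this page.
const PROVIDER_TYPES = ["claude", "claude-sdk", "codex", "mock"] as const;

// The union type is derived from the array, so the two cannot drift apart.
type ProviderType = (typeof PROVIDER_TYPES)[number];

const PROVIDER_TYPE_SET: ReadonlySet<string> = new Set<string>(PROVIDER_TYPES);

// Type guard: narrows an unknown value to ProviderType with a Set lookup.
function isProviderType(value: unknown): value is ProviderType {
  return typeof value === "string" && PROVIDER_TYPE_SET.has(value);
}
```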


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

e2e/specs/observability.e2e.ts

Parsing error: /e2e/specs/observability.e2e.ts was not found by the project service. Consider either including it in the tsconfig.json or including it in allowDefaultProject.

coderabbitai · 2026-06-06T02:03:12Z

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Generating unit tests... This may take up to 20 minutes.

coderabbitai · 2026-06-06T02:09:21Z

✅ Created PR with unit tests: #786

j5ik2o · 2026-06-08T05:47:24Z

Verification notes.

I checked the phase usage events from PR #785 locally using the codex provider.

What I checked:

npm run build succeeds
Enabled observability.enabled: true / observability.usage_events_phase: true / observability.monitor: true / observability.session_log_exporter: true
Ran a minimal report-judge workflow with --provider codex
Confirmed that .takt/runs/<run>/logs/<session>-usage-events.phase.jsonl is generated
Confirmed that the shadow session log and monitor.json are also generated
Confirmed that aggregation with npm run analyze:usage -- <run-dir> and with --format csv succeeds

Unlike the mock provider, the Codex provider returned usage.

Phase usage records observed:

phase1_execute       usage_missing=false total_tokens=155117
phase2_report        usage_missing=false total_tokens=186968
phase3_structured    usage_missing=false total_tokens=29869

This run completed through structured output judgment, so phase3_tag did not occur. I consider this the expected behavior.

Within the scope of this local check, phase usage event output and analyze:usage aggregation work as expected.
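
For readers who want to inspect such a file by hand, the following sketch parses JSONL text and prints one summary line per record in the same style as the listing above. The field names `phase`, `usage_missing`, and `total_tokens` are assumptions inferred from that listing, not a confirmed schema, and the line-numbered error mirrors the behavior the walkthrough describes for parse failures.

```typescript
// Assumed record shape, inferred from the summary lines in the comment above.
interface PhaseUsageLine {
  phase: string;
  usage_missing: boolean;
  total_tokens?: number;
}

// Parse JSONL text; a malformed line fails with its 1-based line number.
function parsePhaseUsageJsonl(text: string): PhaseUsageLine[] {
  const records: PhaseUsageLine[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = (lines[i] ?? "").trim();
    if (line === "") continue; // tolerate blank lines and the trailing newline
    try {
      records.push(JSON.parse(line) as PhaseUsageLine);
    } catch (error) {
      throw new Error(`line ${i + 1}: invalid JSON (${String(error)})`);
    }
  }
  return records;
}

// Render one record as "<phase> usage_missing=<bool> total_tokens=<n>".
function formatRecord(record: PhaseUsageLine): string {
  const tokens = record.total_tokens === undefined ? "n/a" : String(record.total_tokens);
  return `${record.phase} usage_missing=${record.usage_missing} total_tokens=${tokens}`;
}
```

The cast in `parsePhaseUsageJsonl` does no runtime validation; a real reader would check the fields before trusting them.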

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/core/workflow/engine/ArpeggioRunner.ts (1)
200-205: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Improve the consistency of usage information on exceptions.

In the catch block of runWithPhaseSpan (Line 187-194), providerUsage is set with usageMissing: true, but it is not set in this catch block. For consistency, setting providerUsage here as well is recommended.
♻️ Suggested fix
     return {
       batchIndex: batch.batchIndex,
       content: '',
       success: false,
       error: lastError,
+      providerUsage: {
+        usageMissing: true,
+        reason: USAGE_MISSING_REASONS.NOT_AVAILABLE,
+      },
     };
Add the import at the top of the file:
+import { USAGE_MISSING_REASONS } from '../../logging/contracts.js';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/workflow/engine/ArpeggioRunner.ts` around lines 200 - 205, The catch
block in runWithPhaseSpan in ArpeggioRunner.ts returns a result without setting
providerUsage (unlike the earlier catch that sets usageMissing: true), causing
inconsistent usage metadata; update the returned object in this catch to include
providerUsage with usageMissing: true (and any minimal structure expected by the
caller) alongside batchIndex, content, success, and error so both failure paths
produce consistent providerUsage information.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/specs/observability.e2e.ts`:
- Around line 125-127: The test reads monitor.json into the variable monitor and
currently checks for '"takt.run.id"' via JSON.stringify, which is brittle;
instead assert the parsed object contains that key directly. Update the
assertions around monitor (from observability.e2e.ts) to treat monitor as an
object and use a direct property existence check (e.g.,
Object.prototype.hasOwnProperty.call(monitor, 'takt.run.id') or a testing helper
like expect(monitor).toHaveProperty('takt.run.id')) and optionally validate the
property's value/type rather than relying on JSON.stringify.

In `@src/__tests__/otelFoundation.test.ts`:
- Line 218: The test currently asserts spanProcessors length equals 2, but
createSpanProcessorState() always returns [sessionLogSpanProcessor,
usageEventsSpanProcessor], so instead change the test to assert the behavior of
registration: when usageEventsPhase is false, verify that the
usageEventsSpanProcessor (or its exporter/register method) is not called; when
true, verify it is called — locate assertions around
foundation.constructedOptions and replace the length check with spies/mocks on
usageEventsSpanProcessor.register (or the exporter.register function used by
createSpanProcessorState) and assert call/non-call accordingly.

In `@src/infra/observability/usageEventsSpanProcessor.ts`:
- Around line 18-85: The instance-level hasReportedWriteFailure hides write
errors for different runId files; change hasReportedWriteFailure into a per-run
map (e.g. hasReportedWriteFailureByRun: Map<string, boolean>) and update
safeAppend to use options.runId (or options.phaseUsageLogPath) as the key when
checking/setting the flag; also update register's returned unregister function
to remove the flag for that runId and clear the map in shutdown. Ensure you
modify the UsageEventsSpanProcessor constructor/fields, safeAppend, register
(returned deleter), and shutdown to operate on the per-run map instead of a
single boolean.
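
The per-run suppression suggested in the last inline comment could look roughly like this. It is a sketch only: `WriteFailureReporter` is a hypothetical helper, and it uses a `Set<string>` of run IDs in place of the `Map<string, boolean>` named in the comment, which carries the same information.

```typescript
// Hypothetical helper: remembers which runs have already reported a write failure.
class WriteFailureReporter {
  private readonly reportedRuns = new Set<string>();
  private readonly warn: (message: string) => void;

  constructor(warn: (message: string) => void) {
    this.warn = warn;
  }

  // Log the first failure of a run; later failures of the same run stay silent,
  // while failures of other runs are still reported.
  report(runId: string, error: unknown): void {
    if (this.reportedRuns.has(runId)) return;
    this.reportedRuns.add(runId);
    this.warn(`[${runId}] failed to append usage event: ${String(error)}`);
  }

  // Call when a run unregisters, so the flag does not outlive the run.
  forget(runId: string): void {
    this.reportedRuns.delete(runId);
  }

  // Call on shutdown to drop all remaining flags.
  clear(): void {
    this.reportedRuns.clear();
  }
}
```

With a single instance-level boolean, a failure in one run would hide failures in every later run that shares the processor; keying by run ID removes that coupling.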


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 87d7c8ed-a551-42ab-b14d-f7526f47732f

📥 Commits

Reviewing files that changed from the base of the PR and between c226cca and b6a5532.

📒 Files selected for processing (37)

README.md
docs/README.ja.md
docs/configuration.ja.md
docs/configuration.md
docs/observability.ja.md
docs/observability.md
docs/testing/e2e.md
e2e/specs/observability.e2e.ts
package.json
src/__tests__/analyze-usage-command.test.ts
src/__tests__/logging-contracts.test.ts
src/__tests__/otelFoundation.test.ts
src/__tests__/phaseUsageEvent.test.ts
src/__tests__/usageEventsSpanProcessor.test.ts
src/__tests__/workflowSpans.test.ts
src/agents/judge-status-usecase.ts
src/agents/structured-caller/prompt-based-structured-caller.ts
src/commands/analyze-usage.ts
src/core/logging/contracts.ts
src/core/logging/phaseUsageEvent.ts
src/core/logging/providerEvent.ts
src/core/workflow/arpeggio/types.ts
src/core/workflow/engine/ArpeggioRunner.ts
src/core/workflow/engine/ParallelRunner.ts
src/core/workflow/engine/StepExecutor.ts
src/core/workflow/engine/team-leader-part-runner.ts
src/core/workflow/observability/workflowSpans.ts
src/core/workflow/report-phase-runner.ts
src/core/workflow/status-judgment-phase.ts
src/core/workflow/types.ts
src/features/tasks/execute/workflowExecutionBootstrap.ts
src/infra/fs/index.ts
src/infra/fs/jsonl.ts
src/infra/observability/otelFoundation.ts
src/infra/observability/usageEventsSpanProcessor.ts
src/shared/types/provider.ts
vitest.config.e2e.mock.ts

j5ik2o · 2026-06-08T06:37:30Z

Verification notes.

I checked the phase usage events from PR #785 locally with both the claude provider (via the headless Claude Code CLI, claude -p --output-format stream-json) and the claude-sdk provider.

What I checked:

npm run build succeeds
Ran in a temporary repo with a temporary TAKT_CONFIG_DIR
Enabled observability.enabled: true / observability.usage_events_phase: true / observability.monitor: true / observability.session_log_exporter: true
Ran a minimal report-judge workflow with --provider claude and with --provider claude-sdk
Confirmed that .takt/runs/<run>/logs/<session>-usage-events.phase.jsonl is generated
Confirmed that the shadow session log and monitor.json are also generated
Confirmed that aggregation with npm run analyze:usage -- <run-dir> and with --format csv succeeds

Phase usage records observed with the claude provider:

phase1_execute       usage_missing=false total_tokens=3009
phase2_report        usage_missing=false total_tokens=718
phase3_structured    usage_missing=false total_tokens=2688

Phase usage records observed with the claude-sdk provider:

phase1_execute       usage_missing=false total_tokens=752
phase2_report        usage_missing=false total_tokens=214
phase3_structured    usage_missing=false total_tokens=301

With both providers, usage was captured with usage_missing=false, and everything worked as expected through phase usage events, the shadow session log, monitor.json, and analyze:usage aggregation.

This run completed through structured output judgment, so phase3_tag did not occur. I consider this the expected behavior.

j5ik2o · 2026-06-08T06:41:36Z

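A supplementary note.

In the verification above, token counts differ between the claude provider (the headless Claude Code CLI path, claude -p --output-format stream-json) and the claude-sdk provider, but this does not mean that TAKT's phase usage events instrumentation records under different conditions for each provider.

The instrumentation path on the TAKT side is shared.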

provider raw usage
  -> extractClaudeProviderUsage()
  -> ProviderUsageSnapshot
  -> span の gen_ai.usage.* 属性
  -> *-usage-events.phase.jsonl

The claude and claude-sdk providers both normalize input_tokens / output_tokens / cache tokens with the same extractClaudeProviderUsage(). total_tokens is likewise treated as input_tokens + output_tokens in both, and cache tokens are kept as separate fields.
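
The normalization rule stated here can be written down as a short sketch. This is not the implementation of `extractClaudeProviderUsage()`: the function name `normalizeUsageSketch`, the `RawClaudeUsage` input shape, and the camelCase output fields are assumptions for illustration. Only the rule itself (total is input plus output, cache tokens stay separate) comes from the comment.

```typescript
// Assumed raw shape, modelled on the token field names mentioned in this comment.
interface RawClaudeUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Hypothetical snapshot type; either usage is missing or all core counts exist.
type UsageSnapshotSketch =
  | { usageMissing: true }
  | {
      usageMissing: false;
      inputTokens: number;
      outputTokens: number;
      totalTokens: number;
      cacheCreationInputTokens: number | undefined;
      cacheReadInputTokens: number | undefined;
    };

function normalizeUsageSketch(raw: RawClaudeUsage | undefined): UsageSnapshotSketch {
  if (
    raw === undefined ||
    typeof raw.input_tokens !== "number" ||
    typeof raw.output_tokens !== "number"
  ) {
    return { usageMissing: true };
  }
  return {
    usageMissing: false,
    inputTokens: raw.input_tokens,
    outputTokens: raw.output_tokens,
    // Cache tokens are deliberately not added into the total.
    totalTokens: raw.input_tokens + raw.output_tokens,
    cacheCreationInputTokens: raw.cache_creation_input_tokens,
    cacheReadInputTokens: raw.cache_read_input_tokens,
  };
}
```

Because cache reads are reported separately, a call with a large cached context can show a small total_tokens while still having processed many cached input tokens.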

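The two providers do, however, differ in their provider-side execution paths.

claude: the headless Claude Code CLI path, claude -p --output-format stream-json
claude-sdk: the @anthropic-ai/claude-agent-sdk path

As a result, the granularity of the raw usage returned by Claude, the internal prompts, how tool calls are wrapped, how structured output is handled, and how session/context is carried are not guaranteed to be identical. In addition, these checks were separate runs, so the generated content and the judge's intermediate output do not match exactly either.

The token count difference therefore does not show that TAKT's instrumentation records unfairly per provider; it can stem from differences in the provider raw usage and execution content returned by the claude -p path and the claude-sdk path.

The main purpose of this verification was to confirm that, with both providers, phase usage events flow through with usage_missing=false and no gaps, and that analyze:usage can aggregate them. To compare cost between providers, specify the same model explicitly, run the same task/workflow several times, and look at input_tokens / output_tokens / cache_creation_input_tokens / cache_read_input_tokens separately.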

j5ik2o added 4 commits June 5, 2026 23:54

feat(observability): add phase usage events exporter

9bfc50a

feat(observability): add usage analysis command

ed242f4

fix(observability): tighten phase usage boundaries

79d59b7

fix(observability): roll back failed exporter registration

628537d

coderabbitai Bot mentioned this pull request Jun 6, 2026

CodeRabbit Generated Unit Tests: Add unit tests for PR changes #786

Open

j5ik2o changed the title ~~Add OpenTelemetry phase usage events and usage analysis~~ [otel] Add OpenTelemetry phase usage events and usage analysis Jun 6, 2026

j5ik2o added 2 commits June 6, 2026 14:14

test(observability): cover config-driven local exporters

1cafa58

fix(observability): harden phase usage exporter plumbing

b6a5532

j5ik2o marked this pull request as ready for review June 8, 2026 05:48

coderabbitai Bot reviewed Jun 8, 2026

Comment thread e2e/specs/observability.e2e.ts Outdated

Comment thread src/__tests__/otelFoundation.test.ts Outdated

Comment thread src/infra/observability/usageEventsSpanProcessor.ts

j5ik2o added 2 commits June 8, 2026 15:06

fix(observability): capture claude headless usage

3a2b825

refactor(observability): share claude usage normalization

28bfdeb

j5ik2o mentioned this pull request Jun 8, 2026

fix(observability): capture Claude usage for phase events #790

Merged

j5ik2o added 2 commits June 8, 2026 15:11

Merge pull request #790 from nrslib/codex/fix-claude-headless-usage-e…

811b3a7

…vents fix(observability): capture Claude usage for phase events

fix(observability): address phase usage review comments

4cf87e7

j5ik2o marked this pull request as draft June 8, 2026 06:39

[otel] Add OpenTelemetry phase usage events and usage analysis #785
j5ik2o wants to merge 10 commits into main from codex/otel-usage-events-exporter-docs
