Fix token double-count from dual Claude Code telemetry channels #155
Merged
Claude Code reports each request's usage on both a claude_code.api_request log record and the claude_code.token.usage metric. Beacon ingested both, so every token field (input/output/cache read/cache creation) was counted twice in totals and in every rollup; cost was unaffected because it rides only on claude_code.cost.usage.

Treat the log/span channel as the token source of truth. Per (harness, session) scope, zero any usage field that channel reports on the scope's metric-channel events, and drop metric events left with no usage. Fields the log channel never carries (cost) survive on the metric channel, so cost still lands exactly once. Scopes with no log/span channel (metrics-only runtimes) are untouched, so their usage is not dropped.

Verified against real captured Claude Code telemetry: report totals now equal log-channel tokens plus metric-channel cost exactly, and the dashboard /api/tokens matches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
End-to-end testing of the token usage feature on a live Claude Code session surfaced that every token field is counted twice. Claude Code reports each request's usage on both of its OTel channels, and Beacon's install enables both:
- Logs: a `claude_code.api_request` record carrying the full `gen_ai.usage` (no `metric_name`).
- Metrics: `claude_code.token.usage` / `claude_code.cost.usage` datapoints.

`tokens.Aggregate` summed every usage-bearing event with no channel de-dup, so totals, `by_model`, `by_session`, `by_harness`, and the dashboard were all ~2× inflated. In the captured telemetry, the two channels were byte-identical. Cost was already correct because it rides only on `claude_code.cost.usage`. The `[1m]` two-row symptom was downstream of this: the channels label the model differently (base name on logs, `[1m]` request model on metrics).

Fix
Treat the log/span channel as the token source of truth. Per `(harness, session)` scope, zero any usage field that channel reports on the scope's metric-channel events, and drop metric events left with no usage. Fields the log channel never carries (cost) survive on the metric channel, so cost still lands exactly once. Scopes with no log/span channel (metrics-only runtimes like Codex/SDK) are left untouched, so their usage is never dropped.

CLI-only change; the collector binary is unaffected.
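A minimal Go sketch of the per-scope rule above, assuming a simplified event shape. `Event`, `Usage`, `Channel`, the field-index constants, and storing cost in micro-dollars are all hypothetical; only the name `dedupeOverlappingChannels` comes from this PR, and its real signature is not shown here. The sketch also assumes a field counts as "reported" by the log channel when some log event in the scope carries a non-zero value for it.

```go
package main

import "fmt"

// Channel identifies which OTel signal an event arrived on (hypothetical type).
type Channel int

const (
	LogChannel    Channel = iota // e.g. a claude_code.api_request log record or span
	MetricChannel                // e.g. a claude_code.token.usage / claude_code.cost.usage datapoint
)

// Usage field indices (hypothetical). Cost is kept in micro-dollars so every field fits in an int64.
const (
	Input = iota
	Output
	CacheRead
	CacheCreation
	CostMicroUSD
	numFields
)

// Usage holds one event's usage; a zero field means "not reported".
type Usage [numFields]int64

// Event is a simplified, hypothetical usage event.
type Event struct {
	Harness, Session, Model string
	Channel                 Channel
	Usage                   Usage
}

type scope struct{ harness, session string }

// dedupeOverlappingChannels zeroes, on metric events, every usage field the log
// channel reports in the same (harness, session) scope, then drops metric events
// left with no usage. Scopes with no log events pass through unchanged.
// Assumption: a field is "reported" if any log event in the scope has it non-zero.
func dedupeOverlappingChannels(events []Event) []Event {
	// Pass 1: collect, per scope, the fields the log channel reports.
	logFields := make(map[scope][numFields]bool)
	for _, e := range events {
		if e.Channel != LogChannel {
			continue
		}
		k := scope{e.Harness, e.Session}
		seen := logFields[k] // storing it back marks the scope as log-bearing
		for f, v := range e.Usage {
			if v != 0 {
				seen[f] = true
			}
		}
		logFields[k] = seen
	}

	// Pass 2: strip log-reported fields from metric events in log-bearing scopes.
	out := make([]Event, 0, len(events))
	for _, e := range events { // e is a copy, so the input slice is not modified
		seen, hasLog := logFields[scope{e.Harness, e.Session}]
		if e.Channel == MetricChannel && hasLog {
			for f := range e.Usage {
				if seen[f] {
					e.Usage[f] = 0
				}
			}
			if e.Usage == (Usage{}) {
				continue // everything it carried is already counted via the log channel
			}
		}
		out = append(out, e)
	}
	return out
}

// totals sums every field across events, as a naive aggregator would.
func totals(events []Event) Usage {
	var t Usage
	for _, e := range events {
		for f, v := range e.Usage {
			t[f] += v
		}
	}
	return t
}

// sampleEvents builds illustrative input: one dual-channel scope and one metrics-only scope.
func sampleEvents() []Event {
	tok := Usage{Input: 100, Output: 40, CacheRead: 900}
	return []Event{
		{"claude-code", "s1", "opus", LogChannel, tok},
		{"claude-code", "s1", "opus[1m]", MetricChannel, tok},                          // token.usage duplicate
		{"claude-code", "s1", "opus[1m]", MetricChannel, Usage{CostMicroUSD: 250_000}}, // cost.usage, $0.25
		{"codex", "s2", "codex-model", MetricChannel, Usage{Input: 30, Output: 10}},    // metrics-only scope
	}
}

func main() {
	events := sampleEvents()
	fmt.Println("naive:  ", totals(events))
	fmt.Println("deduped:", totals(dedupeOverlappingChannels(events)))
}
```

With this illustrative input, the naive sum counts 230 input tokens; after dedupe it counts 130. The `opus[1m]` metric row survives carrying only cost, and the metrics-only `codex` scope passes through unchanged. Keying on the scope rather than on the model is what lets the differently labeled `[1m]` metric rows collapse against the base-name log rows.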
Verification
- `go test ./...`, `go test -race ./internal/endpoint/...`, `gofmt`, and `go vet` all pass.
- `TestAggregateDedupesDualChannelUsage` covers dual-channel collapse, cost-only survival on `[1m]`, and a metrics-only scope passthrough.
- Against real captured Claude Code telemetry, report totals now equal log-channel tokens + metric-channel cost exactly (e.g. haiku 7940/5032/559750 → 3970/2516/279875), `[1m]` rows are cost-only, the bogus 103.5% utilization artifact is gone, and the dashboard `/api/tokens` matches the CLI.

🤖 Generated with Claude Code
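As a quick arithmetic check on the haiku figures in the verification notes: if both channels carried byte-identical usage, each pre-fix figure should be exactly twice its post-fix figure. The helper `exactlyDoubled` is hypothetical and only restates that check.

```go
package main

import "fmt"

// exactlyDoubled reports whether every pre-fix figure is exactly twice the
// corresponding post-fix figure, which is what byte-identical dual-channel
// reporting predicts. Slices of different lengths never match.
func exactlyDoubled(before, after []int64) bool {
	if len(before) != len(after) {
		return false
	}
	for i := range before {
		if before[i] != 2*after[i] {
			return false
		}
	}
	return true
}

func main() {
	// Haiku figures quoted in the verification notes (before → after the fix).
	before := []int64{7940, 5032, 559750}
	after := []int64{3970, 2516, 279875}
	fmt.Println("exact 2x:", exactlyDoubled(before, after))
}
```

All three figures halve exactly, consistent with a pure duplicate rather than a partial overlap between channels.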
Note
Medium Risk
Changes core token aggregation logic used by CLI and dashboard; wrong dedupe rules could under-count metrics-only runtimes or drop valid usage, though scoped to (harness, session) and covered by new tests.
Overview
Fixes ~2× inflated token totals when Claude Code telemetry is ingested from both OTel logs (`claude_code.api_request`, no `metric_name`) and metrics (`claude_code.token.usage`). `tokens.Aggregate` now runs `dedupeOverlappingChannels` after collecting usage events and before cumulative delta resolution. Per (harness, session), log/span events define which usage fields are authoritative; matching fields on metric events are zeroed, empty metric rows are dropped, and fields only on metrics (e.g. cost on `claude_code.cost.usage`) are kept. Scopes with metrics only are unchanged.

Adds `TestAggregateDedupesDualChannelUsage` for dual-channel collapse, cost-only `[1m]` attribution, and metrics-only passthrough. CLI token commands and the dashboard `/api/tokens` pick this up via the shared aggregator.

Reviewed by Cursor Bugbot for commit a8fb55d. Bugbot is set up for automated code reviews on this repo.