fix(client): preserve cache token counts when message_delta omits them #215
sumleo wants to merge 2 commits into
Hi @anthropics, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time!
    // The cache token fields are nullable and are usually omitted on the
    // terminal message_delta, so merge them with the values already
    // populated from message_start instead of replacing wholesale.
    usageDetails = ToUsageDetails(deltaUsage, usageDetails);
Rather than passing usageDetails, you can use local variables to track CacheCreationInputTokens and CacheReadInputTokens (and also InputTokens and ServerToolUse since those are nullable too).
Done in b346a3d. Both the stable and beta streaming paths now capture InputTokens, CacheCreationInputTokens, CacheReadInputTokens, and ServerToolUse from message_start into locals and fall back to them when the terminal message_delta omits them, instead of round-tripping through the previously built UsageDetails. That also removes the helper that re-read CacheCreationInputTokens back out of AdditionalCounts, and covers the InputTokens/ServerToolUse omission you noted (both nullable). csharpier and the build are clean; I couldn't launch the test runner locally (the testhost gets killed in my environment), so I'm relying on CI to run the suite.
… merge
Track the nullable cumulative usage fields (InputTokens, CacheCreationInputTokens, CacheReadInputTokens, ServerToolUse) reported on message_start in local variables and fall back to them when the terminal message_delta omits them, instead of round-tripping through the previously built UsageDetails. This also covers InputTokens and ServerToolUse (both nullable) being omitted on the delta, and removes the helper that re-parsed CacheCreationInputTokens back out of AdditionalCounts. Same change applied to the beta path.
Summary
The Microsoft.Extensions.AI streaming bridge loses the prompt-caching token counts reported by message_start whenever the terminal message_delta omits the cache fields (which is the common case).

In both AsIChatClient bridges, the message_delta handler replaced the accumulated usage wholesale. MessageDeltaUsage.CacheReadInputTokens and CacheCreationInputTokens are nullable and are usually absent on message_delta. ToUsageDetails maps those nulls straight through, so the cache counts populated from message_start are wiped out: CachedInputTokenCount becomes null and InputTokenCount/TotalTokenCount are understated.

Root cause
- src/Anthropic/AnthropicClientExtensions.cs (RawMessageDeltaEvent handler, ~line 638)
- src/Anthropic/Services/Beta/Messages/AnthropicBetaClientExtensions.cs (BetaRawMessageDeltaEvent handler, ~line 361)

Fix
Replace the wholesale assignment with a per-field, null-safe merge: only the delta's non-null cache values override what message_start already populated; when the delta omits a cache field, the existing value is kept and the derived input/total counts are recomputed accordingly.

This mirrors how the SDK already merges streaming usage elsewhere:
- src/Anthropic/Helpers/MessageContentAggregator.cs (null-checked overwrite of CacheCreationInputTokens/CacheReadInputTokens, lines ~70-80)
- FallbackStreamSplicer.Backfill (src/Anthropic/Helpers/FallbackStreamSplicer.cs, ~line 889)

Test
Added GetStreamingResponseAsync_DeltaWithoutCacheTokens_PreservesStartCacheTokens to AnthropicClientExtensionsTestsBase (so it runs for both the standard and Beta bridges). It streams a message_start carrying cache tokens followed by a message_delta that omits the cache fields, and asserts the final usage still reports CachedInputTokenCount, CacheCreationInputTokens, and the correct InputTokenCount/TotalTokenCount.

These are hand-written bridge files (not generated), so the change is local to them.
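For readers skimming the thread, the per-field, null-safe merge described under Fix can be sketched roughly as below. This is an illustration only, not the code in the PR: MergeUsage and the tuple shapes are hypothetical stand-ins for the SDK's usage types, and the sketch assumes the derived input count is the sum of the uncached, cache-creation, and cache-read input tokens, which the description implies but does not spell out.

```csharp
using System;

// Hypothetical values captured from message_start (plain tuples, not SDK types).
(long? Input, long? CacheCreation, long? CacheRead) start = (10, 200, 3000);

// Terminal message_delta: the nullable cumulative fields are omitted (null).
(long? Input, long? CacheCreation, long? CacheRead) delta = (null, null, null);
long deltaOutputTokens = 42;

var usage = MergeUsage(start, delta, deltaOutputTokens);
Console.WriteLine(
    $"cached={usage.CacheRead} input={usage.TotalInput} total={usage.Total}"
);
// prints "cached=3000 input=3210 total=3252"

// Hypothetical helper: each field prefers the delta's value when it is present
// and otherwise falls back to the value seen on message_start.
static (long Input, long CacheCreation, long CacheRead, long TotalInput, long Total) MergeUsage(
    (long? Input, long? CacheCreation, long? CacheRead) start,
    (long? Input, long? CacheCreation, long? CacheRead) delta,
    long outputTokens
)
{
    long input = delta.Input ?? start.Input ?? 0;
    long cacheCreation = delta.CacheCreation ?? start.CacheCreation ?? 0;
    long cacheRead = delta.CacheRead ?? start.CacheRead ?? 0;

    // Assumption: the derived input count includes both cache counters.
    long totalInput = input + cacheCreation + cacheRead;
    return (input, cacheCreation, cacheRead, totalInput, totalInput + outputTokens);
}
```

A wholesale replacement would correspond to dropping the `?? start.…` fallbacks, in which case the three input-side values collapse to 0 here and the totals are understated, which is the symptom the description reports.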