feat: expose cumulative token usage per ExecuteTools run #55
Merged
Add an exported Background flag on AgentState, set true for background spawns (spawn_agent background=true) and false for foreground ones. This lets embedders tell unattended background work apart from a foreground sub-agent whose result is consumed inline — e.g. to auto-notify on completion only for background agents. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a counting LLM decorator and a CumulativeUsage field on Status so a full ExecuteTools run's token usage can be summed and exposed. Preserves StreamingLLM so wrapping does not disable streaming.
Buffer the forwarded stream channel and make the send context-aware to match the client convention and prevent goroutine leaks. Document that streaming-path usage is unpopulated by the bundled clients today.
Wraps the run LLM in a counting decorator and stamps the summed usage onto the returned fragment's Status.CumulativeUsage via a deferred named return, covering all exit paths. Each sub-agent run reports its own total.
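The "deferred named return" technique mentioned in this commit is worth a small illustration. In the sketch below, `Usage`, `Status`, `Fragment`, `counter`, and `run` are hypothetical simplifications (the real types and the shape of `ExecuteTools` may differ); the point is only that a deferred closure writing to a named result runs on every return path, including early error returns.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified versions of the types named in the PR.
type Usage struct{ TotalTokens int }

type Status struct{ LastUsage, CumulativeUsage Usage }

type Fragment struct{ Status Status }

// counter stands in for the counting LLM decorator.
type counter struct{ total Usage }

// call simulates one LLM call that consumes the given number of tokens.
func (c *counter) call(tokens int) Usage {
	c.total.TotalTokens += tokens
	return Usage{TotalTokens: tokens}
}

// run simulates a multi-call run. Because out is a named result, the deferred
// closure can stamp the summed usage onto it after any return statement has
// set the results, so early exits report their partial total too.
func run(calls []int, failAt int) (out Fragment, err error) {
	c := &counter{}
	defer func() { out.Status.CumulativeUsage = c.total }()

	for i, tokens := range calls {
		if i == failAt {
			return out, errors.New("tool failed") // early exit is still stamped
		}
		out.Status.LastUsage = c.call(tokens)
	}
	return out, nil
}

func main() {
	ok, _ := run([]int{10, 20, 30}, -1)
	fmt.Println(ok.Status.LastUsage.TotalTokens, ok.Status.CumulativeUsage.TotalTokens)

	partial, err := run([]int{10, 20, 30}, 2)
	fmt.Println(err, partial.Status.CumulativeUsage.TotalTokens)
}
```

Without the named result, each `return` statement would need its own stamping code, and a path added later could silently skip it.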
mudler added a commit to mudler/nib that referenced this pull request on Jun 4, 2026
…al replace cogito's CumulativeUsage (mudler/cogito#55) is now released on main, so nib depends on the published pseudo-version and the dev-time local replace is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mudler added a commit to mudler/nib that referenced this pull request on Jun 4, 2026
…16)
* docs: design spec for sub-agent completion stats line
  Spec for rendering a Claude-Code-style completion summary (tools · cumulative tokens · elapsed) when a spawned sub-agent finishes, in both the TUI and the plain CLI. Cumulative token tracking requires a small cogito-side accumulator exposed on the returned fragment.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: implementation plan for sub-agent completion stats line
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* build: replace cogito with local worktree during dev
  Temporary: lets nib build against the un-released CumulativeUsage change. Removed once cogito is tagged (see plan Task 9).
* feat(chat): AgentEvent run-stats fields and formatters
* feat(chat): time sub-agents and populate completion run-stats
* feat(tui): show sub-agent run-stats on the completion marker
* feat(cli): show sub-agent run-stats on the completion line
* build: bump cogito to v0.10.1-0.20260604082319-fe7fd5de11d1; drop local replace
  cogito's CumulativeUsage (mudler/cogito#55) is now released on main, so nib depends on the published pseudo-version and the dev-time local replace is removed.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds a way to read the cumulative token usage of a whole `ExecuteTools` run (and therefore of each spawned sub-agent), where previously only the last LLM call's usage was retained on `Status.LastUsage`.

- `Status.CumulativeUsage LLMUsage` field, summed across every LLM call in a run.
- `LLM` decorator (`newCountingLLM`) that accumulates usage from both `CreateChatCompletion` and `Ask`. It preserves `StreamingLLM` so wrapping never disables the streaming code path.
- `ExecuteTools` wraps its `llm` once (after the sub-agent fallback `agentLLM` is captured, so a sub-agent's tokens are not folded into the parent) and stamps the total onto the returned fragment via a deferred named return, covering all exit paths.

Because each sub-agent runs its own `ExecuteTools` and `runAgent` assigns that fragment to `AgentState.Fragment` before firing the completion callback, `AgentState.Fragment.Status.CumulativeUsage` is populated per sub-agent for embedders to report.

Known limitation
The bundled stream clients don't yet populate `StreamEvent.Usage` on the done event, so streaming-path tokens read as zero until they request usage from the API (e.g. `StreamOptions{IncludeUsage: true}`). The non-streaming path is fully counted. This is documented in `usage_counter.go`.

Test Plan
- `usage_counter_internal_test.go`: decorator sums both `Ask` + `CreateChatCompletion`; `newCountingLLM` yields a `StreamingLLM` iff the inner LLM is one.
- `tools_cumulative_test.go`: a multi-call `ExecuteTools` run reports `CumulativeUsage` equal to the summed dispensed usage and greater than `LastUsage`.
- Streaming suite (`TestAskWithStreaming*`) passes; the wrap preserves streaming.

🤖 Generated with Claude Code