feat: expose cumulative token usage per ExecuteTools run #55

Merged
mudler merged 4 commits into main from feat/cumulative-usage on Jun 4, 2026

@localai-bot

Collaborator

Summary

Adds a way to read the cumulative token usage of a whole ExecuteTools run (and therefore of each spawned sub-agent), where previously only the last LLM call's usage was retained on Status.LastUsage.

  • New Status.CumulativeUsage LLMUsage field, summed across every LLM call in a run.
  • A small counting LLM decorator (newCountingLLM) that accumulates usage from both CreateChatCompletion and Ask. It preserves StreamingLLM so wrapping never disables the streaming code path.
  • ExecuteTools wraps its llm once (after the sub-agent fallback agentLLM is captured, so a sub-agent's tokens are not folded into the parent) and stamps the total onto the returned fragment via a deferred named return — covering all exit paths.

Because each sub-agent runs its own ExecuteTools and runAgent assigns that fragment to AgentState.Fragment before firing the completion callback, AgentState.Fragment.Status.CumulativeUsage is populated per sub-agent for embedders to report.
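As a rough illustration of how an embedder might report these per-sub-agent totals from a completion callback, consider the sketch below. The types are local stand-ins whose shapes are assumed from the field path named above; the `ID` field, the `LLMUsage` field names, and the `summarize` helper are hypothetical and not part of cogito.

```go
package main

import "fmt"

// Local stand-ins mirroring the path AgentState.Fragment.Status.CumulativeUsage.
// The real cogito types may differ (for example, pointer fields).
type LLMUsage struct {
	PromptTokens, CompletionTokens, TotalTokens int
}

type Status struct {
	LastUsage       LLMUsage
	CumulativeUsage LLMUsage
}

type Fragment struct {
	Status Status
}

type AgentState struct {
	ID         string // hypothetical identifier, for display only
	Background bool
	Fragment   Fragment
}

// summarize builds a one-line report from a finished sub-agent's state.
func summarize(s AgentState) string {
	u := s.Fragment.Status.CumulativeUsage
	kind := "foreground"
	if s.Background {
		kind = "background"
	}
	return fmt.Sprintf("%s agent %s: %d tokens (%d prompt, %d completion)",
		kind, s.ID, u.TotalTokens, u.PromptTokens, u.CompletionTokens)
}

func main() {
	state := AgentState{
		ID:         "a1",
		Background: true,
		Fragment: Fragment{Status: Status{
			CumulativeUsage: LLMUsage{PromptTokens: 120, CompletionTokens: 30, TotalTokens: 150},
		}},
	}
	fmt.Println(summarize(state))
	// prints: background agent a1: 150 tokens (120 prompt, 30 completion)
}
```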

Known limitation

The bundled stream clients don't yet populate StreamEvent.Usage on the done event, so streaming-path tokens read as zero until they request usage from the API (e.g. StreamOptions{IncludeUsage: true}). The non-streaming path is fully counted. This is documented in usage_counter.go.
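For context, on OpenAI-compatible chat completion endpoints a streaming request opts into usage reporting through the `stream_options.include_usage` field of the request body. The sketch below only shows how that body might be built. The struct names are local stand-ins, `buildRequest` is hypothetical, the `messages` field is omitted for brevity, and nothing here reflects how the bundled cogito clients are actually wired.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamOptions mirrors the "stream_options" object of the request body.
type streamOptions struct {
	IncludeUsage bool `json:"include_usage"`
}

// chatRequest is a trimmed-down request body; real requests also carry messages.
type chatRequest struct {
	Model         string         `json:"model"`
	Stream        bool           `json:"stream"`
	StreamOptions *streamOptions `json:"stream_options,omitempty"`
}

// buildRequest returns the JSON body for a streaming request, optionally
// asking the server to append a usage record to the stream.
func buildRequest(model string, withUsage bool) ([]byte, error) {
	req := chatRequest{Model: model, Stream: true}
	if withUsage {
		req.StreamOptions = &streamOptions{IncludeUsage: true}
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildRequest("example-model", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// prints: {"model":"example-model","stream":true,"stream_options":{"include_usage":true}}
}
```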

Test Plan

  • usage_counter_internal_test.go: decorator sums both Ask + CreateChatCompletion; newCountingLLM yields a StreamingLLM iff the inner LLM is one.
  • tools_cumulative_test.go: a multi-call ExecuteTools run reports CumulativeUsage equal to the summed dispensed usage and greater than LastUsage.
  • Streaming regression (TestAskWithStreaming*) passes — the wrap preserves streaming.
  • Full non-e2e suite green. (One pre-existing background-spawn mock race is unrelated and present at the base commit.)
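The "StreamingLLM iff the inner LLM is one" property checked by the tests can be illustrated with a small sketch. All names here (`LLM`, `StreamingLLM`, `wrap`, `counting`, `countingStream`, `plain`, `streamer`) are simplified stand-ins and do not match cogito's real interfaces; the point is only the pattern of returning a richer wrapper when the inner value implements the optional interface.

```go
package main

import "fmt"

// Simplified stand-ins for the base and streaming interfaces.
type LLM interface {
	Ask(prompt string) string
}

type StreamingLLM interface {
	LLM
	Stream(prompt string) <-chan string
}

// counting wraps any LLM and counts calls.
type counting struct {
	inner LLM
	calls int
}

func (c *counting) Ask(p string) string {
	c.calls++
	return c.inner.Ask(p)
}

// countingStream adds Stream on top of counting, so only wrappers around a
// streaming-capable LLM satisfy StreamingLLM.
type countingStream struct {
	*counting
	stream StreamingLLM
}

func (c *countingStream) Stream(p string) <-chan string {
	c.calls++
	return c.stream.Stream(p)
}

// wrap picks the wrapper type based on what the inner LLM supports.
func wrap(inner LLM) LLM {
	base := &counting{inner: inner}
	if s, ok := inner.(StreamingLLM); ok {
		return &countingStream{counting: base, stream: s}
	}
	return base
}

// plain supports only Ask.
type plain struct{}

func (plain) Ask(string) string { return "plain" }

// streamer supports both Ask (via embedding) and Stream.
type streamer struct{ plain }

func (streamer) Stream(string) <-chan string {
	ch := make(chan string, 1)
	ch <- "chunk"
	close(ch)
	return ch
}

func main() {
	_, plainStreams := wrap(plain{}).(StreamingLLM)
	_, streamerStreams := wrap(streamer{}).(StreamingLLM)
	fmt.Println(plainStreams, streamerStreams)
	// prints: false true
}
```

A single wrapper type that always exposed `Stream` would make callers believe every LLM can stream, while one that never exposed it would silently disable the streaming path. Choosing the wrapper type at construction time avoids both.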

🤖 Generated with Claude Code

mudler and others added 4 commits June 2, 2026 21:14
Add an exported Background flag on AgentState, set true for background
spawns (spawn_agent background=true) and false for foreground ones. This
lets embedders tell unattended background work apart from a foreground
sub-agent whose result is consumed inline — e.g. to auto-notify on
completion only for background agents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds a counting LLM decorator and a CumulativeUsage field on Status so a
full ExecuteTools run's token usage can be summed and exposed. Preserves
StreamingLLM so wrapping does not disable streaming.

Buffer the forwarded stream channel and make the send context-aware to
match the client convention and prevent goroutine leaks. Document that
streaming-path usage is unpopulated by the bundled clients today.

Wraps the run LLM in a counting decorator and stamps the summed usage onto
the returned fragment's Status.CumulativeUsage via a deferred named return,
covering all exit paths. Each sub-agent run reports its own total.
@mudler mudler merged commit fe7fd5d into main Jun 4, 2026
2 of 3 checks passed
mudler added a commit to mudler/nib that referenced this pull request Jun 4, 2026
…al replace

cogito's CumulativeUsage (mudler/cogito#55) is now released on main, so nib
depends on the published pseudo-version and the dev-time local replace is
removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mudler added a commit to mudler/nib that referenced this pull request Jun 4, 2026
…16)

* docs: design spec for sub-agent completion stats line

Spec for rendering a Claude-Code-style completion summary
(tools · cumulative tokens · elapsed) when a spawned sub-agent
finishes, in both the TUI and the plain CLI. Cumulative token
tracking requires a small cogito-side accumulator exposed on the
returned fragment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: implementation plan for sub-agent completion stats line

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* build: replace cogito with local worktree during dev

Temporary: lets nib build against the un-released CumulativeUsage change.
Removed once cogito is tagged (see plan Task 9).

* feat(chat): AgentEvent run-stats fields and formatters

* feat(chat): time sub-agents and populate completion run-stats

* feat(tui): show sub-agent run-stats on the completion marker

* feat(cli): show sub-agent run-stats on the completion line

* build: bump cogito to v0.10.1-0.20260604082319-fe7fd5de11d1; drop local replace

cogito's CumulativeUsage (mudler/cogito#55) is now released on main, so nib
depends on the published pseudo-version and the dev-time local replace is
removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>