Skip to content

Fix avg_decode calculation when decode_time is 0 for single tokens#13

Open
KhudaBuxMagsi wants to merge 1 commit into
baidu:mainfrom
KhudaBuxMagsi:fix-avg-decode-calculation
Open

Fix avg_decode calculation when decode_time is 0 for single tokens#13
KhudaBuxMagsi wants to merge 1 commit into
baidu:mainfrom
KhudaBuxMagsi:fix-avg-decode-calculation

Conversation

@KhudaBuxMagsi

Copy link
Copy Markdown

Description

In collect_stream_silent (Line 184), decode_time is explicitly set to 0 if token_count <= 1. However, during the metrics aggregation inside the run function, avg_decode was dividing the total sum by successful requests (which counts any request where tokens > 0).

This skewed the final Avg decode_time/request metric whenever single-token responses occurred, as their 0.0s decode times were included in the summation but still counted as full requests in the denominator.

Changes

  • Added successful_decode_reqs to explicitly count only requests where decode_time > 0.
  • Updated the avg_decode calculation to safely divide by successful_decode_reqs.
  • Kept avg_tokens dependent on the overall successful counter.

@rajpratham1 rajpratham1 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a focused bug fix that addresses an edge case in the reporting logic without changing the inference pipeline itself. Previously, requests that produced only a single token had decode_time = 0 but were still included in the average calculation, which could skew the reported metric. Excluding those entries from the average while handling the "no valid decode times" case is a sensible improvement.

@kushdab

kushdab commented Jun 26, 2026

Copy link
Copy Markdown

Good catch — the bug is real and the fix is correct.

What the bug was

avg_decode was computed as:

sum(r["decode_time"] for r in results if r["tokens"] > 0) / successful

successful counts every request with tokens > 0, which includes single-token responses where decode_time is explicitly set to 0.0 (by the token_count <= 1 branch in collect_stream_silent). Those zero-time entries dragged the average down without representing a real decode step.

The fix

Using a separate successful_decode_reqs counter — requests where decode_time > 0 — is exactly the right semantics: "average decode time, among requests that actually had a decode phase." The if successful_decode_reqs > 0 guard covers the edge case where every request in the batch returned a single token (prints 0.0s, which is accurate).

Minor notes

  • Line 334 adds a trailing space before the newline on the main() line, and the final line is missing a newline (\ No newline at end of file). Worth cleaning both up before merge.
  • When avg_decode = 0.0 is printed because all requests were single-token, the metric is slightly misleading at a glance. Optionally printing "N/A (all single-token)" for that case would improve clarity, but that is a polish item, not a correctness issue.

The core change is solid. ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants