fix(rollout): compact_logprobs + logprobs:0 + server-side text by aoshen02 · Pull Request #247 · vllm-project/vime

aoshen02 · 2026-06-14T03:11:59Z

Summary

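logprobs:0: Change logprobs: 1 → logprobs: 0 in _build_inference_sampling_params. With logprobs: 1, vLLM serializes the top-1 candidate's log probability alongside the sampled token's at every position. Nothing downstream reads those candidates, and they added ~13.6% serialization overhead (measured via a bare-serve A/B bench). logprobs: 0 still returns the sampled token's log probability, which the rollout uses.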
compact_logprobs: Parse the compact_logprobs response field (list of [logprob, token_id] pairs) when present, avoiding the deeply nested logprobs.content[i].logprob object traversal.
Server-side text: Prefer the choice.text returned by the engine over client-side tokenizer.decode(). When the server doesn't return text, fall back to running decode via asyncio.to_thread(decode) so the event loop is not blocked.
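The two response shapes the parser has to handle can be sketched as below. This is a minimal illustration, not the PR's code: it assumes compact_logprobs is a flat list of [logprob, token_id] pairs as described above, and that the fallback shape carries ids in token_ids and per-token values in logprobs.content[i].logprob. parse_choice is a hypothetical helper name.

```python
from typing import Any


def parse_choice(choice: dict[str, Any]) -> tuple[list[int], list[float]]:
    """Hypothetical helper: return (token_ids, logprobs) for one choice."""
    clp = choice.get("compact_logprobs")
    if isinstance(clp, list) and clp:
        # Compact path: one flat list, each entry is [logprob, token_id].
        tokens = [int(pair[1]) for pair in clp]
        logprobs = [float(pair[0]) for pair in clp]
        return tokens, logprobs

    # Fallback path: ids from token_ids, values from the nested content list.
    tokens = list(choice.get("token_ids") or [])
    logprobs: list[float] = []
    lp = choice.get("logprobs")
    if isinstance(lp, dict):
        logprobs = [
            float(item.get("logprob", 0.0)) if isinstance(item, dict) else 0.0
            for item in lp.get("content") or []
        ]
    if not logprobs:
        # No logprobs returned: pad with zeros so lengths match the tokens.
        logprobs = [0.0] * len(tokens)
    return tokens, logprobs


compact = {"compact_logprobs": [[-0.25, 11], [-1.5, 42]]}
nested = {
    "token_ids": [11, 42],
    "logprobs": {"content": [{"logprob": -0.25}, {"logprob": -1.5}]},
}
assert parse_choice(compact) == parse_choice(nested) == ([11, 42], [-0.25, -1.5])
```

Both shapes yield the same (tokens, logprobs) pair; the compact form just replaces one dict lookup per token with a positional unpack of a two-element list.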

Test plan

FP8-30B-A3B rollout smoke (GB300): 0 NaN, throughput ≥ 160 tok/s/gpu
bf16 30B-A3B rollout: serialization gap closed vs bare-serve baseline

🤖 Generated with Claude Code

Three rollout performance/correctness fixes:

1. Request logprobs:0 instead of logprobs:1. The per-token logprob over-fetch added ~13.6% serialization overhead with no consumer.
2. Parse the compact_logprobs response field when present, avoiding the nested logprobs.content[].logprob object traversal.
3. Prefer server-returned text over client-side tokenizer.decode(), offloading decode work to the engine and using asyncio.to_thread as a fallback to avoid blocking the event loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
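The text-resolution order in item 3 can be sketched as follows. It assumes the server text arrives under the choice's "text" key and that decode is any blocking callable with the shape of a tokenizer's decode(token_ids); resolve_text and fake_decode are hypothetical names. Treating an empty string as "no text" is a choice made for this sketch, not something the PR states.

```python
import asyncio
from typing import Any, Callable, Sequence


async def resolve_text(
    choice: dict[str, Any],
    token_ids: Sequence[int],
    decode: Callable[[Sequence[int]], str],
) -> str:
    """Hypothetical helper: prefer engine-returned text, else decode off-loop."""
    text = choice.get("text")
    if isinstance(text, str) and text:
        # Server already detokenized; no client-side work needed.
        return text
    # Missing or empty text: run the blocking decode in a worker thread
    # so other coroutines on the event loop keep making progress.
    return await asyncio.to_thread(decode, token_ids)


def fake_decode(ids: Sequence[int]) -> str:
    # Stand-in for tokenizer.decode; joins ids so the result is checkable.
    return " ".join(str(i) for i in ids)


async def main() -> tuple[str, str]:
    served = await resolve_text({"text": "hello"}, [1, 2], fake_decode)
    fallback = await resolve_text({}, [1, 2], fake_decode)
    return served, fallback


print(asyncio.run(main()))  # ('hello', '1 2')
```

asyncio.to_thread (Python 3.9+) runs the callable in the default thread pool executor, so a slow decode of a long completion no longer stalls the event loop that is also driving other in-flight rollout requests.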

gemini-code-assist

Code Review

This pull request updates vime/rollout/vllm_rollout.py to support parsing compact_logprobs and text directly from the vLLM response, falling back to standard token_ids and logprobs parsing and asynchronous token decoding if they are not present. Additionally, the default logprobs in sampling parameters is changed from 1 to 0. The reviewer suggests wrapping the compact_logprobs parsing logic in a try-except block to prevent potential crashes from malformed server responses.

gemini-code-assist · 2026-06-14T03:12:44Z

+    clp = choice.get("compact_logprobs")
+    if isinstance(clp, list) and clp:
+        new_response_tokens = [int(pair[1]) for pair in clp]
+        new_response_log_probs = [float(pair[0]) for pair in clp]
+    else:
+        new_response_tokens = choice.get("token_ids") or []
+        new_response_log_probs: list[float] = []
+        lp = choice.get("logprobs")
+        if isinstance(lp, dict):
+            content_items = lp.get("content") or []
+            new_response_log_probs = [
+                float(item.get("logprob", 0.0)) if isinstance(item, dict) else 0.0 for item in content_items
+            ]
+        if not new_response_log_probs:
+            new_response_log_probs = [0.0] * len(new_response_tokens)


Parsing compact_logprobs directly without a try-except block is risky. If the server returns a malformed list, elements that are not pairs, or values that cannot be cast to int or float, it will raise an exception (such as IndexError, TypeError, or ValueError) and crash the entire rollout process. Wrapping the parsing logic in a try-except block and falling back to the standard token_ids and logprobs parsing ensures robustness.

clp = choice.get("compact_logprobs")
parsed_clp = False
if isinstance(clp, list) and clp:
    try:
        new_response_tokens = [int(pair[1]) for pair in clp]
        new_response_log_probs = [float(pair[0]) for pair in clp]
        parsed_clp = True
    except (IndexError, TypeError, ValueError):
        pass
if not parsed_clp:
    new_response_tokens = choice.get("token_ids") or []
    new_response_log_probs: list[float] = []
    lp = choice.get("logprobs")
    if isinstance(lp, dict):
        content_items = lp.get("content") or []
        new_response_log_probs = [
            float(item.get("logprob", 0.0)) if isinstance(item, dict) else 0.0 for item in content_items
        ]
    if not new_response_log_probs:
        new_response_log_probs = [0.0] * len(new_response_tokens)
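One gap in the suggested except tuple: if an entry is a dict, pair[1] raises KeyError, which is not caught. A variant that validates each entry's shape up front instead of relying on exceptions can be sketched as below. parse_compact is a hypothetical helper, and returning None to mean "run the token_ids/logprobs fallback" is an assumption of this sketch.

```python
from typing import Any, Optional


def parse_compact(clp: Any) -> Optional[tuple[list[int], list[float]]]:
    """Hypothetical helper: parse [logprob, token_id] pairs, or return None."""
    if not isinstance(clp, list) or not clp:
        return None
    tokens: list[int] = []
    logprobs: list[float] = []
    for pair in clp:
        # Each entry must be a 2-element list/tuple; dicts, strings, None,
        # or wrong-length entries reject the whole field.
        if not isinstance(pair, (list, tuple)) or len(pair) != 2:
            return None
        logprob, token_id = pair
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(token_id, bool) or not isinstance(token_id, int):
            return None
        if isinstance(logprob, bool) or not isinstance(logprob, (int, float)):
            return None
        tokens.append(token_id)
        logprobs.append(float(logprob))
    return tokens, logprobs


assert parse_compact([[-0.5, 7], [-1.0, 9]]) == ([7, 9], [-0.5, -1.0])
assert parse_compact([{"logprob": -0.5, "id": 7}]) is None  # dict entry
```

The trade-off is strictness: values that the suggestion's int()/float() casts would coerce (for example a token id sent as the string "7") are rejected here and routed to the fallback path.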

read-the-docs-community · 2026-06-14T03:12:58Z

Documentation build overview

📚 vime | 🛠️ Build #33129925 | 📁 Comparing 2152472 against latest (fa0b6e9)

26 files changed · ± 26 modified

aoshen02 marked this pull request as draft June 15, 2026 02:59

aoshen02 wants to merge 1 commit into main from fix/rollout-compact-logprobs
