fix(rollout): Resume vLLM workers before draining in abort #294

Open
Josephasafg wants to merge 3 commits into vllm-project:main from Josephasafg:fix_rollout_abort_resume_before_drain


@Josephasafg

`abort()` issued its worker calls in the order pause → drain → resume, which can hang the rollout under `--partial-rollout` when rollouts keep submitting requests (e.g., multi-turn rollouts).

The call to `/pause?mode=abort` aborts the known requests but leaves the engine in `PAUSED_NEW` status, so a `/generate` POST that races in after the pause lands in the waiting queue and never returns. A single such request blocks the `while state.pendings` drain loop forever, and `/resume`, the only call that frees it, came after the drain.
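To make the hang concrete, here is a minimal asyncio sketch of the old ordering. It assumes a toy `FakeEngine` that parks any request arriving while paused (standing in for `PAUSED_NEW`) and a hypothetical `on_pause` hook that injects a `/generate` request right after the pause. Only the raced request is modeled, not the aborted ones, and none of these names come from vLLM or vime. Because the drain sits before `resume()`, it never finishes and the call times out.

```python
import asyncio
from typing import Callable, Optional, Set


class FakeEngine:
    """Toy stand-in for a vLLM worker (hypothetical; not vLLM's API).

    While paused, newly arriving requests are parked rather than rejected,
    mimicking the PAUSED_NEW behavior described in the PR.
    """

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()
        # Hypothetical test hook, called right after pausing; used here to
        # inject a /generate request that races in after the pause.
        self.on_pause: Optional[Callable[[], None]] = None

    async def generate(self, prompt: str) -> str:
        await self._running.wait()  # parked until resume()
        return f"out:{prompt}"

    async def pause(self) -> None:
        self._running.clear()
        if self.on_pause is not None:
            self.on_pause()

    async def resume(self) -> None:
        self._running.set()


async def abort_old_order(engine: FakeEngine, pendings: Set[asyncio.Task]) -> None:
    await engine.pause()
    # Drain: the raced request is parked, so this loop never finishes and
    # resume() below is never reached.
    while pendings:
        done, _ = await asyncio.wait(pendings)
        pendings.difference_update(done)
    await engine.resume()


async def old_order_hangs(timeout: float = 0.1) -> bool:
    engine = FakeEngine()
    pendings: Set[asyncio.Task] = set()
    engine.on_pause = lambda: pendings.add(asyncio.ensure_future(engine.generate("late")))
    try:
        await asyncio.wait_for(abort_old_order(engine, pendings), timeout)
    except asyncio.TimeoutError:
        return True  # the drain never completed
    return False


if __name__ == "__main__":
    print(asyncio.run(old_order_hangs()))
```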

The fix reorders the calls to pause → resume → drain, so the parked requests get scheduled and return, which lets the drain complete. Tasks still queued on the semaphore short-circuit on `state.aborted`, so nothing new is generated.
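The reordered flow can be sketched the same way. This version assumes a hypothetical `RolloutState` holding the `aborted` flag and `pendings` set the PR refers to, a toy `FakeEngine` whose `pause()` yields once (a stand-in for the HTTP round-trip) so the raced request actually gets parked, and a hypothetical `submit` wrapper that checks `state.aborted` after acquiring the semaphore. It illustrates the ordering only; it is not the code in `vllm_rollout.py`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class RolloutState:
    # Hypothetical mirror of the state.aborted / state.pendings fields the
    # PR mentions; the real structure in vime may differ.
    aborted: bool = False
    pendings: Set[asyncio.Task] = field(default_factory=set)


class FakeEngine:
    """Toy vLLM worker (hypothetical): requests arriving while paused are parked."""

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()
        self.on_pause: Optional[Callable[[], None]] = None  # hypothetical test hook

    async def generate(self, prompt: str) -> str:
        await self._running.wait()
        return f"out:{prompt}"

    async def pause(self) -> None:
        self._running.clear()
        if self.on_pause is not None:
            self.on_pause()
        await asyncio.sleep(0)  # stand-in for the HTTP round-trip

    async def resume(self) -> None:
        self._running.set()


async def submit(engine: FakeEngine, state: RolloutState, sem: asyncio.Semaphore,
                 prompt: str, results: List[str]) -> None:
    async with sem:
        if state.aborted:  # tasks that get here after abort() bail out
            return
        results.append(await engine.generate(prompt))


async def abort(engine: FakeEngine, state: RolloutState) -> None:
    state.aborted = True
    await engine.pause()
    await engine.resume()  # release requests parked after the pause
    while state.pendings:  # the drain can now complete
        done, _ = await asyncio.wait(state.pendings)
        state.pendings.difference_update(done)


async def demo() -> Tuple[List[str], str]:
    engine, state = FakeEngine(), RolloutState()
    sem, results = asyncio.Semaphore(1), []
    for p in ("a", "b"):  # submitted, but not yet past the aborted check
        state.pendings.add(asyncio.ensure_future(submit(engine, state, sem, p, results)))
    late: List[asyncio.Task] = []

    def inject() -> None:  # a /generate POST racing in after the pause
        task = asyncio.ensure_future(engine.generate("late"))
        late.append(task)
        state.pendings.add(task)

    engine.on_pause = inject
    await abort(engine, state)
    return results, late[0].result()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Calling `resume()` before the drain is what lets the parked `late` request finish. The two submitted tasks return without generating because `aborted` is already set by the time they get past the semaphore.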

Signed-off-by: Josephasafg <ajgard7@gmail.com>

@gemini-code-assist (bot) left a comment

Code Review

This pull request modifies the abort logic in `vime/rollout/vllm_rollout.py` to resume workers before draining pending tasks, which prevents potential hangs caused by raced requests. It also introduces a helper function, `_broadcast_to_workers`, and adds corresponding unit tests. The reviewer suggests using the new `_broadcast_to_workers` helper inside `abort` to remove the duplicated code for the pause and resume operations.
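The helper's code is not shown here, so the following is only a guess at its shape: a hypothetical `broadcast_to_workers(workers, call)` that fans one async call out to every worker with `asyncio.gather`, and a `pause_then_resume` function that routes both steps of `abort` through it, as the review suggests. The signature and the `FakeWorker` client are assumptions, not the PR's code.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

W = TypeVar("W")
R = TypeVar("R")


async def broadcast_to_workers(
    workers: Sequence[W], call: Callable[[W], Awaitable[R]]
) -> List[R]:
    """Run `call` on every worker concurrently; results keep worker order.

    Hypothetical signature; the PR's _broadcast_to_workers may differ.
    asyncio.gather propagates the first exception raised by any call.
    """
    return await asyncio.gather(*(call(w) for w in workers))


class FakeWorker:
    """Hypothetical worker client that records the calls it receives."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log

    async def pause(self) -> str:
        self.log.append(f"pause:{self.name}")
        return self.name

    async def resume(self) -> str:
        self.log.append(f"resume:{self.name}")
        return self.name


async def pause_then_resume(workers: Sequence[FakeWorker]) -> None:
    # Both steps go through the helper instead of two hand-written loops;
    # every pause completes before any resume starts.
    await broadcast_to_workers(workers, lambda w: w.pause())
    await broadcast_to_workers(workers, lambda w: w.resume())


async def demo() -> List[str]:
    log: List[str] = []
    workers = [FakeWorker(f"w{i}", log) for i in range(2)]
    await pause_then_resume(workers)
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```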

Comment thread on `vime/rollout/vllm_rollout.py` (outdated)
@read-the-docs-community (bot) commented Jun 24, 2026
