diag: DYN_ENABLE_FAST_CANCELLATION — flag-gated revert of #7489 (deferred cancel) #2

Closed
nv-yna wants to merge 1 commit into yna/dsv4-faultdetect-fix from yna/dsv4-fastcancel-revert


nv-yna commented Jun 29, 2026

Adds DYN_ENABLE_FAST_CANCELLATION, a flag-gated, faithful revert of ai-dynamo#7489
("fix: prevent KV block leak from cancel during disagg KV transfer"). The base branch is
yna/dsv4-faultdetect-fix, so this PR's diff is exactly the one flag commit (2 files) for easy review.

What it does

When DYN_ENABLE_FAST_CANCELLATION is set, all three of ai-dynamo#7489's changes are reverted, restoring the
pre-ai-dynamo#7489 "fast cancellation" behavior. When the flag is unset (the default), the current ai-dynamo#7489
behavior applies.

  • prefill_router/mod.rs (3 prefill-context sites): re-link prefill as a child of the engine context, so a
    client cancel/kill propagates and interrupts the in-flight prefill + NIXL KV transfer.
  • prefill_router/mod.rs (decode-entry guard): abort decode routing when the context is killed (restores the
    removed early-return).
  • trtllm/request_handlers/handler_base.py: bypass the _DeferredAbort wrapper, so abort fires immediately
    instead of deferring until the first decode token ("first result = KV transfer complete").
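
The third bullet can be illustrated with a minimal Python sketch. It is not the code in handler_base.py: `DeferredAbort`, `ImmediateAbort`, `make_abort_handle`, and `fast_cancellation_enabled` are hypothetical names, and the sketch assumes that any non-empty value of DYN_ENABLE_FAST_CANCELLATION counts as "set". It only shows the difference between holding an abort until the first result arrives and firing it immediately.

```python
import os

FLAG = "DYN_ENABLE_FAST_CANCELLATION"


def fast_cancellation_enabled(env=None):
    """Assumption: any non-empty value means the flag is set."""
    env = os.environ if env is None else env
    return bool(env.get(FLAG, ""))


class DeferredAbort:
    """Hypothetical stand-in for _DeferredAbort.

    An abort requested before the first result is held back and fired
    once the first result arrives (taken here to mean the KV transfer
    has completed).
    """

    def __init__(self, abort_fn):
        self._abort_fn = abort_fn
        self._pending = False
        self._first_result_seen = False

    def abort(self):
        if self._first_result_seen:
            self._abort_fn()
        else:
            self._pending = True  # defer until on_result()

    def on_result(self):
        first = not self._first_result_seen
        self._first_result_seen = True
        if first and self._pending:
            self._pending = False
            self._abort_fn()


class ImmediateAbort:
    """Fast-cancellation path: the abort fires as soon as it is requested."""

    def __init__(self, abort_fn):
        self._abort_fn = abort_fn

    def abort(self):
        self._abort_fn()

    def on_result(self):
        pass  # nothing is deferred, so there is nothing to flush


def make_abort_handle(abort_fn, env=None):
    """Pick the abort strategy from the flag (default: deferred)."""
    if fast_cancellation_enabled(env):
        return ImmediateAbort(abort_fn)
    return DeferredAbort(abort_fn)
```

With the flag unset, `make_abort_handle(fn).abort()` does nothing until `on_result()` is called; with the flag set, `fn` runs inside `abort()` itself. The second case is the one that can interrupt an in-flight transfer.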

Why (diagnostic)

Tests whether deferred cancellation (and the resulting slot residency) in the disagg prefill→decode handoff
is the cause of the DSV4 throughput lock-in. INCLUDE_TRANSFER_LOAD=0 was already refuted as a cause (native
trtllm-serve shares that flag and is stable); ai-dynamo#7489 is a Dynamo-only code path that native
trtllm-serve lacks, making it the leading hypothesis.

⚠️ Diagnostic flag only — NOT a shippable default. Fast cancellation reintroduces the KV-block-leak risk
that ai-dynamo#7489 fixed. Verified: cargo check clean + compiles in the aarch64 manylinux image build.

…cancel (diagnostic)

Gates a faithful revert of PR ai-dynamo#7489 ("prevent KV block leak from cancel during
disagg KV transfer") behind the DYN_ENABLE_FAST_CANCELLATION env var. When set:
- prefill is re-linked as a child of the engine context (prefill_router/mod.rs,
  3 sites) so a client cancel/kill propagates and interrupts the in-flight
  prefill + NIXL KV transfer (fast cancellation);
- decode routing aborts when the context is killed (prefill_router/mod.rs);
- the decode handler bypasses _DeferredAbort so abort fires immediately instead
  of deferring until the first decode token (trtllm/handler_base.py).

Default off = current (ai-dynamo#7489) behavior. Diagnostic flag only: fast cancellation
reintroduces the KV-block-leak risk ai-dynamo#7489 fixed. Used to test whether the disagg
prefill->decode handoff deferred-cancellation is the DSV4 throughput lock-in.

Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>