diag: DYN_ENABLE_FAST_CANCELLATION — flag-gated revert of #7489 (deferred cancel) #2
Closed
nv-yna wants to merge 1 commit into
…cancel (diagnostic)

Gates a faithful revert of PR ai-dynamo#7489 ("prevent KV block leak from cancel during disagg KV transfer") behind the DYN_ENABLE_FAST_CANCELLATION env var. When set:
- prefill is re-linked as a child of the engine context (prefill_router/mod.rs, 3 sites) so a client cancel/kill propagates and interrupts the in-flight prefill + NIXL KV transfer (fast cancellation);
- decode routing aborts when the context is killed (prefill_router/mod.rs);
- the decode handler bypasses _DeferredAbort so abort fires immediately instead of deferring until the first decode token (trtllm/handler_base.py).

Default off = current (ai-dynamo#7489) behavior. Diagnostic flag only: fast cancellation reintroduces the KV-block-leak risk ai-dynamo#7489 fixed. Used to test whether the disagg prefill->decode handoff deferred-cancellation is the DSV4 throughput lock-in.

Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Adds DYN_ENABLE_FAST_CANCELLATION, a flag-gated, faithful revert of ai-dynamo#7489 ("fix: prevent KV block leak from cancel during disagg KV transfer"). Base = yna/dsv4-faultdetect-fix, so this PR's diff is exactly the one flag commit (2 files) for easy review.
What it does
When DYN_ENABLE_FAST_CANCELLATION is set, all three of ai-dynamo#7489's changes are reverted to the pre-ai-dynamo#7489 "fast cancellation" behavior; default (unset) = current ai-dynamo#7489 behavior.
- prefill_router/mod.rs (3 prefill-context sites): re-link prefill as a child of the engine context, so a client cancel/kill propagates and interrupts the in-flight prefill + NIXL KV transfer.
- prefill_router/mod.rs (decode-entry guard): abort decode routing when the context is killed (restores the removed early-return).
- trtllm/request_handlers/handler_base.py: bypass the _DeferredAbort wrapper, so abort fires immediately instead of deferring until the first decode token ("first result = KV transfer complete").
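
For reviewers who want a mental model of the three gated behaviors, here is a minimal, self-contained Python sketch. It is not the code in this PR: `Context`, `make_prefill_context`, `route_decode`, `DeferredAbort`, and `make_abort` are hypothetical stand-ins, and the truthiness rule in `fast_cancellation_enabled` is an assumption (the PR only says "when set").

```python
import os


def fast_cancellation_enabled(environ=os.environ):
    # Assumption: empty, "0", and "false" count as unset; the real parsing may differ.
    value = environ.get("DYN_ENABLE_FAST_CANCELLATION", "")
    return value.strip().lower() not in ("", "0", "false")


class Context:
    """Toy cancellation context: kill() propagates to linked children only."""

    def __init__(self, parent=None):
        self.killed = False
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def kill(self):
        self.killed = True
        for child in self.children:
            child.kill()


def make_prefill_context(engine_ctx, fast_cancel):
    # Flag on: prefill is a child of the engine context, so a client kill reaches it.
    # Flag off: prefill is detached and is not interrupted by the kill.
    return Context(parent=engine_ctx if fast_cancel else None)


def route_decode(ctx, fast_cancel):
    # Flag on: early return when the context is already killed.
    if fast_cancel and ctx.killed:
        return "aborted"
    return "routed"


class DeferredAbort:
    """Toy deferred abort: holds an abort request until the first decode result,
    which is taken here to mean the KV transfer has completed."""

    def __init__(self, abort_fn):
        self._abort_fn = abort_fn
        self._pending = False
        self._first_result_seen = False

    def abort(self):
        if self._first_result_seen:
            self._abort_fn()
        else:
            self._pending = True  # defer until on_result()

    def on_result(self):
        self._first_result_seen = True
        if self._pending:
            self._pending = False
            self._abort_fn()


def make_abort(abort_fn, fast_cancel):
    # Flag on: bypass the wrapper, abort fires immediately.
    # Flag off: abort is deferred until the first result arrives.
    if fast_cancel:
        return abort_fn, (lambda: None)
    deferred = DeferredAbort(abort_fn)
    return deferred.abort, deferred.on_result
```

In this toy model, the flag-off path keeps the request resident until the first decode result even after a client cancel, which is the slot-residency behavior this diagnostic is meant to isolate.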
Why (diagnostic)
Tests whether the disagg prefill→decode handoff deferred-cancellation / slot-residency is the DSV4 throughput lock-in. INCLUDE_TRANSFER_LOAD=0 was already refuted (native trtllm-serve shares that flag and is stable); ai-dynamo#7489 is a Dynamo-only code path that native lacks, making it the leading hypothesis.
Diagnostic flag only: fast cancellation reintroduces the KV-block-leak risk that ai-dynamo#7489 fixed.

Verified: cargo check clean + compiles in the aarch64 manylinux image build.