vllm_dissag: unify 3 launchers into one two-axis driver + models.yaml by raviguptaamd · Pull Request #171 · ROCm/MAD

raviguptaamd · 2026-06-24T04:08:28Z

Summary

Two things, stacked:

1. Consolidates the three vLLM disaggregated-inference launchers in scripts/vllm_dissag/
   (vllm_disagg_server.sh, vllm_disagg_mori_ep.sh, vllm_disagg_server_deepep.sh) into a single
   launcher vllm_disagg.sh driven by two orthogonal axes + a validated EP backend.
2. Ports the MAD-private #324 DeepSeek-V3 MoRI-EP recipe into that consolidated format (connector +
   models.yaml + Dockerfile), so MoE serves through the same launcher, not a parallel script.

Axis model

| | `WIDE_EP=0` (TP) | `WIDE_EP=1` (wide expert-parallel) |
|---|---|---|
| `CONNECTOR=rixl` (NixlConnector) | dense / TP | DeepEP (`EP_BACKEND=deepep`) |
| `CONNECTOR=moriio` (MoRIIOConnector) | MoRIIO + TP (new) | MoRI-EP (`EP_BACKEND=mori`) |

Connector-specific logic → connectors/{rixl,moriio}.sh
TP-vs-wideEP fork → parallelism.sh
Per-model CLI flags + env overrides → models.yaml catalog (replaces inline declare -A maps / hardcoding)

Adding a model is data-only (edit models.yaml + the slurm allowlist). Invalid connector/EP pairings
(moriio+deepep, rixl+mori) abort with a clear error.
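A minimal sketch of how the pairing check could look, assuming the 2x2 matrix above is the whole rule set. The function name `validate_axes`, the error wording, and the choice to default `EP_BACKEND` to the connector's only valid backend are illustrative assumptions, not the launcher's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of connector/EP-backend validation; not the PR's code.

# validate_axes CONNECTOR WIDE_EP [EP_BACKEND]
# Prints the resolved "connector wide_ep ep_backend" triple on success.
# Prints an error on stderr and returns 1 for an unsupported pairing.
validate_axes() {
    local connector="$1" wide_ep="$2" ep_backend="${3:-}"

    case "${connector}" in
        rixl|moriio) ;;
        *) echo "ERROR: unknown CONNECTOR='${connector}' (expected rixl|moriio)" >&2
           return 1 ;;
    esac

    case "${wide_ep}" in
        0)  # TP path: EP_BACKEND only applies when WIDE_EP=1, so it is ignored here.
            echo "${connector} 0 none"
            return 0 ;;
        1)  ;;
        *)  echo "ERROR: WIDE_EP must be 0 or 1, got '${wide_ep}'" >&2
            return 1 ;;
    esac

    # Wide-EP path: each connector has exactly one valid backend in the matrix.
    local expected
    if [[ "${connector}" == "rixl" ]]; then expected="deepep"; else expected="mori"; fi
    ep_backend="${ep_backend:-${expected}}"   # assumption: default to the valid backend

    if [[ "${ep_backend}" != "${expected}" ]]; then
        echo "ERROR: ${connector}+${ep_backend} is not supported (use EP_BACKEND=${expected})" >&2
        return 1
    fi
    echo "${connector} 1 ${ep_backend}"
}
```

Calling `validate_axes moriio 1 deepep` returns non-zero with an error on stderr, while `validate_axes moriio 1` resolves to the MoRI-EP cell.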

Code changes

Launcher consolidation

New vllm_disagg.sh — single driver: axis resolution + validation, topology math, models.yaml
parse, NODE_RANK role branch (prefill/decode × master/child), container barrier, proxy, benchmark,
cleanup. DRY_RUN=1 echoes the assembled vllm serve argv for offline parity.
New connectors/rixl.sh (NixlConnector: TP + DeepEP) and connectors/moriio.sh
(MoRIIOConnector: MoRIIO+TP + MoRI-EP). Each implements the connector hook contract
(connector_init, connector_setup_env, connector_runtime_patch, connector_launch_worker,
connector_wait_workers_ready, connector_start_proxy).
New parallelism.sh — shared TP-vs-wideEP helpers.
New models.yaml — per-model flag/env catalog (sglang schema; tp/dp blocks selected by WIDE_EP).
Deleted the 3 legacy launchers.
New tests/parity_check.sh + tests/golden/ — dry-run argv parity gate (byte-identical to the
3 legacy launchers for every connector × parallelism × role cell, plus validation-rejection checks).
New ARCHITECTURE.md — component + state diagrams. New tests/TEST_PLAN.md.
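The connector hook contract listed above lends itself to a load-time check. The sketch below is one way a driver might verify that a sourced connector profile defines all six hooks; the hook names come from the description, while `CONNECTOR_HOOKS` and `check_connector_hooks` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hook names are taken from the PR description; the checker itself is hypothetical.
CONNECTOR_HOOKS=(
    connector_init
    connector_setup_env
    connector_runtime_patch
    connector_launch_worker
    connector_wait_workers_ready
    connector_start_proxy
)

# check_connector_hooks: return 0 if every hook is a defined shell function,
# otherwise list the missing ones on stderr and return 1.
check_connector_hooks() {
    local hook
    local missing=()
    for hook in "${CONNECTOR_HOOKS[@]}"; do
        # declare -F succeeds only when the name is a defined function.
        declare -F "${hook}" >/dev/null || missing+=("${hook}")
    done
    if (( ${#missing[@]} > 0 )); then
        echo "ERROR: connector is missing hooks: ${missing[*]}" >&2
        return 1
    fi
    return 0
}
```

A driver could run this right after sourcing `connectors/${CONNECTOR}.sh`, so an incomplete profile fails before any worker is launched rather than partway through the role branch.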

#324 DeepSeek-V3 MoRI-EP port

connectors/moriio.sh: per-role mori all2all (mori_high_throughput/mori_low_latency);
--block-size ${KV_BLOCK_SIZE}, --kv-cache-memory-bytes, VLLM_ROCM_USE_AITER_MLA override;
MoRI fabric env (MORI_RDMA_TC/SL, MORI_SHMEM_HEAP_SIZE); cudagraph NONE emitted as
compilation-config cudagraph_mode:NONE +quant_fp8 (never bare --enforce-eager, which crashes
engine init on these AITER images); per-role PREFILL/DECODE_CUDAGRAPH_MODE.
Runtime patch: switched to the idempotent Python patcher apply_39276_rebased.py (new) with
topology-defaulted SKIP_RUNTIME_PATCH (explicit value wins; images that bake the fixes set =1).
Proxy: connector_start_proxy supports vllm_router (ROUTER_BINARY/PATH + --kv-connector moriio + discovery address + registration gate) and moriio_toy (with online_serving/ →
disaggregated/ path resolution). moriio defaults to vllm_router — the toy proxy cannot route
the wideEP DP-rank KV-notify. Sets ROUTER_PORT (default 30000) so the router gets a valid --port.
models.yaml: DeepSeek-V3 / R1 / V3-5layer env: blocks (YAML anchor) encode the validated recipe
(block=16, MLA off, kv-cache-memory-bytes, per-role cudagraph + mori backends, SKIP_RUNTIME_PATCH=1,
fabric tuning), so MODEL_NAME=DeepSeek-V3 WIDE_EP=1 works without a wrapper.
New docker/vllm_disagg_mori_ep_fullsource.ubuntu.amd.Dockerfile — buildable MoRI-EP image
(MoRI 1db01d8, AITER 0.1.14, vLLM fork b10a9f7a, vllm-router #181 + DP-rank dpfix) on a named
nightly base, with build-time integrity asserts. The public-base vllm_disagg_inference Dockerfile is
kept unchanged for dense models.
New helper scripts benchmark_long_context.sh, benchmark_parser.py.
run_xPyD_models.slurm: BENCHMARK_SCRIPT selector (sweep/long_context); -e plumbing for the
recipe/cache/proxy env; cache-dir env forwarded only-if-set so a prewarmed image's baked cache wins;
driver runs $BENCHMARK_SCRIPT_FILE; apply_39276_rebased.py added to REQUIRED_FILES.
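The "forwarded only-if-set" behaviour for cache-dir env can be sketched as follows. This is an assumption about the shape of the plumbing, not the slurm script's actual code: `build_docker_env_args` and `DOCKER_ENV_ARGS` are hypothetical names, and the variable names passed in are up to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: forward host env into `docker run` only when it is set.

# build_docker_env_args NAME...
# Fills the global array DOCKER_ENV_ARGS with "-e NAME=value" pairs for each
# named variable that has a non-empty value. Unset or empty names are skipped,
# so whatever value the image baked in is left untouched.
build_docker_env_args() {
    DOCKER_ENV_ARGS=()
    local name
    for name in "$@"; do
        # ${!name:-} is indirect expansion: the value of the variable named by $name.
        if [[ -n "${!name:-}" ]]; then
            DOCKER_ENV_ARGS+=(-e "${name}=${!name}")
        fi
    done
}
```

The resulting array would be spliced into the container command as `docker run "${DOCKER_ENV_ARGS[@]}" ...`. Skipping unset names matters because passing `-e NAME=` with an empty value would override a cache path baked into a prewarmed image.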

Back-compat

RUN_MORI=1 / RUN_DEEPEP=1 still work (mapped to the new axes); no-flags default stays rixl + TP.
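One plausible shape for the back-compat shim is shown below. The exact mapping is inferred from the axis matrix (the legacy MoRI-EP launcher corresponds to moriio + wide-EP + mori, the legacy DeepEP launcher to rixl + wide-EP + deepep). The function name, the precedence when both flags are set, and the rule that explicitly set axes win are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the legacy-flag shim; the mapping is inferred, not confirmed.

# resolve_legacy_flags: map RUN_MORI / RUN_DEEPEP onto CONNECTOR / WIDE_EP /
# EP_BACKEND. Axes that are already set are left alone (assumption).
resolve_legacy_flags() {
    if [[ "${RUN_MORI:-0}" == "1" ]]; then
        : "${CONNECTOR:=moriio}" "${WIDE_EP:=1}" "${EP_BACKEND:=mori}"
    elif [[ "${RUN_DEEPEP:-0}" == "1" ]]; then
        : "${CONNECTOR:=rixl}" "${WIDE_EP:=1}" "${EP_BACKEND:=deepep}"
    fi
    # No flags at all: the documented default is rixl + TP.
    : "${CONNECTOR:=rixl}" "${WIDE_EP:=0}"
    export CONNECTOR WIDE_EP
    if [[ "${WIDE_EP}" == "1" ]]; then
        export EP_BACKEND
    fi
    return 0
}
```

Using `${VAR:=default}` keeps the shim idempotent: running it twice, or running it after the caller has already chosen the axes, changes nothing.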

Testing

tests/parity_check.sh — byte-identical argv vs the 3 legacy launchers (offline, no GPUs).
moriio wideEP argv verified byte-identical to #324's launcher.
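The byte-identical gate can be illustrated with a small helper that runs a command, captures its stdout to a file, and compares it against a golden fixture. This is a sketch of the technique only: `check_argv_parity` is a hypothetical name, and the real tests/parity_check.sh may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a golden-file parity check; not the PR's test script.

# check_argv_parity GOLDEN_FILE CMD [ARGS...]
# Runs CMD, writes its stdout to a temp file, and compares that file
# byte-for-byte with GOLDEN_FILE. Returns 0 on a match, 1 on a mismatch.
check_argv_parity() {
    local golden="$1"; shift
    local actual rc=0
    actual="$(mktemp)" || return 2
    # Redirect to a file instead of using $(...), which would strip trailing newlines.
    "$@" >"${actual}" 2>/dev/null
    if cmp -s "${golden}" "${actual}"; then
        echo "PASS: ${golden##*/}"
    else
        echo "FAIL: ${golden##*/}" >&2
        diff -u "${golden}" "${actual}" >&2 || true
        rc=1
    fi
    rm -f "${actual}"
    return "${rc}"
}
```

In the PR's setting, the command would be the launcher run with `DRY_RUN=1` for one connector × parallelism × role cell, and the golden file would be the matching fixture under tests/golden/. Because the comparison is offline and only inspects the echoed argv, no GPUs are needed.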

Notes / follow-ups

MoE models require a co-versioned AITER/vLLM image (the fullsource Dockerfile); on a mismatched image
they fail inside AITER's MoE GEMM path at engine init — an image concern, independent of the launcher.
The fullsource base is a private rocm/pytorch-private nightly (BASE_IMAGE overridable).
VLLM_CACHE_PERSIST (image-digest-keyed host JIT cache, to avoid per-run cold AITER/MoRI compiles)
is a planned follow-up, not in this PR.
#325 Hy3 (GQA) support is deferred.

🤖 Generated with Claude Code

Consolidate vllm_disagg_server.sh / vllm_disagg_mori_ep.sh / vllm_disagg_server_deepep.sh into a single launcher (vllm_disagg.sh) that composes behavior from two orthogonal axes plus a validated EP backend:

    CONNECTOR  = rixl (NixlConnector) | moriio (MoRIIOConnector)
    WIDE_EP    = 0 (TP) | 1 (wide expert-parallel)
    EP_BACKEND = mori | deepep (only when WIDE_EP=1; validated vs connector)

Connector-specific logic lives in connectors/{rixl,moriio}.sh; the TP-vs-wideEP fork lives in parallelism.sh; per-model CLI flags + env overrides move from inline declare -A maps / hardcoding into a models.yaml catalog. This adds a new capability (MoRIIO + TP) and lets a model be added by editing data, not code.

Back-compat: RUN_MORI=1 / RUN_DEEPEP=1 still work (mapped to the new axes); the no-flags default stays rixl + TP. run_xPyD_models.slurm resolves the axes, keeps the VALID_MODELS allowlists, and plumbs the new env via docker -e.

Testing: tests/parity_check.sh drives the real launcher under DRY_RUN=1 and diffs the assembled `vllm serve` argv against golden fixtures for every connector x parallelism x role cell (byte-identical to the 3 legacy launchers), plus validation-rejection checks. Live-validated serving for dense models (amd-Llama-3.3-70B, Qwen3-32B) on the MoRIIO+TP path. See tests/TEST_PLAN.md.

Catalog seeded with the existing 6 models + Qwen3-32B (dense) and Qwen3-30B-A3B.

Co-Authored-By: Claude <noreply@anthropic.com>

Mermaid diagrams documenting the unified launcher: component architecture (sbatch → driver → connector → vllm serve), axis resolution flow + 2x2 capability matrix, per-node runtime state machine, driver↔connector hook sequence, and the per-model config/env layering. Co-Authored-By: Claude <noreply@anthropic.com>

Copilot

Pull request overview

This PR consolidates the vLLM disaggregated prefill/decode launch flow in scripts/vllm_dissag/ by replacing three legacy launcher scripts with a single two-axis driver (CONNECTOR × WIDE_EP) plus a models.yaml catalog for per-model flags/env, and adds an offline parity test gate to keep the assembled vllm serve argv consistent with legacy behavior.

Changes:

Replace vllm_disagg_{server,mori_ep,server_deepep}.sh with a unified vllm_disagg.sh that sources connector + parallelism profiles and reads per-model config from models.yaml.
Update run_xPyD_models.slurm to resolve axes/back-compat shims and plumb new env vars into Docker runs.
Add tests/parity_check.sh + golden fixtures to enforce byte-identical argv parity vs legacy launchers (plus validation rejection checks).

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 11 comments.


| File | Description |
|---|---|
| scripts/vllm_dissag/vllm_disagg.sh | New unified launcher with axis resolution, role branching, YAML model config parsing, and DRY_RUN support. |
| scripts/vllm_dissag/connectors/rixl.sh | New connector profile implementing rixl TP + DeepEP wideEP behavior. |
| scripts/vllm_dissag/connectors/moriio.sh | New connector profile implementing MoRI-EP wideEP and new MoRIIO+TP path. |
| scripts/vllm_dissag/parallelism.sh | Shared helper for wideEP master/child role args. |
| scripts/vllm_dissag/models.yaml | New model catalog replacing inline associative arrays / hardcoding. |
| scripts/vllm_dissag/tests/TEST_PLAN.md | New before/after validation plan documenting parity + live matrix. |
| scripts/vllm_dissag/tests/parity_check.sh | New offline parity gate that diffs DRY_RUN argv vs goldens. |
| scripts/vllm_dissag/tests/golden/*.txt | New golden argv fixtures captured from legacy launchers. |
| scripts/vllm_dissag/tests/golden/gen_golden.sh | Utility to regenerate golden fixtures. |
| scripts/vllm_dissag/run_xPyD_models.slurm | Updated slurm entry to use unified launcher and axis/back-compat resolution. |
| scripts/vllm_dissag/README.MD | Updated docs to match the unified launcher + models.yaml + test flow. |
| scripts/vllm_dissag/vllm_disagg_server.sh | Deleted legacy rixl TP launcher. |
| scripts/vllm_dissag/vllm_disagg_mori_ep.sh | Deleted legacy MoRI-EP launcher. |
| scripts/vllm_dissag/vllm_disagg_server_deepep.sh | Deleted legacy DeepEP launcher. |


+#   CONNECTOR = rixl | moriio          (KV transfer; default moriio)
+#   WIDE_EP   = 0 (TP) | 1 (wideEP)    (parallelism; default per back-compat shim)


+SCRIPT_DIR="${NIXL_COOKBOOK_PATH:-$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)}"
+


+echo "Listing NIXL_COOKBOOK_PATH : "
+ls ${NIXL_COOKBOOK_PATH}


+                    --no-enable-prefix-caching \
+                    --all2all-backend "${_all2all}" \
+                    --trust-remote-code \
+                    --distributed-timeout-seconds "${DISTRIBUTED_TIMEOUT_SECONDS:-7200}" \
+                    "${exec_args[@]}" "${extra_args[@]}" "${kv_args[@]}"


+            "${exec_args[@]}" \
+            "${extra_args[@]}" \
+            "${kv_args[@]}" \
+            2>&1 | tee /run_logs/${SLURM_JOB_ID}/${log_prefix}_NODE${NODE_RANK}.log >/dev/null &


+                --all2all-backend "${backend}" \
+                ${DBO_ARGS} \
+                "${extra_args[@]}" \
+                --kv-transfer-config "${kv_config}"


+        --all2all-backend "${backend}" \
+        ${DBO_ARGS} \
+        "${extra_args[@]}" \
+        --kv-transfer-config "${kv_config}" \
+        2>&1 | tee /run_logs/${SLURM_JOB_ID}/${log_prefix}_NODE${NODE_RANK}.log >/dev/null &


+# BOUNDARY (do NOT put these here — the launcher/connector owns them):
+#   --tensor-parallel-size / --data-parallel-size / --enable-expert-parallel /
+#   --all2all-backend / --kv-transfer-config / --port / transfer backend.


-export BENCHMARK_CON="8 16 32"
-export BENCHMARK_COMBINATIONS="1024/1024 8192/1024"
-sbatch -N 2 -n 2 run_xPyD_models.slurm
+python3 benchmark_parser.py <log_path>/benchmark_XXX_CONCURRENCY.log


+MASTER_PORT="${MASTER_PORT:-23731}"
+NODE_RANK="${NODE_RANK:-0}"
+NNODES="${NNODES:-1}"
+MODEL_PATH=$MODEL_PATH


…uncher

Brings MAD-private #324's validated DeepSeek-V3 wide-EP support into the consolidated format (driver + connectors + models.yaml), so MoE serves through vllm_disagg.sh, not as a parallel launcher.

connectors/moriio.sh (parity with #324's mori launcher):
- runtime patch: use the idempotent Python patcher apply_39276_rebased.py with topology-defaulted SKIP_RUNTIME_PATCH (explicit wins); the old bash patcher double-patched a pre-patched image and corrupted the decode notify path.
- cudagraph: NEVER bare --enforce-eager (it routes fp8 quant through an AITER op whose aiter_tensor_t signature crashes engine init); NONE -> compilation-config cudagraph_mode:NONE +quant_fp8. Per-role PREFILL/DECODE_CUDAGRAPH_MODE.
- proxy: vllm_router (ROUTER_BINARY/PATH + --kv-connector moriio + discovery + registration gate) and moriio_toy with online_serving/->disaggregated/ path resolution.
- recipe knobs: --block-size ${KV_BLOCK_SIZE}, --kv-cache-memory-bytes (skip the buggy profiling forward), per-role mori_high_throughput/mori_low_latency, VLLM_ROCM_USE_AITER_MLA override, MORI_RDMA_TC/SL + MORI_SHMEM_HEAP_SIZE.

models.yaml: DeepSeek-V3/R1/V3-5layer env: blocks encode the full recipe (YAML anchor) so MODEL_NAME=DeepSeek-V3 WIDE_EP=1 "just works" (= #324 .env auto-source).

slurm: BENCHMARK_SCRIPT selector (sweep/long_context); plumb the recipe + cache + benchmark env via docker -e; driver runs $BENCHMARK_SCRIPT_FILE.

docker: add vllm_disagg_mori_ep_fullsource (the buildable validated stack: MoRI 1db01d8, AITER 0.1.14, vLLM fork b10a9f7a, router #181) for DeepSeek MoE; the public-base vllm_disagg_inference stays for dense.

Verified: moriio wideEP `vllm serve` argv is byte-identical to #324's launcher (offline gate); rixl/deepep/dense parity unchanged; goldens regenerated.

Co-Authored-By: Claude <noreply@anthropic.com>

The moriio toy proxy cannot route the wideEP DP-rank KV-notify -> decode hangs ("remote blocks never arrived", deferred-write expiry). vllm_router carries the DP-rank dpfix and is what #324's recipe.env uses. Default moriio to vllm_router (matching rixl); moriio_toy still selectable. Golden regenerated (kv-transfer proxy_port 10001->30000); argv stays byte-identical to #324 (router on both sides). Co-Authored-By: Claude <noreply@anthropic.com>

raviguptaamd requested review from amathews-amd and gargrahul as code owners June 24, 2026 04:08

Copilot AI review requested due to automatic review settings June 24, 2026 04:08

Copilot started reviewing on behalf of raviguptaamd June 24, 2026 04:09

Copilot AI reviewed Jun 24, 2026

raviguptaamd and others added 2 commits June 24, 2026 15:05

raviguptaamd wants to merge 4 commits into ROCm:develop from raviguptaamd:vllm-disagg-unified-launcher


2 participants
