Add DeepSeek-V4-Pro 8k/1k SA recipes for GB300 (MTP-off + MTP-on, MXFP4) by nv-yna · Pull Request #192 · NVIDIA/srt-slurm

nv-yna · 2026-06-02T19:35:04Z

Summary

26 srt-slurm recipes for the DeepSeek-V4-Pro 8k/1k disagg SA reproduction sweep on GB300. Engine configs are byte-translated from the executed-sweep ctx_config.yaml + gen_config.yaml — the Dynamo (dynamo.frontend + dynamo.trtllm) runs that produced the published 8k/1k pareto. One recipe per frontier operating point, across 5 decode topologies × 3 MTP modes:

MTP=Off (13): TEP8 c=4; TEP4 c=5/15/25/55; DEP32 c=154/308/615/1127; DEP16 c=1229/2253; DEP8 c=2253/4301
MTP=3 (11): TEP8 c=8; TEP4 c=10/15/30; DEP32 c=84/180/333/615; DEP16 c=666/1229; DEP8 c=1229
MTP=1 (2): DEP8 c=2253/4301
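
As a quick cross-check of the counts above, the per-topology concurrency lists can be tallied. The dictionary layout below is only an illustration (it is not how the recipes are stored in the repo); the numbers are copied from the three lists.

```python
# Frontier operating points copied from the lists above, keyed by MTP mode
# and decode topology. The dict layout is illustrative, not the repo format.
FRONTIER = {
    "mtp0": {
        "TEP8": [4],
        "TEP4": [5, 15, 25, 55],
        "DEP32": [154, 308, 615, 1127],
        "DEP16": [1229, 2253],
        "DEP8": [2253, 4301],
    },
    "mtp3": {
        "TEP8": [8],
        "TEP4": [10, 15, 30],
        "DEP32": [84, 180, 333, 615],
        "DEP16": [666, 1229],
        "DEP8": [1229],
    },
    "mtp1": {"DEP8": [2253, 4301]},
}


def count_recipes(frontier):
    """Return the number of recipes (one per concurrency) for each MTP mode."""
    return {
        mode: sum(len(concurrencies) for concurrencies in topologies.values())
        for mode, topologies in frontier.items()
    }


counts = count_recipes(FRONTIER)
total = sum(counts.values())
topologies = {name for mode in FRONTIER.values() for name in mode}
print(counts, total, sorted(topologies))
```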

Precision = MXFP4 (verified from the checkpoint tensor dtypes: MoE experts FP4 with E8M0 block-32 scales, dense UE8M0-FP8, kv-cache fp8) — hence the gb300_mxfp4/ dir (not nvfp4). Container pinned to the 1.3.0rc15.post1 image-index digest.
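
For reviewers less familiar with the format, here is a minimal sketch of how one MXFP4 block decodes, following the OCP Microscaling definition of FP4 (E2M1) elements that share an E8M0 scale per 32-element block. It assumes the 4-bit codes are already unpacked into integers; how the checkpoint packs two codes per byte is not stated in this PR, and the function names are made up for illustration.

```python
# E2M1 magnitudes for the 3 low bits (2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # elements sharing one E8M0 scale


def decode_e2m1(code):
    """Decode one 4-bit FP4 code: bit 3 is the sign, bits 0-2 the magnitude."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(scale_byte):
    """Decode an E8M0 scale: a pure power of two with bias 127 (255 is NaN)."""
    if scale_byte == 255:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequantize_block(codes, scale_byte):
    """Dequantize one block of 32 unpacked FP4 codes with its shared scale."""
    if len(codes) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} codes, got {len(codes)}")
    scale = decode_e8m0(scale_byte)
    return [decode_e2m1(code) * scale for code in codes]


# A block where every element encodes 0.5, with a shared scale of 2**1.
block = dequantize_block([0x1] * BLOCK_SIZE, 128)
```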

How these were produced (methodology): the author's trtllm-serve SA framework was run with only the server swapped to Dynamo (DISAGG_BACKEND=dynamo) — identical allocation, /raid staging, run_benchmark.sh multi-concurrency client, and get_disagg_e2e_metrics.py metric (incl. the TTFT<5000ms frontier selection). Result: Dynamo reproduces the trtllm-serve spreadsheet within ±1.6% (MTP-off) / ±3.5% (MTP-on) on both tput_per_user and tput_per_gpu.
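
To make the frontier selection and the tolerance claim concrete, here is a generic sketch: drop points that miss the TTFT limit, keep the non-dominated ones on (tput_per_user, tput_per_gpu), and compute a signed relative deviation against a reference. This is not the logic of get_disagg_e2e_metrics.py, which is not part of this PR; the `Point` type, the function names, and all numbers are invented for illustration.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    ttft_ms: float
    tput_per_user: float
    tput_per_gpu: float


def pareto_frontier(points, ttft_limit_ms=5000.0):
    """Keep points under the TTFT limit that no other eligible point dominates."""
    eligible = [p for p in points if p.ttft_ms < ttft_limit_ms]

    def dominated(p):
        # q dominates p if it is at least as good on both axes and better on one.
        return any(
            q.tput_per_user >= p.tput_per_user
            and q.tput_per_gpu >= p.tput_per_gpu
            and (q.tput_per_user > p.tput_per_user or q.tput_per_gpu > p.tput_per_gpu)
            for q in eligible
        )

    return [p for p in eligible if not dominated(p)]


def rel_dev_pct(measured, reference):
    """Signed deviation of measured from reference, in percent."""
    return 100.0 * (measured - reference) / reference


# Made-up measurements, not results from the sweep.
points = [
    Point("a", 900.0, 100.0, 50.0),
    Point("b", 1200.0, 60.0, 200.0),
    Point("c", 1100.0, 55.0, 150.0),  # dominated by "b"
    Point("d", 7000.0, 40.0, 900.0),  # fails the TTFT limit
]
frontier_names = [p.name for p in pareto_frontier(points)]
```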

Same four srt-slurm default overrides as #131 (dynamo.install:false, frontend.enable_multiple_frontends:false, benchmark.num_prompts_mult:1, benchmark.use_chat_template:false).

Worker env reflects the executed 8k/1k sweep (UCX_TLS, TRTLLM_ENABLE_PDL, *_DISABLE_GC, NCCL_GRAPH_MIXING_SUPPORT=0, MIMALLOC_PURGE_DELAY=0; ctx adds PYTORCH_CUDA_ALLOC_CONF=expandable_segments). This differs from #131's GB200 set — no UCX_CUDA_IPC_ENABLE_MNNVL/UCX_RNDV_SCHEME/TRTLLM_KVCACHE_HOST_SIZE_OVERRIDE, and (e2e mode, not gen-only) no per-c TLLM_BENCHMARK_REQ_QUEUES_SIZE.

Review notes: (1) model.path is the cmh checkpoint path (/lustre/fsw/portfolios/coreai/projects/coreai_comparch_inferencex/models/dsv4-pro) — override per cluster. (2) precision dir is mxfp4 (accurate), unlike the existing DSV4-Pro gb200_nvfp4; happy to rename for consistency if the team prefers.

Test plan

Sweep executed end-to-end (26/26 frontier points) and pareto curve published (Dynamo vs trtllm-serve)
srtctl dry-run validates representative recipes — DEP32 c=154 (9-node), TEP4 c=5 (6-node), DEP8 c=4301_mtp1 (14-node), DEP16 c=1229_mtp3 (14-node); node counts + env resolve correctly
Rerun a subset from these committed recipes to confirm reproducibility

26 frontier operating points (13 MTP-off, 11 MTP3, 2 MTP1) across TEP4/TEP8/DEP8/DEP16/DEP32, byte-translated from the executed Dynamo 8k/1k sweep. Precision = MXFP4 experts + UE8M0-FP8 dense. Produced by running the trtllm-serve SA framework with only the server swapped to Dynamo; reproduces the published spreadsheet within +/-1.6% (MTP-off) / +/-3.5% (MTP-on) on tput_per_user and tput_per_gpu. Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>

codecov-commenter · 2026-06-02T19:36:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (sa-submission-q2-2026@9ecc31f). Learn more about missing BASE report.

Additional details and impacted files

@@                   Coverage Diff                    @@
##             sa-submission-q2-2026     #192   +/-   ##
========================================================
  Coverage                         ?   61.51%           
========================================================
  Files                            ?       48           
  Lines                            ?     4176           
  Branches                         ?        0           
========================================================
  Hits                             ?     2569           
  Misses                           ?     1607           
  Partials                         ?        0


richardhuo-nv · 2026-06-02T20:31:36Z

Also let's use the folder structure like this:
https://github.com/NVIDIA/srt-slurm/tree/sa-submission-q2-2026/recipes/kimi2.5/trtllm_dynamo/disagg/gb200Nvfp4/ISL1K_OSL1K/MTP

Put hardware, precision, ISL/OSL, and MTP/STP into the path, and place each config under the path it belongs to.
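
A small sketch of the layout being asked for, modeled on the linked kimi2.5 directory. The helper name, the `dsv4-pro` model directory, the reading of "8K" as 8192 tokens, and the mapping of MTP-off to `STP` are all assumptions for illustration.

```python
from pathlib import PurePosixPath


def _tokens_to_k(tokens):
    """Format a token count that is a multiple of 1024 as e.g. '8K'."""
    if tokens <= 0 or tokens % 1024:
        raise ValueError(f"expected a positive multiple of 1024, got {tokens}")
    return f"{tokens // 1024}K"


def recipe_dir(model, hardware, precision, isl, osl, mtp_layers,
               backend="trtllm_dynamo", mode="disagg"):
    """Build recipes/<model>/<backend>/<mode>/<hw><Precision>/ISL_OSL/<MTP|STP>."""
    spec_dir = "MTP" if mtp_layers > 0 else "STP"
    return PurePosixPath(
        "recipes",
        model,
        backend,
        mode,
        f"{hardware}{precision.capitalize()}",
        f"ISL{_tokens_to_k(isl)}_OSL{_tokens_to_k(osl)}",
        spec_dir,
    )


# Reproduces the directory from the linked example.
kimi = recipe_dir("kimi2.5", "gb200", "nvfp4", 1024, 1024, mtp_layers=3)
# Hypothetical placement for an MTP-off recipe from this PR.
dsv4 = recipe_dir("dsv4-pro", "gb300", "mxfp4", 8192, 1024, mtp_layers=0)
```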

richardhuo-nv · 2026-06-02T20:26:29Z

+# default overrides (dynamo.install, multi_frontend, num_prompts_mult, use_chat_template).
+name: dsv4_pro_mxfp4_ISL8K_OSL1K_dep16_c1229_eplb384_mtp0
+model:
+  path: /lustre/fsw/portfolios/coreai/projects/coreai_comparch_inferencex/models/dsv4-pro


You can just use the Hugging Face model ID for now. This needs to point to the model path on the SA cluster.

richardhuo-nv · 2026-06-02T20:26:58Z

+name: dsv4_pro_mxfp4_ISL8K_OSL1K_dep16_c1229_eplb384_mtp0
+model:
+  path: /lustre/fsw/portfolios/coreai/projects/coreai_comparch_inferencex/models/dsv4-pro
+  container: nvcr.io/nvstaging/ai-dynamo/tensorrtllm-runtime@sha256:6aa381ac47bf7f5d0ef0598a1cab97dc0005e01c41da104f420966373d9a09e4


Let's kick off a release and use the actual released Dynamo image.

richardhuo-nv · 2026-06-02T20:27:29Z

+      max_seq_len: 9256
+      moe_config:
+        backend: MEGAMOE_DEEPGEMM
+        load_balancer: /scratch/fsw/portfolios/coreai/projects/coreai_comparch_inferencex/users/yna/dsv4-exp/framework_dynamo/deepseek-V4-Pro/offline_eplb_confs/moe_load_balancer_gen_ep16_slots384.yaml


You will need to make sure this file is part of the container mount.

I would suggest putting the file in the config folder in srt-slurm and mounting that config folder into the container.
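
One way to catch this before submitting a job is a small pre-flight check that every file path referenced in a recipe falls under some container mount target. The sketch below is a generic path check, not an srt-slurm feature; the function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_under_mount(path, container_targets):
    """Return True if path equals, or is nested under, any mount target."""
    candidate = PurePosixPath(path)
    for target in container_targets:
        target_path = PurePosixPath(target)
        if candidate == target_path or target_path in candidate.parents:
            return True
    return False


# Hypothetical container-side mount targets and load-balancer config paths.
targets = ["/workspace/configs", "/models"]
mounted = is_under_mount(
    "/workspace/configs/eplb/moe_load_balancer_gen_ep16_slots384.yaml", targets
)
unmounted = is_under_mount(
    "/scratch/users/someone/moe_load_balancer_gen_ep16_slots384.yaml", targets
)
```

Comparing whole path components (rather than string prefixes) avoids treating `/workspace/configs_old` as being inside `/workspace/configs`.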

richardhuo-nv · 2026-06-02T20:28:39Z

+  osl: 1024
+  concurrencies: '1229'
+  req_rate: inf
+  num_prompts_mult: 1


Are we sure the mult is just 1? Will the perf numbers even stabilize? In the past, we normally used 10 or 16.
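
For scale, a rough sketch of what the multiplier changes, assuming the benchmark sends `concurrency * num_prompts_mult` prompts in total. That formula is an assumption about how the client interprets the field and is not confirmed in this thread.

```python
def total_prompts(concurrency, num_prompts_mult):
    """Total prompts sent, ASSUMING prompts = concurrency * num_prompts_mult."""
    return concurrency * num_prompts_mult


concurrency = 1229  # from the recipe under review
prompts_by_mult = {mult: total_prompts(concurrency, mult) for mult in (1, 10, 16)}
for mult, prompts in prompts_by_mult.items():
    print(f"mult={mult:>2}: {prompts} prompts, about {mult} request(s) per slot")
```

Under that assumption, a multiplier of 1 gives each concurrent slot a single request, so the run consists mostly of ramp-up and drain rather than steady state.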

richardhuo-nv · 2026-06-02T20:50:16Z

+  concurrencies: '1229'
+  req_rate: inf
+  num_prompts_mult: 1
+  use_chat_template: false


For MTP, using the chat template is a must-have AFAIK; please double-check.
