Draft
100 commits
f365372
remove vllm disagg for dpsr1 and dpv3
ichbinblau Apr 13, 2026
08f4c5b
consolidate amd_utils for sglang and vllm
ichbinblau Apr 21, 2026
98ad4f3
use vLLM router as default router for vllm disagg
ichbinblau Apr 21, 2026
1dbaad8
fix bugs
ichbinblau Apr 23, 2026
ee133d7
[AMD] Bump to nightly vllm and vllm-router images (#1208)
simondanielsson May 4, 2026
4940153
update vllm image and vllm router image
ichbinblau May 12, 2026
d100454
update the interface prefix for tw cluster
ichbinblau May 12, 2026
2fa7ee3
add deps for ib device auto-detection
ichbinblau May 13, 2026
9115482
update vllm image
ichbinblau May 13, 2026
784a5a0
fix indentation and add missing finally block in async_request_openai…
ichbinblau May 13, 2026
d4e1daf
fix tw-eth interface detection pattern in env.sh
ichbinblau May 13, 2026
e2d3a28
fix vllm-disagg config schema: use scenarios.fixed-seq-len
ichbinblau May 13, 2026
83e7554
fix vllm-disagg routing to multi_node benchmark subdir
ichbinblau May 13, 2026
3daec02
fix result collection to use FRAMEWORK as log directory prefix
ichbinblau May 13, 2026
51c92a7
suppress tokenizer warnings and debug output in bench.sh
ichbinblau May 14, 2026
3569b0a
fix vllm-disagg deadlock: stop router after rank 0 container exits
ichbinblau May 14, 2026
73d649a
reduce vllm-disagg concurrency sweep to single point for faster itera…
ichbinblau May 14, 2026
50864b4
preserve slurm logs on failure and print stderr inline
ichbinblau May 14, 2026
0454199
enable set -x around docker privilege detection for CI debugging
ichbinblau May 14, 2026
672e693
fix docker detection: test on compute node, not batch host
ichbinblau May 14, 2026
fb2d771
fix docker detection: per-node probe since group membership varies
ichbinblau May 14, 2026
cecd65a
add vllm-disagg changelog entries and update kimi conc-list
ichbinblau May 14, 2026
5959f8d
switch vllm-disagg to 8k1k config to trigger multi-node eval
ichbinblau May 14, 2026
f479f0d
add multi-node eval feature
ichbinblau May 15, 2026
7f80da7
remove start_etcd.sh
ichbinblau May 15, 2026
0238ad1
change decode to 1, easier for testing
ichbinblau May 15, 2026
5a3a390
add --served-model-name to vllm serve commands and wire up eval
ichbinblau May 15, 2026
b465745
fix model name consistency between vllm serve and bench client
ichbinblau May 15, 2026
7240dcf
Initial commit
simondanielsson May 15, 2026
41b2fc5
feat: add configs for minimax on mi300 and mi325
simondanielsson May 15, 2026
bfbaed7
add token patch to bench for vllm
ichbinblau May 15, 2026
cd374a1
add --tokenizer passthrough to run_benchmark_serving
ichbinblau May 15, 2026
4e138f4
update vllm image for kimi2.5 and Minimax disagg.
ichbinblau May 15, 2026
0a50b08
fix: add tokenizer path optionally
simondanielsson May 15, 2026
b366233
Merge remote-tracking branch 'upstream/amd/vllm_disagg_mvp_dev_th' in…
simondanielsson May 15, 2026
2989913
fix: remove minimax wideep patch which caused gibberish output
simondanielsson May 15, 2026
927064d
fix: remove patch causing gibberish output
simondanielsson May 15, 2026
73bd20f
fix: install BNXT userspace libs at runtime and remove unused patches
simondanielsson May 18, 2026
fef0c72
fix: use read mode for decode instances as well
simondanielsson May 18, 2026
f89d2b6
fix: pin non-down mi300 nodes
simondanielsson May 19, 2026
83f1609
feat: multi-node support for gfx942
simondanielsson May 19, 2026
cd3e243
fix: use full node name to mi300 exclude list
simondanielsson May 19, 2026
49112be
remove vllm disagg for dpsr1 and dpv3
ichbinblau Apr 13, 2026
78639d8
consolidate amd_utils for sglang and vllm
ichbinblau Apr 21, 2026
0fc3679
use vLLM router as default router for vllm disagg
ichbinblau Apr 21, 2026
9d6c39b
fix bugs
ichbinblau Apr 23, 2026
05d5952
[AMD] Bump to nightly vllm and vllm-router images (#1208)
simondanielsson May 4, 2026
b621e76
update vllm image and vllm router image
ichbinblau May 12, 2026
3f1ce6f
update the interface prefix for tw cluster
ichbinblau May 12, 2026
ee52aff
add deps for ib device auto-detection
ichbinblau May 13, 2026
4abca16
update vllm image
ichbinblau May 13, 2026
c9e0d0f
fix indentation and add missing finally block in async_request_openai…
ichbinblau May 13, 2026
135dab0
fix tw-eth interface detection pattern in env.sh
ichbinblau May 13, 2026
943d6b6
fix vllm-disagg config schema: use scenarios.fixed-seq-len
ichbinblau May 13, 2026
b8277b9
fix vllm-disagg routing to multi_node benchmark subdir
ichbinblau May 13, 2026
1336c34
fix result collection to use FRAMEWORK as log directory prefix
ichbinblau May 13, 2026
281f679
suppress tokenizer warnings and debug output in bench.sh
ichbinblau May 14, 2026
b131734
fix vllm-disagg deadlock: stop router after rank 0 container exits
ichbinblau May 14, 2026
53d84e8
reduce vllm-disagg concurrency sweep to single point for faster itera…
ichbinblau May 14, 2026
416dc14
preserve slurm logs on failure and print stderr inline
ichbinblau May 14, 2026
f2e9cdb
enable set -x around docker privilege detection for CI debugging
ichbinblau May 14, 2026
ee980da
fix docker detection: test on compute node, not batch host
ichbinblau May 14, 2026
da68e5f
fix docker detection: per-node probe since group membership varies
ichbinblau May 14, 2026
63717ad
add vllm-disagg changelog entries and update kimi conc-list
ichbinblau May 14, 2026
e8f8cad
switch vllm-disagg to 8k1k config to trigger multi-node eval
ichbinblau May 14, 2026
980ffd0
add multi-node eval feature
ichbinblau May 15, 2026
d3aa76c
remove start_etcd.sh
ichbinblau May 15, 2026
8d730c9
change decode to 1, easier for testing
ichbinblau May 15, 2026
ed80d6f
add --served-model-name to vllm serve commands and wire up eval
ichbinblau May 15, 2026
54ba6df
fix model name consistency between vllm serve and bench client
ichbinblau May 15, 2026
b0f116e
add token patch to bench for vllm
ichbinblau May 15, 2026
4e3d87c
add --tokenizer passthrough to run_benchmark_serving
ichbinblau May 15, 2026
a99c4f6
update vllm image for kimi2.5 and Minimax disagg.
ichbinblau May 15, 2026
8a24fa6
Update setup_deps.sh
ichbinblau May 18, 2026
1967919
Update amd-master.yaml
ichbinblau May 18, 2026
7987452
update req rate for vllm.
ichbinblau May 19, 2026
b9df2a0
make the sglang env consistent with upstream
ichbinblau May 19, 2026
7925efd
node blacklist
ichbinblau May 19, 2026
e3d7c16
Merge remote-tracking branch 'upstream/amd/vllm_disagg_mvp_dev_th' in…
simondanielsson May 19, 2026
abdbff6
fix: re-add MORI_IO_TC envvars
simondanielsson May 19, 2026
28ae46b
fix: add excluded nodes in MI325 cluster
simondanielsson May 20, 2026
80a5b37
fix: update conc list and use 2p1d for 8k/1k high conc
simondanielsson May 20, 2026
d383560
fix: set MORI-related envvars for vllm same as sgl
simondanielsson May 20, 2026
3d962a7
fix: update exluded node now when more are down
simondanielsson May 20, 2026
2b99dcb
fix: update excluded mi300 nodes
simondanielsson May 20, 2026
bbea3a7
fix: remove sudo from rm commands in mi325 runner
simondanielsson May 20, 2026
ae7dca4
fix: update mi325 model path
simondanielsson May 20, 2026
a7ae751
fix: use a more random port than 5000 for initial container creation …
simondanielsson May 20, 2026
97c34d1
fix: add backup docker command if docker and sudo docker does not work
simondanielsson May 21, 2026
95ac360
fix: docker backup fix
simondanielsson May 21, 2026
1c7e81c
fix: remove manual install of older libbnxt-re version
simondanielsson May 21, 2026
e6ab686
fix: preserve mi300x multinode launch diagnostics
haic0 Jun 28, 2026
650edf3
fix: use valid mi300x slurm excludes
haic0 Jun 28, 2026
aae9b41
fix: choose mi300x nodes with workspace access
haic0 Jun 29, 2026
e49e6b5
fix: stage mi300x multinode workspace
haic0 Jun 29, 2026
8e0bf53
fix: align mi300x multinode cleanup
haic0 Jun 29, 2026
39716d7
fix: wait for mi300x staging nodes
haic0 Jun 29, 2026
ce883b4
fix: probe all mi300x nodes for staging
haic0 Jun 29, 2026
dfabd35
fix: update mi300x vllm router image
haic0 Jun 29, 2026
7df4612
fix: reuse mi300x slurm allocation
haic0 Jun 29, 2026
260 changes: 260 additions & 0 deletions .github/configs/amd-master.yaml
@@ -1312,6 +1312,266 @@ dsr1-fp8-mi355x-sglang-disagg-mtp:
          - "DECODE_NODES=1"
          - "DECODE_MTP_SIZE=2"

minimaxm2.5-fp8-mi300x-vllm-disagg:
  image: ghcr.io/simondanielsson/vllm/vllm-openai-rocm:fix-moriio-hangs-high-concurrency
  model: MiniMaxAI/MiniMax-M2.5
  model-prefix: minimaxm2.5
  runner: mi300x-disagg
  precision: fp8
  framework: vllm-disagg
  multinode: true
  disagg: true
  scenarios:
    fixed-seq-len:
    - isl: 1024
      osl: 1024
      search-space:
      - spec-decoding: "none"
        conc-list: [ 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
        prefill:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"

    - isl: 8192
      osl: 1024
      search-space:
      # Top of curve: 2P1D
      - spec-decoding: "none"
        conc-list: [256, 512, 1024, 2048 ]
        prefill:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=2"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"

      # Bottom of curve: 1P2D
      - spec-decoding: "none"
        conc-list: [8, 16, 32, 64, 128]
        prefill:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"

minimaxm2.5-fp8-mi325x-vllm-disagg:
  image: ghcr.io/simondanielsson/vllm/vllm-openai-rocm:fix-moriio-hangs-high-concurrency
  model: MiniMaxAI/MiniMax-M2.5
  model-prefix: minimaxm2.5
  runner: mi325x-disagg
  precision: fp8
  framework: vllm-disagg
  multinode: true
  disagg: true
  scenarios:
    fixed-seq-len:
    - isl: 1024
      osl: 1024
      search-space:
      - spec-decoding: "none"
        conc-list: [ 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
        prefill:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"

    - isl: 8192
      osl: 1024
      search-space:
      # Top of curve: 2P1D
      - spec-decoding: "none"
        conc-list: [256, 512, 1024, 2048 ]
        prefill:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=2"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"

      # Bottom of curve: 1P2D
      - spec-decoding: "none"
        conc-list: [8, 16, 32, 64, 128]
        prefill:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"

kimik2.5-fp4-mi355x-vllm-disagg:
  image: vllm/vllm-openai-rocm:nightly-bf610c2f56764e1b30bc6065f4ceace3d6e59036
  model: amd/Kimi-K2.5-MXFP4
  model-prefix: kimik2.5
  runner: mi355x-disagg
  precision: fp4
  framework: vllm-disagg
  multinode: true
  disagg: true
  scenarios:
    fixed-seq-len:
    - isl: 1024
      osl: 1024
      search-space:
      # 1P2D: 1 prefill node (co-located with proxy) + 2 decode nodes = 3 nodes total
      - spec-decoding: "none"
        conc-list: [ 8, 16, 32, 64, 128, 256, 512 ]
        prefill:
          num-worker: 1
          tp: 8
          ep: 1
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"

    - isl: 8192
      osl: 1024
      search-space:
      - spec-decoding: "none"
        conc-list: [ 8, 16, 32, 64, 128, 256, 512 ]
        prefill:
          num-worker: 1
          tp: 8
          ep: 1
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"

minimaxm2.5-fp8-mi355x-vllm-disagg:
  image: vllm/vllm-openai-rocm:nightly-bf610c2f56764e1b30bc6065f4ceace3d6e59036
  model: MiniMaxAI/MiniMax-M2.5
  model-prefix: minimaxm2.5
  runner: mi355x-disagg
  precision: fp8
  framework: vllm-disagg
  multinode: true
  disagg: true
  scenarios:
    fixed-seq-len:
    - isl: 1024
      osl: 1024
      search-space:
      # 1P2D: 1 prefill node (co-located with proxy) + 2 decode nodes = 3 nodes total
      # Prefill also needs EP=8: MiniMax M2.5 expert intermediate_size=1536,
      # TP8 shards to 192 which is not divisible by FP8 block_n=128.
      - spec-decoding: "none"
        conc-list: [ 8, 16, 32, 64, 128, 256, 512 ]
        prefill:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"

    - isl: 8192
      osl: 1024
      search-space:
      - spec-decoding: "none"
        conc-list: [ 8, 16, 32, 64, 128, 256, 512 ]
        prefill:
          num-worker: 1
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
          - "VLLM_MORIIO_CONNECTOR_READ_MODE=1"
        decode:
          num-worker: 2
          tp: 8
          ep: 8
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=2"

dsr1-fp4-mi355x-sglang-disagg:
  image: lmsysorg/sglang-rocm:v0.5.10.post1-rocm720-mi35x-20260501
  model: amd/DeepSeek-R1-0528-MXFP4-v2
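
The comments in the vllm-disagg entries carry two small pieces of arithmetic: a 1P2D layout occupies three nodes, and a TP8 split of the MiniMax M2.5 expert intermediate_size of 1536 gives 192, which is not a multiple of the FP8 block_n of 128. The bash sketch below only reproduces those numbers. The function names are hypothetical and not part of the repository, and the last check assumes that with EP=8 each expert keeps its full 1536-wide dimension instead of being split across ranks.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; none of these names exist in the repo.

# Per-rank slice of an expert's intermediate dimension under tensor parallelism.
tp_shard_size() {
  local intermediate_size=$1 tp=$2
  echo $(( intermediate_size / tp ))
}

# Returns 0 (success) when the size is a whole multiple of the FP8 block width.
fits_fp8_block() {
  local size=$1 block_n=$2
  [ $(( size % block_n )) -eq 0 ]
}

# Node count of an xPyD layout: prefill nodes plus decode nodes.
total_nodes() {
  local prefill_nodes=$1 decode_nodes=$2
  echo $(( prefill_nodes + decode_nodes ))
}

shard=$(tp_shard_size 1536 8)
if fits_fp8_block "$shard" 128; then
  echo "TP8 shard of $shard aligns with block_n=128"
else
  echo "TP8 shard of $shard leaves a remainder of $(( shard % 128 ))"
fi

# Assumption: under EP=8 the expert dimension stays unsplit at 1536.
if fits_fp8_block 1536 128; then
  echo "unsplit 1536 aligns with block_n=128"
fi

echo "1P2D uses $(total_nodes 1 2) nodes"
```

The same helpers can be pointed at other tp or block_n values when a new model is added to the config, as a quick sanity check before a multi-node run is queued.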
4 changes: 3 additions & 1 deletion .github/workflows/benchmark-multinode-tmpl.yml
@@ -271,7 +271,9 @@ jobs:
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: multinode_server_logs_${{ env.RESULT_FILENAME }}
-          path: multinode_server_logs.tar.gz
+          path: |
+            multinode_server_logs.tar.gz
+            benchmark_artifacts/
           if-no-files-found: ignore

       - name: Upload agentic aggregated result
14 changes: 14 additions & 0 deletions benchmarks/benchmark_lib.sh
@@ -204,10 +204,12 @@ run_benchmark_serving() {
    local result_filename=""
    local result_dir=""
    local workspace_dir=""
    local tokenizer=""
    local use_chat_template=false
    local dsv4=false
    local trust_remote_code=false
    local server_pid=""
    local tokenizer=""

    while [[ $# -gt 0 ]]; do
        case $1 in
@@ -268,6 +270,10 @@ run_benchmark_serving() {
                use_chat_template=true
                shift
                ;;
            --tokenizer)
                tokenizer="$2"
                shift 2
                ;;
            --trust-remote-code)
                trust_remote_code=true
                shift
@@ -276,6 +282,10 @@ run_benchmark_serving() {
                server_pid="$2"
                shift 2
                ;;
            --tokenizer)
                tokenizer="$2"
                shift 2
                ;;
            *)
                echo "Unknown parameter: $1"
                return 1
@@ -383,6 +393,10 @@ run_benchmark_serving() {
        benchmark_cmd+=(--trust-remote-code)
    fi

    if [[ -n "$tokenizer" ]]; then
        benchmark_cmd+=(--tokenizer "$tokenizer")
    fi

    # Run benchmark with optional server monitoring
    set -x
    if [[ -n "$server_pid" ]]; then
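
The hunks above add an optional `--tokenizer` flag that is parsed into a local variable and appended to the benchmark command only when it is non-empty. A minimal, self-contained sketch of that passthrough pattern follows. `build_benchmark_cmd`, `bench-client`, and the model and path values are hypothetical placeholders, and the function prints the assembled command instead of running anything, so it is not the behavior of `run_benchmark_serving` itself.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for run_benchmark_serving: it only builds and prints argv.
build_benchmark_cmd() {
  local model="" tokenizer=""

  while [[ $# -gt 0 ]]; do
    case $1 in
      --model)
        model="$2"
        shift 2
        ;;
      --tokenizer)
        tokenizer="$2"
        shift 2
        ;;
      *)
        echo "Unknown parameter: $1" >&2
        return 1
        ;;
    esac
  done

  # "bench-client" is a placeholder, not the real benchmark entry point.
  local -a benchmark_cmd=(bench-client --model "$model")

  # Forward the flag only when the caller supplied a value.
  if [[ -n "$tokenizer" ]]; then
    benchmark_cmd+=(--tokenizer "$tokenizer")
  fi

  printf '%s\n' "${benchmark_cmd[*]}"
}

build_benchmark_cmd --model demo-model
build_benchmark_cmd --model demo-model --tokenizer /models/demo-tokenizer
```

Building the command as an array keeps a tokenizer path that contains spaces as a single argument, and leaving the flag out entirely when the variable is empty lets the client fall back to its own default.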