Amd/vllm disagg minimax fp8 cdna3 #1949
Draft
haic0 wants to merge 98 commits into
Signed-off-by: Theresa Shan <theresa.shan@amd.com>
Signed-off-by: Simon Danielsson <pedaniel@amd.com>
Signed-off-by: Shan Theresa <theresa.shan@amd.com>
…_chat_completions Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The inline collect_latest_results.py hardcoded "sglang" as the log directory prefix, causing "No logs directory found" for vllm-disagg runs where bench.sh creates directories named vllm-disagg_isl_X_osl_Y. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
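The fix described above amounts to deriving the log directory prefix from the framework instead of fixing it to "sglang". A minimal sketch of that idea, assuming directories follow the `<framework>_isl_X_osl_Y` pattern mentioned in the commit message; the function name `find_latest_log_dir` and its parameters are hypothetical and not taken from the PR:

```python
from pathlib import Path
from typing import Optional


def find_latest_log_dir(logs_root: Path, framework: str) -> Optional[Path]:
    """Return the newest '<framework>_isl_*_osl_*' directory, or None.

    Hypothetical helper: the real collect_latest_results.py may differ.
    """
    # Build the glob from the framework name rather than hardcoding "sglang".
    pattern = f"{framework}_isl_*_osl_*"
    candidates = [p for p in logs_root.glob(pattern) if p.is_dir()]
    if not candidates:
        return None
    # Pick the most recently modified matching directory.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

With this shape, a vllm-disagg run resolves `vllm-disagg_isl_X_osl_Y` while an sglang run keeps resolving `sglang_isl_X_osl_Y`.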
The vllm-router runs as a separate container on node 0. After node 0's main container finishes the benchmark and exits, decode nodes remain stuck waiting for the router port to close. The router cleanup in job.slurm can't run until srun completes, but srun can't complete because decode nodes are blocked — deadlock. Fix: skip exec on rank 0 for vllm-disagg so the srun bash script continues after docker exits and can stop the router container, allowing decode nodes to detect the port closure and exit. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
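The decode-node side of this behavior (block until the router port closes) can be sketched as a polling loop. This is an illustration only, not the PR's shell implementation: the names `port_is_open` and `wait_for_port_close` are hypothetical, and the optional `max_wait` bound is an added assumption showing one way to keep such a wait from blocking forever.

```python
import socket
import time
from typing import Optional


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the port as closed.
        return False


def wait_for_port_close(host: str, port: int, poll_interval: float = 5.0,
                        max_wait: Optional[float] = None) -> bool:
    """Poll until the port stops accepting connections.

    Returns True once the port is closed, or False if max_wait seconds
    elapse first (max_wait is an assumption, not part of the PR).
    """
    start = time.monotonic()
    while port_is_open(host, port):
        if max_wait is not None and time.monotonic() - start >= max_wait:
            return False
        time.sleep(poll_interval)
    return True
```

In the scenario from the commit message, the loop only ends once something stops the router container, which is why the cleanup step must be reachable after node 0's main container exits.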
…tion Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The EXIT trap deleted benchmark_logs/ before saving artifacts, making it impossible to debug container startup failures. Now the trap always copies slurm .out/.err to the artifact directory and prints the last 100 lines of .err inline in the CI output. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
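The ordering this commit describes (save first, print the tail of .err, delete last) can be illustrated in Python. This is a sketch under assumptions: the actual change is a shell EXIT trap, the function name `save_logs_then_cleanup` is hypothetical, and the sketch assumes the slurm .out/.err files live inside the log directory, which the commit message does not state.

```python
import shutil
from pathlib import Path
from typing import List


def save_logs_then_cleanup(log_dir: Path, artifact_dir: Path,
                           tail_lines: int = 100) -> List[str]:
    """Copy *.out/*.err to artifact_dir, then delete log_dir.

    Returns the last `tail_lines` lines of each .err file so the caller
    can print them inline in CI output.
    """
    artifact_dir.mkdir(parents=True, exist_ok=True)
    tail: List[str] = []
    logs = sorted(log_dir.glob("*.out")) + sorted(log_dir.glob("*.err"))
    for path in logs:
        # Save the artifact before anything is removed.
        shutil.copy2(path, artifact_dir / path.name)
        if path.suffix == ".err":
            lines = path.read_text(errors="replace").splitlines()
            tail.extend(lines[-tail_lines:])
    # Cleanup runs only after every log has been copied.
    shutil.rmtree(log_dir)
    return tail
```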
The batch host has docker socket permissions but the compute nodes do not, causing "permission denied" on all srun tasks. Move the detection after SELECTED_NODES is known and probe via srun. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Export DOCKER_CMD_DETECT as a shell snippet that each srun participant evaluates locally, instead of testing a single node and assuming all nodes have the same docker socket permissions. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
- Add perf-changelog entries for kimik2.5-fp4-mi355x-vllm-disagg and minimaxm2.5-fp8-mi355x-vllm-disagg to trigger CI benchmarks
- Update kimi 1k1k conc-list from [8] to [16]
- Comment out kimi 8k1k config until eval pipeline is wired up
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Comment out 1k1k config and enable 8k1k with conc-list [16] so mark_eval_entries picks it up for the eval pipeline. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Set --served-model-name on all prefill/decode vllm serve commands so the model name matches what run_lm_eval sends in API requests. Also add eval pipeline support (health check, run_eval, artifact staging) mirroring server_sglang.sh. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
bench.sh now uses MODEL_NAME for vllm-disagg to match --served-model-name, and MODEL_PATH for sglang to match its default. Simplified SERVED_MODEL to use MODEL_NAME directly since MODEL env var is not available inside the container. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
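The selection rule in this commit can be written as a small function. A minimal sketch, assuming only the two frameworks named in the message; the function name `served_model_for` is hypothetical, and the real logic lives in bench.sh as shell:

```python
def served_model_for(framework: str, model_name: str, model_path: str) -> str:
    """Pick the model identifier the benchmark client should send.

    vllm-disagg servers are started with --served-model-name, so the
    client must use MODEL_NAME; sglang serves under the model path by
    default, so the client uses MODEL_PATH.
    """
    if framework == "vllm-disagg":
        return model_name
    if framework == "sglang":
        return model_path
    # Any other framework is outside what the commit message covers.
    raise ValueError(f"unsupported framework: {framework!r}")
```

Raising on an unknown framework is a design choice of this sketch; it surfaces a mismatch immediately rather than sending requests under a name the server does not recognize.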
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
benchmark_lib.sh rejected unknown flags, so add --tokenizer support. The vllm-disagg bench can now resolve the tokenizer from the local model path instead of attempting an HF download with the short model name. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
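The same pattern (a strict flag parser that gains an optional --tokenizer with a fallback) can be sketched with Python's argparse, which also rejects unknown flags by default. Only `--tokenizer` comes from the commit message; the `--model` flag and the helper names are hypothetical, and benchmark_lib.sh itself is shell:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Strict parser: parse_args() exits on any flag not declared here."""
    parser = argparse.ArgumentParser(description="benchmark client (sketch)")
    # Hypothetical flag standing in for the short model name.
    parser.add_argument("--model", required=True)
    # Optional override, e.g. a local model path holding tokenizer files.
    parser.add_argument("--tokenizer", default=None)
    return parser


def resolve_tokenizer(args: argparse.Namespace) -> str:
    """Prefer the explicit tokenizer; otherwise fall back to the model name."""
    return args.tokenizer or args.model
```

Passing the local model path through `--tokenizer` lets the client load the tokenizer from disk, while omitting the flag preserves the previous behavior.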
Restore the Kimi K2.5 settings
…to amd/vllm_disagg_minimax_fp8_cdna3 Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
…barrier Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: haic0 <haic0@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com>
No description provided.