Skip to content

Initial ROCm support for vime#273

Open
indianspeedster wants to merge 7 commits into
vllm-project:mainfrom
indianspeedster:rocm-7.0.2
Open

Initial ROCm support for vime#273
indianspeedster wants to merge 7 commits into
vllm-project:mainfrom
indianspeedster:rocm-7.0.2

Conversation

@indianspeedster

@indianspeedster indianspeedster commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Initial AMD ROCm support for vime (ROCm 7.0.2, gfx950 / MI350/MI355X).

  • docker/Dockerfile.rocm — full source build for ROCm 7.0.2
  • docker/patch/megatron_nofork_patch.py — in-process checkpoint writer (works around a ROCm 7.0.2 fork segfault)
  • scripts/run-qwen3-8B-rocm.sh, scripts/run-qwen3-8B-async-rocm.sh — colocate / async run scripts
  • docs/en/get_started/quick_start_rocm.md — ROCm quick start guide (from Quick Start Guide for ROCm Support #293)
  • keep HIP visibility in sync with CUDA in the vLLM subprocess env

Build:

DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm --build-arg GPU_ARCH=gfx950 -t vime-rocm702 .

Main author: @pancake0003

@read-the-docs-community

read-the-docs-community Bot commented Jun 19, 2026

Copy link
Copy Markdown

@indianspeedster indianspeedster changed the title Add ROCm 7.0.2 build (Dockerfile.rocm) Initial ROCm support for vime Jun 19, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for AMD ROCm 7.0.2 (gfx950) by adding a dedicated Dockerfile, a Megatron no-fork patch to prevent segfaults, and run scripts for Qwen3-8B. Feedback highlights several issues: in the Dockerfile, Python dev packages might be skipped if the version matches the system default, the deprecated apt-key should be replaced with the modern signed-by keyring, and the find command for utils.py needs error handling. Additionally, the Megatron patch should locally import logging and time to prevent runtime NameErrors, and the shell scripts should avoid broad, disruptive pkill commands.

Comment thread docker/Dockerfile.rocm Outdated
Comment thread docker/patch/megatron_nofork_patch.py
Comment thread docker/Dockerfile.rocm
Comment thread docker/Dockerfile.rocm
Comment thread scripts/run-qwen3-8B-rocm.sh Outdated
Comment thread scripts/run-qwen3-8B-async-rocm.sh Outdated
@aoshen02 aoshen02 mentioned this pull request Jun 21, 2026
15 tasks
@indianspeedster indianspeedster marked this pull request as ready for review June 22, 2026 21:29
@lizamd lizamd mentioned this pull request Jun 22, 2026
3 tasks
@indianspeedster indianspeedster force-pushed the rocm-7.0.2 branch 3 times, most recently from f6d1d63 to ed16bb9 Compare June 23, 2026 03:31
pancake0003 and others added 3 commits June 23, 2026 11:51
Comment thread .dockerignore Outdated

# Local debug scratch (repro scripts, mem traces, sampled logs, plots)
_*.py
_*.sh

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to merge dockerignore directly to the main branch

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I removed it.

… .dockerignore

- Resolve conflict in _build_subprocess_env: keep main's refactored
  server_args_dict["_visible_devices"] API and apply the ROCm HIP
  visibility line on top of it.
- Remove .dockerignore per review (@aoshen02): not needed in main.
# actor and rollout on disjoint GPUs, RCCL-broadcast weight sync (no colocate IPC).

# Clean leftovers from a previous run (vLLM orphans procs named VLLM::*).
ray stop --force

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format can align more with other scrpt, for example scripts/run-qwen3-8B-async-rocm.sh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants