Modify ep_moe_fused's backward to run with ep_size < 8 by WhatGhost · Pull Request #171 · ByteDance-Seed/Triton-distributed

WhatGhost · 2026-05-26T06:13:16Z

When I run test_ep_moe_fused.py on 4 GPUs:

NVSHMEM_DISABLE_CUDA_VMM=0  bash ./scripts/launch.sh --nproc_per_node=4 python/triton_dist/test/nvidia/test_ep_moe_fused.py --ntokens 8192 --hidden_dim 1536 --ffn_dim 480 --topk 8 --num_experts 64

I get the following error:

[rank1]: Traceback (most recent call last):
[rank1]:   File "/target/Triton-distributed/python/triton_dist/test/nvidia/test_ep_moe_fused.py", line 380, in <module>
[rank1]:     main()
[rank1]:   File "/target/Triton-distributed/python/triton_dist/test/nvidia/test_ep_moe_fused.py", line 301, in main
[rank1]:     triton_dist_fwd_bwd_time, triton_dist_fwd_bwd_mem = benchmark_latency_memory(
[rank1]:                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/target/Triton-distributed/python/triton_dist/profiler_utils.py", line 376, in benchmark_latency_memory
[rank1]:     func()
[rank1]:   File "/target/Triton-distributed/python/triton_dist/test/nvidia/test_ep_moe_fused.py", line 292, in triton_dist_fwd_bwd
[rank1]:     output.backward(grad_output)
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/torch/_tensor.py", line 648, in backward
[rank1]:     torch.autograd.backward(
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/torch/autograd/__init__.py", line 353, in backward
[rank1]:     _engine_run_backward(
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/torch/autograd/graph.py", line 824, in _engine_run_backward
[rank1]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py", line 307, in apply
[rank1]:     return user_fn(self, *args)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/torch/amp/autocast_mode.py", line 556, in decorate_bwd
[rank1]:     return bwd(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/target/Triton-distributed/python/triton_dist/function/nvidia/ep_moe_fused.py", line 207, in backward
[rank1]:     assert triton_dist_ep_ctx.ep_group.size() == 8  # only for intra-node
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: AssertionError

The assert statement in backward requires the EP group size to be exactly 8.

So I changed the condition from "== 8" to "<= 8", which lets the backward pass run on 4 GPUs as well.
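For reference, the relaxed check behaves roughly like the standalone sketch below. The helper name `check_ep_group_size` and the constant `MAX_INTRA_NODE_EP_SIZE` are hypothetical and only for illustration; the actual change is a one-line edit to the inline assert on `triton_dist_ep_ctx.ep_group.size()` in `backward`. The error message is also my addition, not part of this PR.

```python
# Hypothetical standalone version of the relaxed check. The real code is an
# inline assert inside backward() in ep_moe_fused.py, not a helper function.
MAX_INTRA_NODE_EP_SIZE = 8  # assumed bound, taken from the original "== 8"


def check_ep_group_size(ep_size: int, max_size: int = MAX_INTRA_NODE_EP_SIZE) -> None:
    """Raise AssertionError if ep_size exceeds the intra-node limit."""
    # Old behaviour: assert ep_size == 8
    # New behaviour: any group size up to the limit is accepted.
    assert ep_size <= max_size, (
        f"backward only supports intra-node EP groups "
        f"(ep_size <= {max_size}), got ep_size={ep_size}"
    )


check_ep_group_size(4)  # accepted with the relaxed bound
check_ep_group_size(8)  # still accepted, as before
```

A group size above 8 (for example a multi-node run with 16 ranks) would still fail the check, so the intra-node restriction noted in the original comment is kept.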

After making the change, I reran the test and it passed:

```
==================================================================================================================================
Expert Parallel MoE Benchmark Summary (SM_margin=0, topk=8, num_experts=64) (format: latency(ms)/peak_memory(MB)/precision)
==================================================================================================================================
 Ntokens   Hidden      FFN triton_dist_fwd triton_dist_fwd_bwd
==============================================================
    1024     1536      480   1.979/24.45/✅      4.659/105.76/✅
    2048     1536      480   2.317/36.53/✅      4.450/130.06/✅
    4096     1536      480   2.386/62.08/✅      5.133/181.14/✅
    8192     1536      480  2.472/110.80/✅      5.488/278.16/✅
```

CLAassistant · 2026-05-26T06:13:27Z

All committers have signed the CLA.

Change assert to run ep < 8

95aef01

WhatGhost wants to merge 1 commit into ByteDance-Seed:main from WhatGhost:dev-test1
