Algorithm - Add top-1 load balancing loss for MoE routing #67

Open
qghuxmu wants to merge 11 commits into dev from qingguo/onehot_lbl

@qghuxmu qghuxmu commented Jun 30, 2025

Contributor

Adds a new load balancing loss optimized for top-1 MoE routing. It addresses a weakness of the existing load balancing loss, whose value can remain low even when routing frequencies are imbalanced.
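
To make the motivation concrete, here is a minimal sketch of how a conventional auxiliary loss can stay near its minimum while top-1 routing is fully collapsed. It assumes the "existing" loss has the Switch-Transformer-style form `N * sum_i f_i * P_i` (the PR does not spell out the formula), and the function name `switch_style_aux_loss` is hypothetical. It does not implement the new top-1 loss from this PR.

```python
def switch_style_aux_loss(router_probs):
    """Assumed form of the existing loss: N * sum_i f_i * P_i with top-1 routing.

    router_probs: list of per-token probability lists, one entry per expert.
    Returns (loss, f) where f[i] is the fraction of tokens routed to expert i.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])

    # f_i: fraction of tokens whose top-1 (argmax) expert is i.
    counts = [0] * num_experts
    for probs in router_probs:
        top1 = max(range(num_experts), key=lambda i: probs[i])
        counts[top1] += 1
    f = [c / num_tokens for c in counts]

    # P_i: mean router probability assigned to expert i.
    p = [sum(probs[i] for probs in router_probs) / num_tokens
         for i in range(num_experts)]

    loss = num_experts * sum(fi * pi for fi, pi in zip(f, p))
    return loss, f


# Collapsed routing: every token prefers expert 0 by a tiny margin.
skewed = [[0.26, 0.25, 0.25, 0.24] for _ in range(8)]
skewed_loss, skewed_f = switch_style_aux_loss(skewed)

# Balanced routing: the same margins, but the preferred expert rotates.
base = [0.26, 0.25, 0.25, 0.24]
balanced = [base[-(t % 4):] + base[:-(t % 4)] for t in range(8)]
balanced_loss, balanced_f = switch_style_aux_loss(balanced)

print(skewed_f, round(skewed_loss, 4))
print(balanced_f, round(balanced_loss, 4))
```

In the collapsed case all tokens go to expert 0, yet the loss is only about 1.04, against about 1.0 for the perfectly balanced case. Under this assumed formula, near-uniform router probabilities hide the imbalance in the actual top-1 assignments.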

Comment thread megatron/training/training.py Outdated
@qghuxmu qghuxmu changed the title Algorithm - Add one-hot load balancing loss for MoE routing Algorithm - Add top-1 load balancing loss for MoE routing Jul 1, 2025
Comment thread megatron/training/training.py Outdated
Comment thread megatron/training/arguments.py Outdated
Comment thread megatron/training/arguments.py Outdated
Comment thread megatron/training/arguments.py Outdated
Comment thread megatron/core/transformer/transformer_config.py Outdated
Comment thread megatron/core/transformer/moe/router.py Outdated
Comment thread megatron/core/transformer/moe/moe_utils.py Outdated
Comment thread megatron/core/transformer/moe/moe_utils.py Outdated
yzygitzh commented Jul 1, 2025

Contributor

Also, please add a test case in tests/unit_tests/transformer/moe/test_aux_loss.py. @qghuxmu

yzygitzh previously approved these changes Jul 1, 2025
@yzygitzh yzygitzh dismissed their stale review July 1, 2025 14:55

Please fix the unit tests (UT) and make the all-reduce one-pass.

(see https://arxiv.org/abs/2501.11873 for details); "top1_loss" corresponds to the top-1 load balancing loss,
and "none" implies no load balancing. The default is "aux_loss"."""

moe_top1_loss_temperature: float = 1.0

Please add an assert to filter out invalid values.
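
One possible shape for such a check is sketched below. The class `MoELossConfig` is a hypothetical stand-in rather than the real config class, and the rule that the temperature must be strictly positive is an assumption; the PR does not state which values count as invalid.

```python
from dataclasses import dataclass


@dataclass
class MoELossConfig:
    """Hypothetical stand-in for the config class touched by this PR."""

    moe_top1_loss_temperature: float = 1.0

    def __post_init__(self):
        # Assumption: only strictly positive temperatures are valid.
        assert self.moe_top1_loss_temperature > 0, (
            "moe_top1_loss_temperature must be positive, "
            f"got {self.moe_top1_loss_temperature}"
        )


def is_valid_temperature(value):
    """Return True if the config accepts the value, False if the assert fires."""
    try:
        MoELossConfig(moe_top1_loss_temperature=value)
    except AssertionError:
        return False
    return True
```

Placing the check in `__post_init__` rejects a bad value when the config is constructed, before any training step runs.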

@yzygitzh yzygitzh mentioned this pull request Jul 4, 2025
9 tasks
@sjtu-yc sjtu-yc force-pushed the qingguo/onehot_lbl branch from 1c3e8c0 to f7028c3 Compare July 7, 2025 14:59
@sjtu-yc sjtu-yc requested a review from a team as a code owner July 7, 2025 14:59
github-actions Bot commented Sep 5, 2025

Marking as stale. No activity in 60 days.

@github-actions github-actions Bot added the stale label Sep 5, 2025

4 participants