Update upstream by AleHD · Pull Request #82 · swiss-ai/Megatron-LM

AleHD · 2025-06-25T12:08:15Z

This PR includes the following changes:

Syncs the fork with upstream.
Adds a metrics tracking feature (kurtosis) to ease FP8 development.
Fixes xIELU backward compatibility.
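
For readers unfamiliar with the metric, here is a minimal sketch of what a kurtosis computation looks like. It is not the code added in this PR: the function name `kurtosis` and its `excess` parameter are hypothetical, and it uses the population definition (fourth central moment divided by the squared variance) on a plain list of floats. Kurtosis is commonly watched in low-precision work because heavy-tailed values with a few large outliers are harder to represent in a narrow format.

```python
import math


def kurtosis(values, excess=False):
    """Population kurtosis of a sequence of floats (hypothetical helper).

    Computes m4 / m2**2, where m2 and m4 are the second and fourth
    central moments. With excess=True, subtracts 3 so that a normal
    distribution scores roughly 0.
    """
    n = len(values)
    if n == 0:
        raise ValueError("kurtosis is undefined for an empty sequence")
    mean = math.fsum(values) / n
    m2 = math.fsum((v - mean) ** 2 for v in values) / n
    m4 = math.fsum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        # All values identical: the ratio is undefined.
        return float("nan")
    k = m4 / (m2 * m2)
    return k - 3.0 if excess else k


# Two symmetric points give the minimum possible kurtosis of 1.
flat = kurtosis([-1.0, 1.0])

# A single large outlier among zeros pushes kurtosis well above 3.
spiky = kurtosis([0.0] * 9 + [10.0])
```

In a training loop one would typically compute such a statistic per tensor (weights or activations) and log it alongside the loss; how this PR wires it in is not shown on this page.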

AleHD · 2025-06-25T12:10:58Z

[WIP]

mathemakitten and others added 30 commits April 25, 2025 20:40

ADLR/megatron-lm!3178 - Fix extra tokens in returned generation

a1843ac

Merge branch 'helenn-fix-seqlen-chopping' into 'main'

ceed1b7

Fix extra tokens in returned generation Closes dl/JoC/nemo-ci#2075 See merge request ADLR/megatron-lm!3178

ADLR/megatron-lm!3160 - Update current scaling supported TE version to 2.2.0.dev0

b764f2d

Merge branch 'donghyukc/te_min_version' into 'main'

57d21c3

Update current scaling supported TE version to 2.2.0.dev0 See merge request ADLR/megatron-lm!3160

Merge branch 'seperate_chunk_allocator' into 'main'

e733d7d

Seperate chunk allocator See merge request ADLR/megatron-lm!3121

ADLR/megatron-lm!3180 - Revert inference_context.is_decode_only() to inference_context.sequence_len_offset > 0

4f16de3

Merge branch 'helenn-fix-seqlenoffset' into 'main'

33a193d

Revert inference_context.is_decode_only() to inference_context.sequence_len_offset > 0 See merge request ADLR/megatron-lm!3180

ADLR/megatron-lm!3058 - [BUG FIX]: fix the bug of indices-to-multihot-fusion will throw an exception when topk/num_local_experts is not the power of 2.

bc70535

Merge branch 'incidices_to_multihot' into 'main'

885a245

[BUG FIX]: fix the bug of indices-to-multihot-fusion will throw an exception when topk/num_local_experts is not the power of 2. See merge request ADLR/megatron-lm!3058

ADLR/megatron-lm!3015 - Refactor Inference Process Groups by replacing global ones with optional local ones for better parallelism flexibility

8208937

Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>

Merge branch 'zhiyul/orthotope/inference' into 'main'

7118d88

Refactor Inference Process Groups by replacing global ones with optional local ones for better parallelism flexibility See merge request ADLR/megatron-lm!3015

ADLR/megatron-lm!3179 - Update te patch to include 1626

9bb34bf

Merge branch 'donghyukc/te_patch_update' into 'main'

2f4463e

Update te patch to include 1626 See merge request ADLR/megatron-lm!3179

Revert "Revert "ADLR/megatron-lm!2581 - Add support for ZeRO-2 with PyTorch FSDP2""

4429e8e

This reverts commit 1eaed21.

Merge branch 'fa3_inference' into 'main'

47e3bd3

Use FlashAttention 3 for inference See merge request ADLR/megatron-lm!3120

ADLR/megatron-lm!3167 - No RoPE for Llama4

8d1367f

Merge branch 'aot/no_rope_llama4' into 'main'

5807d1c

No RoPE for Llama4 See merge request ADLR/megatron-lm!3167

ADLR/megatron-lm!3010 - Enable --fp8-param-gather for NV sub-channel recipe

72afd63

Merge branch 'nv_subchannel_native_fp8' into 'main'

1eb5fe5

Enable --fp8-param-gather for NV sub-channel recipe See merge request ADLR/megatron-lm!3010

ADLR/megatron-lm!3133 - fix: fix FP8 support in recompute; fix fused swiglu perf

cf6d208

Co-authored-by: lit <lit@nvidia.com>

Merge branch 'hongxiaob/recompute_fp8_fix' into 'main'

f6b042b

fix: fix FP8 support in recompute; fix fused swiglu perf See merge request ADLR/megatron-lm!3133

ADLR/megatron-lm!3189 - ci: Auto-apply most recent milestone

f2c0f12

Merge branch 'ko3n1g/ci/auto-milestone' into 'main'

97d27d1

ci: Auto-apply most recent milestone See merge request ADLR/megatron-lm!3189

ADLR/megatron-lm!3191 - fix: Correct date of review stage

06a2dd5

Merge branch 'ko3n1g/fix/auto-reminder' into 'main'

9c5c870

fix: Correct date of review stage See merge request ADLR/megatron-lm!3191

ADLR/megatron-lm!3130 - tests: Fix model-config test for nemo2

97efad4

Merge branch 'ko3n1g/tests/fix-model-config-test' into 'main'

0a524aa

tests: Fix model-config test for nemo2 See merge request ADLR/megatron-lm!3130

ADLR/megatron-lm!3038 - chore: QA on 0.12 release

4b63750

Co-authored-by: Oliver Koenig <okoenig@login-eos01.eos.clusters.nvidia.com>

shjwudp and others added 27 commits June 2, 2025 18:03

ADLR/megatron-lm!3280 - Fix custom FSDP float8 tensor set_item

da3f0ff

Co-authored-by: jianbinc <shjwudp@gmail.com>

Merge branch 'fix_cfsdp_fp8_param_load' into 'main'

549d637

Fix custom FSDP float8 tensor set_item See merge request ADLR/megatron-lm!3280

fix circular imports (due to xielu) + add missing import in argument.py

7461ef2

tiny merge fix

1650f79

ADLR/megatron-lm!3401 - ci: Move queue blocker

24c60db

Merge branch 'ko3n1g/ci/move-queue-blocker' into 'main'

cfea2ea

ci: Move queue blocker See merge request ADLR/megatron-lm!3401

ADLR/megatron-lm!3400 - ci: Improve error-handling of missing logs

37b0afd

Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>

Merge branch 'ko3n1g/ci/better-log-failure-handling' into 'main'

6a62a54

ci: Improve error-handling of missing logs See merge request ADLR/megatron-lm!3400

ADLR/megatron-lm!3408 - ci: Control job concurrency

4648912

Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>

Merge branch 'ko3n1g/ci/job-concurrency' into 'main'

cde60ce

ci: Control job concurrency See merge request ADLR/megatron-lm!3408

ADLR/megatron-lm!3412 - ci: Catch missing logs

eab047c

Merge branch 'ko3n1g/ci/fix-no-log' into 'main'

25a26ca

ci: Catch missing logs See merge request ADLR/megatron-lm!3412

ADLR/megatron-lm!3411 - ci: Remove tests from A100

9bdfe31

Merge branch 'ko3n1g/ci/move-tests' into 'main'

ff64f96

ci: Remove tests from A100 See merge request ADLR/megatron-lm!3411

ADLR/megatron-lm!3393 - Add an option to skip counting zeros in grad of ChainedOptimizer

d960800

Merge branch 'no_count_zeros' into 'main'

b47a9bb

Add an option to skip counting zeros in grad of ChainedOptimizer See merge request ADLR/megatron-lm!3393

ADLR/megatron-lm!3326 - Add an interface to set high priority stream groups

bc80491

Merge branch 'comm-priority-setting' into 'main'

957f348

Add an interface to set high priority stream groups See merge request ADLR/megatron-lm!3326

fix missing import due to merge

940ccb5

merging latest commits from nvidia/main

10e9c5f

rerun fix

e34d1ee

Merge branch 'sai-into-nvidia' into fp8

e0314a2

ensure individual param logging not available with pp>1

c20763a

ensure individual param logging not available with pp>1

ee44773

ensure metrics tracking not available with pp>1

7899cc3

Fixed virtual parallel vp_stage issue

704dfcb

Fixed metrics tracking with pp

068e852

AleHD mentioned this pull request Jun 25, 2025

Partial upstream update #69

Closed

Fixed is_rank0

f134a3b

Update upstream #82
AleHD wants to merge 916 commits into main from fp8

20 participants
