
Update upstream #82

Open
AleHD wants to merge 916 commits into main from fp8

AleHD commented Jun 25, 2025

Collaborator

This PR includes the following additions:

  • Syncs the branch with upstream.
  • Adds a metrics tracking feature (kurtosis) to ease fp8 development.
  • Fixes backward compatibility for Xielu.
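
For readers unfamiliar with the statistic named in the second bullet, here is a minimal sketch of what a kurtosis metric measures. It is not the code in this PR: the function name `kurtosis` and the plain-Python list input are assumptions made for illustration, and the PR's tracking code may define or normalize the metric differently. The sketch uses the population (Pearson) definition, the fourth central moment divided by the squared variance, under which a Gaussian scores about 3.

```python
import math


def kurtosis(values):
    """Population (Pearson) kurtosis: E[(x - mean)^4] / Var(x)^2.

    Hypothetical helper for illustration only; not taken from this PR.
    """
    n = len(values)
    if n == 0:
        raise ValueError("kurtosis of an empty sequence is undefined")
    mean = sum(values) / n
    # Second and fourth central moments (population form, divide by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        # A constant input has zero variance, so the ratio is undefined.
        return math.nan
    return m4 / (m2 * m2)


# Values spread evenly between two levels: no outliers, low kurtosis.
flat = [-1.0, 1.0, -1.0, 1.0]
# Mostly zeros with a single large value: heavy tail, high kurtosis.
spiky = [0.0] * 99 + [10.0]

print(kurtosis(flat))
print(kurtosis(spiky))
```

A high value indicates that a few outliers dominate the distribution, which is commonly a concern for low-precision formats such as fp8 because the scaling factor has to accommodate the outliers at the expense of resolution for the remaining values.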

mathemakitten and others added 30 commits April 25, 2025 20:40
Fix extra tokens in returned generation

Closes dl/JoC/nemo-ci#2075

See merge request ADLR/megatron-lm!3178
Update current scaling supported TE version to 2.2.0.dev0

See merge request ADLR/megatron-lm!3160
Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-login-01.cm.cluster>
Co-authored-by: Vijay Korthikanti <vkorthikanti@cw-dfw-cs-001-login-01.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-279-012.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-316-012.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-258-026.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-008-033.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-236-026.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-267-012.cm.cluster>
Separate chunk allocator

See merge request ADLR/megatron-lm!3121
Revert inference_context.is_decode_only() to inference_context.sequence_len_offset > 0

See merge request ADLR/megatron-lm!3180
[BUG FIX]: fix indices-to-multihot-fusion throwing an exception when topk/num_local_experts is not a power of 2.

See merge request ADLR/megatron-lm!3058

Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Refactor Inference Process Groups by replacing global ones with optional local ones for better parallelism flexibility

See merge request ADLR/megatron-lm!3015
Update te patch to include 1626

See merge request ADLR/megatron-lm!3179
Co-authored-by: root <root@cw-dfw-h100-004-211-013.cm.cluster>
Co-authored-by: Vijay Korthikanti <vkorthikanti@cw-dfw-cs-001-login-01.cm.cluster>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: root <root@cw-dfw-h100-004-279-012.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-316-012.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-258-026.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-008-033.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-236-026.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-267-012.cm.cluster>
Use FlashAttention 3 for inference

See merge request ADLR/megatron-lm!3120
No RoPE for Llama4

See merge request ADLR/megatron-lm!3167
Enable --fp8-param-gather for NV sub-channel recipe

See merge request ADLR/megatron-lm!3010

Co-authored-by: lit <lit@nvidia.com>
fix: fix FP8 support in recompute; fix fused swiglu perf

See merge request ADLR/megatron-lm!3133
ci: Auto-apply most recent milestone

See merge request ADLR/megatron-lm!3189
fix: Correct date of review stage

See merge request ADLR/megatron-lm!3191
tests: Fix model-config test for nemo2

See merge request ADLR/megatron-lm!3130
Co-authored-by: Oliver Koenig <okoenig@login-eos01.eos.clusters.nvidia.com>
shjwudp and others added 27 commits June 2, 2025 18:03
Co-authored-by: jianbinc <shjwudp@gmail.com>
Fix custom FSDP float8 tensor set_item

See merge request ADLR/megatron-lm!3280
ci: Move queue blocker

See merge request ADLR/megatron-lm!3401
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
ci: Improve error-handling of missing logs

See merge request ADLR/megatron-lm!3400
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
ci: Control job concurrency

See merge request ADLR/megatron-lm!3408
ci: Catch missing logs

See merge request ADLR/megatron-lm!3412
ci: Remove tests from A100

See merge request ADLR/megatron-lm!3411
Add an option to skip counting zeros in grad of ChainedOptimizer

See merge request ADLR/megatron-lm!3393
Add an interface to set high priority stream groups

See merge request ADLR/megatron-lm!3326
AleHD commented Jun 25, 2025

Collaborator Author

[WIP]

AleHD mentioned this pull request Jun 25, 2025