compat megatron dev #87

Open
Jintao-Huang wants to merge 2 commits into modelscope:main from Jintao-Huang:compat_megatron_dev

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the configuration parser to keep num_query_groups, modifies _apply_rotary_pos_emb_bshd in GPTModel to support dynamic multi_latent_attention configuration, and adds logic to TransformerLayer for initializing offloading modules. A high-severity issue was identified in the GPTModel changes: the monkey-patched function captures the state of the first model instance, which will cause configuration conflicts in environments running multiple models.

Comment on lines +172 to +173
if multi_latent_attention is None:
    multi_latent_attention = self.config.multi_latent_attention

high

The monkey-patched _apply_rotary_pos_emb_bshd function captures the self instance of the first GPTModel that triggers the patch. Since the patch is only applied once (due to the check at line 144), all subsequent GPTModel instances will use the multi_latent_attention configuration from the first instance, regardless of their own configuration. This will cause incorrect behavior in multi-model environments (e.g., different models in the same process or complex pipeline parallel setups).
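
The capture problem described above can be reproduced in isolation. The following is a minimal sketch, not the PR's actual code: `make_rope_module`, `CapturingModel`, `ContextModel`, and `_active_mla` are hypothetical stand-ins, and the stubbed `_apply_rotary_pos_emb_bshd` only returns a tag so the chosen branch is visible. `CapturingModel` patches once and closes over the first `self`; `ContextModel` shows one possible alternative, assuming the patched function can read the active setting from a `contextvars.ContextVar` that each model sets during its own forward pass.

```python
import contextvars
from types import SimpleNamespace


def make_rope_module():
    """Hypothetical stand-in for the library module that owns the patched function."""
    def _apply_rotary_pos_emb_bshd(t, freqs, multi_latent_attention=False):
        # Return a tag so the branch taken is observable.
        return "mla" if multi_latent_attention else "standard"
    return SimpleNamespace(_apply_rotary_pos_emb_bshd=_apply_rotary_pos_emb_bshd)


class CapturingModel:
    """Reproduces the reported pattern: patch once, close over the first `self`."""
    rope = make_rope_module()
    _patched = False

    def __init__(self, multi_latent_attention):
        self.config = SimpleNamespace(multi_latent_attention=multi_latent_attention)
        if not CapturingModel._patched:
            origin = self.rope._apply_rotary_pos_emb_bshd

            def patched(t, freqs, multi_latent_attention=None):
                if multi_latent_attention is None:
                    # `self` is always the first instance ever constructed.
                    multi_latent_attention = self.config.multi_latent_attention
                return origin(t, freqs, multi_latent_attention=multi_latent_attention)

            CapturingModel.rope._apply_rotary_pos_emb_bshd = patched
            CapturingModel._patched = True

    def forward(self, t=None):
        return self.rope._apply_rotary_pos_emb_bshd(t, None)


# Hypothetical per-call setting read by the patched function instead of `self`.
_active_mla = contextvars.ContextVar("active_mla", default=False)


class ContextModel:
    """One possible fix: the patch reads a context variable set by the caller."""
    rope = make_rope_module()
    _patched = False

    def __init__(self, multi_latent_attention):
        self.config = SimpleNamespace(multi_latent_attention=multi_latent_attention)
        if not ContextModel._patched:
            origin = self.rope._apply_rotary_pos_emb_bshd

            def patched(t, freqs, multi_latent_attention=None):
                if multi_latent_attention is None:
                    multi_latent_attention = _active_mla.get()
                return origin(t, freqs, multi_latent_attention=multi_latent_attention)

            ContextModel.rope._apply_rotary_pos_emb_bshd = patched
            ContextModel._patched = True

    def forward(self, t=None):
        # Publish this instance's setting only for the duration of its own call.
        token = _active_mla.set(self.config.multi_latent_attention)
        try:
            return self.rope._apply_rotary_pos_emb_bshd(t, None)
        finally:
            _active_mla.reset(token)


a, b = CapturingModel(True), CapturingModel(False)
print(a.forward(), b.forward())  # mla mla  (b silently inherits a's setting)

c, d = ContextModel(True), ContextModel(False)
print(c.forward(), d.forward())  # mla standard
```

Passing `multi_latent_attention` explicitly at every call site would avoid the shared state entirely; the context-variable variant is only relevant if the call happens inside library code that the model cannot change.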
