[model] Support deepseek-v4 by Jintao-Huang · Pull Request #86 · modelscope/mcore-bridge

Jintao-Huang · 2026-05-20T05:48:01Z

No description provided.

gemini-code-assist

Code Review

This pull request adds documentation for deepseek_v4 support and updates the transformer layer to integrate with newer Megatron-Core offloading interfaces. Feedback indicates that the offloading implementation is incomplete, as it fails to properly initialize the normalization managers, and the local import should be moved for efficiency. Additionally, the implementation for the deepseek_v4 model appears to be missing from the current changes.

gemini-code-assist · 2026-05-20T05:50:51Z

+        if hasattr(self, '_set_offload_modules'):
+            from megatron.core.transformer.transformer_layer import _get_offloading_interface
+            self._set_offload_modules()
+            self.off_interface = _get_offloading_interface()
+            self.mlp_norm_manager = None


The initialization of offloading managers for Megatron-Core 0.17+ is incomplete. Setting self.mlp_norm_manager = None without assigning a manager from self.off_interface effectively disables offloading for the MLP layer normalization, even when it is configured in offload_modules. Additionally, self.attn_norm_manager should also be initialized to avoid potential AttributeError in base class methods that expect it to be present in newer versions of Megatron-Core.

Also, the local import of _get_offloading_interface inside __init__ is inefficient as it executes for every layer instantiation; consider moving it to the top of the file if possible.

        if hasattr(self, '_set_offload_modules'):
            from megatron.core.transformer.transformer_layer import _get_offloading_interface
            self._set_offload_modules()
            self.off_interface = _get_offloading_interface()
            offload_modules = getattr(self.config, 'offload_modules', []) or []
            is_offloading = getattr(self.config, 'fine_grained_activation_offloading', False)
            self.attn_norm_manager = self.off_interface.get_manager('attn_norm') if is_offloading and 'attn_norm' in offload_modules else None
            self.mlp_norm_manager = self.off_interface.get_manager('mlp_norm') if is_offloading and 'mlp_norm' in offload_modules else None
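
The two review points above (a module-level import and per-module manager selection) can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `select_norm_managers` and `FakeOffloadInterface` are hypothetical names introduced here, and the `get_manager` call is taken from the review suggestion rather than verified against Megatron-Core. The import path is the one used in the diff, and the guard lets the module load when that private helper is unavailable.

```python
from types import SimpleNamespace

# Guarded module-level import, so the lookup happens once at import time
# instead of inside every layer's __init__. The helper is private and may be
# missing in older Megatron-Core releases (or when Megatron-Core is absent).
try:
    from megatron.core.transformer.transformer_layer import _get_offloading_interface
except ImportError:
    _get_offloading_interface = None


def select_norm_managers(config, off_interface):
    """Return (attn_norm_manager, mlp_norm_manager); None means no offloading.

    Hypothetical helper: mirrors the conditions in the review suggestion.
    """
    offload_modules = getattr(config, 'offload_modules', None) or []
    enabled = bool(getattr(config, 'fine_grained_activation_offloading', False))

    def pick(name):
        if enabled and off_interface is not None and name in offload_modules:
            # Assumption: `get_manager(name)` exists on the offloading
            # interface, as written in the review suggestion.
            return off_interface.get_manager(name)
        return None

    return pick('attn_norm'), pick('mlp_norm')


class FakeOffloadInterface:
    """Hypothetical stand-in for the real interface, for demonstration only."""

    def get_manager(self, name):
        return f'manager:{name}'


# Only 'mlp_norm' is listed, so only the MLP norm receives a manager.
config = SimpleNamespace(
    fine_grained_activation_offloading=True,
    offload_modules=['mlp_norm'],
)
attn_mgr, mlp_mgr = select_norm_managers(config, FakeOffloadInterface())
```

Both attributes are always assigned, either to a manager or to `None`, which addresses the concern that base-class code might read `attn_norm_manager` before it has been set.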

gemini-code-assist · 2026-05-20T05:50:51Z

 | -------- | ------------------------------------------------------------ |
 | Qwen     | qwen2, qwen2_moe<br />qwen3, qwen3_moe, qwen3_next |
-| DeepSeek | deepseek_v3, deepseek_v32                                    |
+| DeepSeek | deepseek_v3, deepseek_v32, deepseek_v4                        |


The PR adds deepseek_v4 to the list of supported models, but the actual implementation appears to be missing. The file src/mcore_bridge/model/gpts/deepseek_v4.py is empty in the provided context, and there are no changes to model registration or configuration logic to support this new model type. Please ensure the implementation is included or clarify if it relies on an existing model type.

Jintao-Huang added 2 commits May 20, 2026 11:47

support deepseek_v4

8403e3c

update

358e51f

gemini-code-assist Bot reviewed May 20, 2026

update

9ea9b44

Jintao-Huang mentioned this pull request May 20, 2026

[megatron] Support deepseek-v4 megatron modelscope/ms-swift#9386

Open

Jintao-Huang wants to merge 3 commits into modelscope:main from Jintao-Huang:support_deepseek_v4_
