[model] Support deepseek-v4 #86
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds documentation for deepseek_v4 support and updates the transformer layer to integrate with newer Megatron-Core offloading interfaces. Feedback indicates that the offloading implementation is incomplete, as it fails to properly initialize the normalization managers, and the local import should be moved for efficiency. Additionally, the implementation for the deepseek_v4 model appears to be missing from the current changes.
| if hasattr(self, '_set_offload_modules'): | ||
| from megatron.core.transformer.transformer_layer import _get_offloading_interface | ||
| self._set_offload_modules() | ||
| self.off_interface = _get_offloading_interface() | ||
| self.mlp_norm_manager = None |
There was a problem hiding this comment.
The initialization of offloading managers for Megatron-Core 0.17+ is incomplete. Setting self.mlp_norm_manager = None without assigning a manager from self.off_interface effectively disables offloading for the MLP layer normalization, even when it is configured in offload_modules. Additionally, self.attn_norm_manager should also be initialized to avoid potential AttributeError in base class methods that expect it to be present in newer versions of Megatron-Core.
Also, the local import of _get_offloading_interface inside __init__ is inefficient as it executes for every layer instantiation; consider moving it to the top of the file if possible.
if hasattr(self, '_set_offload_modules'):
from megatron.core.transformer.transformer_layer import _get_offloading_interface
self._set_offload_modules()
self.off_interface = _get_offloading_interface()
offload_modules = getattr(self.config, 'offload_modules', []) or []
is_offloading = getattr(self.config, 'fine_grained_activation_offloading', False)
self.attn_norm_manager = self.off_interface.get_manager('attn_norm') if is_offloading and 'attn_norm' in offload_modules else None
self.mlp_norm_manager = self.off_interface.get_manager('mlp_norm') if is_offloading and 'mlp_norm' in offload_modules else None| | -------- | ------------------------------------------------------------ | | ||
| | Qwen | qwen2, qwen2_moe<br />qwen3, qwen3_moe, qwen3_next | | ||
| | DeepSeek | deepseek_v3, deepseek_v32 | | ||
| | DeepSeek | deepseek_v3, deepseek_v32, deepseek_v4 | |
There was a problem hiding this comment.
The PR adds deepseek_v4 to the list of supported models, but the actual implementation appears to be missing. The file src/mcore_bridge/model/gpts/deepseek_v4.py is empty in the provided context, and there are no changes to model registration or configuration logic to support this new model type. Please ensure the implementation is included or clarify if it relies on an existing model type.
No description provided.