2 changes: 1 addition & 1 deletion README.md
@@ -52,7 +52,7 @@ You can contact us and communicate with us by adding our group:

## 📝 Introduction

**mcore-bridge** is a large language model and multimodal large model definition library built on the Megatron-Core ecosystem, developed by the ModelScope community. It currently supports 300+ text-only models and 200+ multimodal models, including large language models such as Qwen3-Next, GLM5.1, DeepSeek-V3.2, Minimax2.7, Kimi K2.5, and GPT-OSS, as well as multimodal large models such as Qwen3.5, Qwen3-Omni, Gemma4, GLM4.6-V, InternVL3.5, and Ovis2.5.
**mcore-bridge** is a large language model and multimodal large model definition library built on the Megatron-Core ecosystem, developed by the ModelScope community. It currently supports 300+ text-only models and 200+ multimodal models, including large language models such as Qwen3-Next, GLM-5.1, DeepSeek-V3.2, Minimax-2.7, Kimi-K2.5, and GPT-OSS, as well as multimodal large models such as Qwen3.5, Qwen3-Omni, Gemma4, GLM4.6-V, InternVL3.5, and Ovis2.5.

------

2 changes: 1 addition & 1 deletion README_zh.md
@@ -51,7 +51,7 @@

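## 📝 Introduction

**mcore-bridge** is a definition library for large language models and multimodal large models, built on the Megatron-Core ecosystem and released by the ModelScope community. It currently supports 300+ text-only models and 200+ multimodal models. The large language models include Qwen3-Next, GLM5.1, DeepSeek-V3.2, Minimax2.7, Kimi K2.5, and GPT-OSS, among others; the multimodal large models include Qwen3.5, Qwen3-Omni, Gemma4, GLM4.6-V, InternVL3.5, and Ovis2.5, among others.
**mcore-bridge** is a definition library for large language models and multimodal large models, built on the Megatron-Core ecosystem and released by the ModelScope community. It currently supports 300+ text-only models and 200+ multimodal models. The large language models include Qwen3-Next, GLM-5.1, DeepSeek-V3.2, Minimax-2.7, Kimi-K2.5, and GPT-OSS, among others; the multimodal large models include Qwen3.5, Qwen3-Omni, Gemma4, GLM4.6-V, InternVL3.5, and Ovis2.5, among others.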

------

2 changes: 2 additions & 0 deletions src/mcore_bridge/config/model_config.py
@@ -314,6 +314,8 @@ def __post_init__(self):
self.mtp_num_layers = 1
else:
self.mtp_unroll_steps = self.mtp_num_layers
if self.multi_latent_attention:
self.rotary_interleaved = False
super().__post_init__()

self._check_npu()
5 changes: 5 additions & 0 deletions src/mcore_bridge/config/parser.py
@@ -216,6 +216,11 @@ def hf_to_mcore_config(hf_config: PretrainedConfig) -> Dict[str, Any]:
        res['moe_layer_freq'] = f"[{','.join(moe_layer_freq)}]"
    elif hf_model_type == 'glm4v':
        res['rotary_interleaved'] = True
    elif llm_model_type == 'bailing_hybrid':
        res['qk_layernorm'] = True
        res['add_qkv_bias'] = False
        res['moe_router_score_function'] = 'sigmoid'
        res['moe_router_load_balancing_type'] = 'seq_aux_loss'

    if 'partial_rotary_factor' not in res and 'partial_rotary_factor' in rope_scaling:
        res['partial_rotary_factor'] = rope_scaling['partial_rotary_factor']
1 change: 1 addition & 0 deletions src/mcore_bridge/model/constant.py
@@ -9,6 +9,7 @@ class LLMModelType:
    minimax_m2 = 'minimax_m2'
    hy_v3 = 'hy_v3'
    bailing_moe = 'bailing_moe'
    bailing_hybrid = 'bailing_hybrid'

    qwen3_emb = 'qwen3_emb'

2 changes: 1 addition & 1 deletion src/mcore_bridge/model/gpts/__init__.py
@@ -1,2 +1,2 @@
# Copyright (c) ModelScope Contributors. All rights reserved.
from . import bailing_moe, glm4, hunyuan, llm, minimax_m2, olmoe, qwen3_emb, qwen3_next
from . import bailing_hybrid, bailing_moe, glm4, hunyuan, llm, minimax_m2, olmoe, qwen3_emb, qwen3_next
68 changes: 68 additions & 0 deletions src/mcore_bridge/model/gpts/bailing_hybrid.py
@@ -0,0 +1,68 @@
# Copyright (c) ModelScope Contributors. All rights reserved.
from megatron.core.inference.contexts import BaseInferenceContext
from megatron.core.packed_seq_params import PackedSeqParams
from torch import Tensor

from mcore_bridge.bridge import GPTBridge
from ..constant import ModelType
from ..register import ModelLoader, ModelMeta, register_model
from typing import Optional, Union, Tuple
from megatron.core.transformer.attention import SelfAttention
from megatron.core.transformer.attention import SelfAttentionSubmodules
from megatron.core.transformer.transformer_config import TransformerConfig


class BailingHybridBridge(GPTBridge):
    pass


class LinearAttention(SelfAttention):
    def __init__(
        self,
        config: TransformerConfig,
        *args, **kwargs,
    ):
        super().__init__(config, *args, **kwargs)

    def forward(
        self,
        hidden_states: Tensor,
        attention_mask: Tensor,
        **kwargs,
    ) -> Tuple[Tensor, Tensor]:
        return super().forward(hidden_states, attention_mask, **kwargs)
Comment on lines +19 to +33
medium

The __init__ and forward methods in the LinearAttention class are redundant as they just call the superclass methods with the same arguments. You can remove them for cleaner and more concise code.

After this change, TransformerConfig and Tuple will become unused imports and should also be removed, along with other unused imports in this file (BaseInferenceContext, PackedSeqParams, Union, and SelfAttentionSubmodules).

class LinearAttention(SelfAttention):
    pass
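
To illustrate the point about redundancy, here is a minimal sketch using a stand-in base class. `SelfAttentionStub`, `VerboseLinearAttention`, and `MinimalLinearAttention` are hypothetical names invented for this example and are not part of Megatron-Core or this PR; the stub only mimics the shape of the `__init__` and `forward` signatures shown in the diff.

```python
class SelfAttentionStub:
    """Hypothetical stand-in for the real SelfAttention base class."""

    def __init__(self, config, *args, **kwargs):
        self.config = config

    def forward(self, hidden_states, attention_mask, **kwargs):
        # The real class computes attention; the stub just echoes its inputs.
        return hidden_states, attention_mask


class VerboseLinearAttention(SelfAttentionStub):
    # Mirrors the PR: both overrides only pass their arguments to the parent.
    def __init__(self, config, *args, **kwargs):
        super().__init__(config, *args, **kwargs)

    def forward(self, hidden_states, attention_mask, **kwargs):
        return super().forward(hidden_states, attention_mask, **kwargs)


class MinimalLinearAttention(SelfAttentionStub):
    # Mirrors the suggestion: inherit everything unchanged.
    pass


verbose = VerboseLinearAttention({"hidden_size": 8})
minimal = MinimalLinearAttention({"hidden_size": 8})

# Both subclasses produce the same result for the same inputs.
same_output = verbose.forward([1, 2], None) == minimal.forward([1, 2], None)
# The minimal subclass resolves forward directly to the parent's function.
inherits_forward = MinimalLinearAttention.forward is SelfAttentionStub.forward
```

The empty subclass still gives the layer spec a distinct class to dispatch on, which appears to be the only role `LinearAttention` plays in this diff.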



class BailingHybridLoader(ModelLoader):

    def get_transformer_layer_spec(self, vp_stage: Optional[int] = None):
        hf_config = self.config.hf_config
        num_layers = hf_config.num_hidden_layers
        group_size = hf_config.layer_group_size
        tail_start = num_layers // group_size * group_size
        hf_config.attention_layer_type = [
            "attention"
            if (layer_idx + 1) % group_size == 0 or layer_idx >= tail_start
            else "linear_attention"
            for layer_idx in range(num_layers)
        ]
        layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
        multi_latent_attention = self.config.multi_latent_attention
        self.config.multi_latent_attention = False
        linear_layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
        self.config.multi_latent_attention = multi_latent_attention
Comment on lines +49 to +53
high

The current implementation for getting linear_layer_specs by temporarily modifying self.config.multi_latent_attention has a couple of issues:

  1. Safety: If super().get_transformer_layer_spec() raises an exception, self.config.multi_latent_attention will not be restored to its original value. This could lead to unexpected behavior in subsequent operations. Using a try...finally block is recommended for safety.
  2. Efficiency: super().get_transformer_layer_spec() is called twice. If self.config.multi_latent_attention is False to begin with, both calls are identical, which is redundant and inefficient.

Consider refactoring this logic to be safer and more efficient.

Suggested change
-        layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
-        multi_latent_attention = self.config.multi_latent_attention
-        self.config.multi_latent_attention = False
-        linear_layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
-        self.config.multi_latent_attention = multi_latent_attention
+        multi_latent_attention = self.config.multi_latent_attention
+        if multi_latent_attention:
+            layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
+            try:
+                self.config.multi_latent_attention = False
+                linear_layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
+            finally:
+                self.config.multi_latent_attention = multi_latent_attention
+        else:
+            layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
+            linear_layer_specs = layer_specs
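
An alternative to the inline `try...finally` is a small context manager that restores the attribute on exit. The sketch below is only an illustration under stated assumptions: `override_attr` is a hypothetical helper that does not exist in mcore-bridge, and `SimpleNamespace` stands in for `self.config`.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name> to value; restore the old value on exit."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs on normal exit and when the body raises.
        setattr(obj, name, original)


# Stand-in for self.config (assumption: only this one flag matters here).
config = SimpleNamespace(multi_latent_attention=True)

with override_attr(config, "multi_latent_attention", False):
    value_inside = config.multi_latent_attention

# Simulate spec construction failing while the flag is overridden.
try:
    with override_attr(config, "multi_latent_attention", False):
        raise RuntimeError("spec construction failed")
except RuntimeError:
    pass

value_after_error = config.multi_latent_attention
```

In the loader this could wrap only the second `super().get_transformer_layer_spec(...)` call, so the original flag is restored even if that call fails.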

        for i, layer_spec in enumerate(layer_specs.layer_specs):
            if hf_config.attention_layer_type[i] == 'linear_attention':
                linear_spec = linear_layer_specs.layer_specs[i].submodules.self_attention
                linear_spec.module = LinearAttention
                layer_spec.submodules.self_attention = linear_spec
        return layer_specs


register_model(
    ModelMeta(
        ModelType.bailing_hybrid,
        ['bailing_hybrid'],
        bridge_cls=BailingHybridBridge,
        loader=BailingHybridLoader,
    ))
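
For readers checking the layer layout, the rule in `get_transformer_layer_spec` can be tried in isolation. This is a minimal sketch that copies the list comprehension from the diff into a standalone function; the function name `attention_layer_types` and the example values (10 layers, group size 4) are assumptions chosen for illustration, not values from any real model config.

```python
from typing import List


def attention_layer_types(num_layers: int, group_size: int) -> List[str]:
    """Reproduce the layer-type rule from the diff as a pure function."""
    # First index of the trailing partial group, if num_layers is not a
    # multiple of group_size; otherwise equal to num_layers (no tail).
    tail_start = num_layers // group_size * group_size
    return [
        "attention"
        if (layer_idx + 1) % group_size == 0 or layer_idx >= tail_start
        else "linear_attention"
        for layer_idx in range(num_layers)
    ]


# Illustrative values only: 10 layers in groups of 4 leaves a 2-layer tail.
pattern = attention_layer_types(num_layers=10, group_size=4)
for layer_idx, layer_type in enumerate(pattern):
    print(layer_idx, layer_type)
```

With these values, the last layer of each full group (indices 3 and 7) and both tail layers (indices 8 and 9) use full attention, and the remaining six layers use linear attention.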