[model] Support bailing v2 5 #85

Open
Jintao-Huang wants to merge 3 commits into modelscope:main from Jintao-Huang:support_bailing_v2_5

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for the bailing_hybrid model, including its configuration mapping and a specialized loader that handles hybrid attention layers. Review feedback highlights the need for safer and more efficient logic when retrieving transformer layer specifications, specifically recommending a try...finally block to ensure configuration state is restored. Additionally, it was suggested to remove redundant method overrides in the LinearAttention class and clean up several unused imports.

Comment on lines +49 to +53
layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
multi_latent_attention = self.config.multi_latent_attention
self.config.multi_latent_attention = False
linear_layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
self.config.multi_latent_attention = multi_latent_attention

high

The current implementation for getting linear_layer_specs by temporarily modifying self.config.multi_latent_attention has a couple of issues:

  1. Safety: If super().get_transformer_layer_spec() raises an exception, self.config.multi_latent_attention will not be restored to its original value. This could lead to unexpected behavior in subsequent operations. Using a try...finally block is recommended for safety.
  2. Efficiency: super().get_transformer_layer_spec() is called twice. If self.config.multi_latent_attention is False to begin with, both calls are identical, which is redundant and inefficient.

Consider refactoring this logic to be safer and more efficient.

Suggested change
- layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
- multi_latent_attention = self.config.multi_latent_attention
- self.config.multi_latent_attention = False
- linear_layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
- self.config.multi_latent_attention = multi_latent_attention
+ multi_latent_attention = self.config.multi_latent_attention
+ if multi_latent_attention:
+     layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
+     try:
+         self.config.multi_latent_attention = False
+         linear_layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
+     finally:
+         self.config.multi_latent_attention = multi_latent_attention
+ else:
+     layer_specs = super().get_transformer_layer_spec(vp_stage=vp_stage)
+     linear_layer_specs = layer_specs
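
The same save-and-restore pattern could also be factored into a small context manager. The following is a minimal sketch, not code from this PR: `override_attr` is a hypothetical helper name, and `SimpleNamespace` stands in for the real config object, whose class is not shown here.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name> to value and restore the original on exit.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs on normal exit and also when the body raises.
        setattr(obj, name, original)


# Stand-in for self.config (assumption: the real config is a plain attribute holder).
config = SimpleNamespace(multi_latent_attention=True)

with override_attr(config, "multi_latent_attention", False):
    inside = config.multi_latent_attention  # value seen while the block runs

try:
    with override_attr(config, "multi_latent_attention", False):
        # Simulates get_transformer_layer_spec() failing mid-call.
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

restored = config.multi_latent_attention  # value after the failed block
```

With a helper like this, the second `get_transformer_layer_spec` call would sit inside a `with` block, and the flag would be restored even if that call raises.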

Comment on lines +19 to +33
class LinearAttention(SelfAttention):
    def __init__(
        self,
        config: TransformerConfig,
        *args, **kwargs,
    ):
        super().__init__(config, *args, **kwargs)

    def forward(
        self,
        hidden_states: Tensor,
        attention_mask: Tensor,
        **kwargs,
    ) -> Tuple[Tensor, Tensor]:
        return super().forward(hidden_states, attention_mask, **kwargs)

medium

The __init__ and forward methods in the LinearAttention class are redundant as they just call the superclass methods with the same arguments. You can remove them for cleaner and more concise code.

After this change, TransformerConfig and Tuple will become unused imports and should also be removed, along with other unused imports in this file (BaseInferenceContext, PackedSeqParams, Union, and SelfAttentionSubmodules).

class LinearAttention(SelfAttention):
    pass
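
To illustrate why the overrides add nothing, here is a small sketch using a stand-in base class. `SelfAttentionStub` is hypothetical and only mimics the shape of the signatures quoted above; it is not the real `SelfAttention`.

```python
class SelfAttentionStub:
    """Hypothetical stand-in for SelfAttention, used only for illustration."""

    def __init__(self, config, *args, **kwargs):
        self.config = config

    def forward(self, hidden_states, attention_mask, **kwargs):
        # Echo the inputs so the call path is easy to observe.
        return hidden_states, attention_mask


class LinearAttention(SelfAttentionStub):
    # No overrides: __init__ and forward resolve to the base class.
    pass


layer = LinearAttention({"hidden_size": 8})
out = layer.forward("h", "mask")

# Both attribute lookups return the very same function objects as the base class.
same_init = LinearAttention.__init__ is SelfAttentionStub.__init__
same_forward = LinearAttention.forward is SelfAttentionStub.forward
```

Because method lookup falls through to the parent, the empty subclass behaves the same as one whose methods only forward their arguments to `super()`.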
