Skip to content

[models] chore: Change transformers v5 support for qwen3_moe to use HF v5 style expert weight layout and add a converter impl.#500

Draft
piyifan123 wants to merge 3 commits into
mainfrom
piyifan/qwen3-moe-v5-merged-weights
Draft

[models] chore: Change transformers v5 support for qwen3_moe to use HF v5 style expert weight layout and add a converter impl.#500
piyifan123 wants to merge 3 commits into
mainfrom
piyifan/qwen3-moe-v5-merged-weights

Conversation

@piyifan123

@piyifan123 piyifan123 commented Feb 24, 2026

Copy link
Copy Markdown
Collaborator

Training Loss check

image

@github-actions github-actions Bot added hf_v5 Related for transformers v5 misc Every misc labels Feb 24, 2026
@piyifan123 piyifan123 marked this pull request as draft February 24, 2026 01:58

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces a robust mechanism for converting HuggingFace Transformers v5 style expert weight layouts into a merged format suitable for fused MoE kernels. The implementation includes a protocol-based converter system integrated into the model loading utilities and a specific implementation for the Qwen3Moe model. The changes are well-tested and follow existing patterns in the repository. I have identified a few high-severity issues related to potential key mapping conflicts and redundant logic that should be addressed to ensure reliability across different model configurations.

Comment on lines 312 to +313
name = _convert_weight_key(name, model)
converted = maybe_convert_hf_checkpoint_tensor(name, tensor, converter)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The weight key conversion _convert_weight_key is performed before the HF checkpoint tensor converter. If a model uses _checkpoint_conversion_mapping to rename the model prefix (e.g., to language_model), the regex in Qwen3MoeV5CheckpointTensorConverter will fail to match the keys. While Qwen3Moe might not currently use this mapping, this order of operations makes the converter fragile for models that do. Consider if the converter should handle the mapped keys or if the order should be reversed.

Comment on lines +79 to +81
match = _QWEN3_MOE_EXPERT_KEY.match(name)
if match is None:
return HfConvertedCheckpointTensor(name=name, tensor=tensor)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This check is redundant because maybe_convert_hf_checkpoint_tensor already calls can_handle(name) before invoking convert. If can_handle returns False, convert is never called. Removing this redundancy simplifies the method.

        match = _QWEN3_MOE_EXPERT_KEY.match(name)
        # No need for match is None check here as can_handle already verified it

Comment on lines +112 to +113
if not hasattr(converter, "can_handle") or not hasattr(converter, "convert"):
logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no `can_handle/convert`.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The validation only checks for the presence of attributes but not if they are actually callable. It is safer to verify that can_handle and convert are indeed methods/functions to avoid runtime errors when they are invoked later.

Suggested change
if not hasattr(converter, "can_handle") or not hasattr(converter, "convert"):
logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no `can_handle/convert`.")
if not callable(getattr(converter, "can_handle", None)) or not callable(getattr(converter, "convert", None)):
logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no callable `can_handle/convert`.")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hf_v5 Related for transformers v5 misc Every misc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant