[models] chore: Change transformers v5 support for qwen3_moe to use HF v5 style expert weight layout and add a converter impl. by piyifan123 · Pull Request #500 · ByteDance-Seed/VeOmni

piyifan123 · 2026-02-24T01:58:35Z

Training Loss check

gemini-code-assist

Code Review

The pull request introduces a robust mechanism for converting HuggingFace Transformers v5 style expert weight layouts into a merged format suitable for fused MoE kernels. The implementation includes a protocol-based converter system integrated into the model loading utilities and a specific implementation for the Qwen3Moe model. The changes are well-tested and follow existing patterns in the repository. I have identified a few high-severity issues related to potential key mapping conflicts and redundant logic that should be addressed to ensure reliability across different model configurations.

gemini-code-assist · 2026-02-24T02:03:22Z

            name = _convert_weight_key(name, model)
+            converted = maybe_convert_hf_checkpoint_tensor(name, tensor, converter)


The weight key conversion _convert_weight_key is performed before the HF checkpoint tensor converter. If a model uses _checkpoint_conversion_mapping to rename the model prefix (e.g., to language_model), the regex in Qwen3MoeV5CheckpointTensorConverter will fail to match the keys. While Qwen3Moe might not currently use this mapping, this order of operations makes the converter fragile for models that do. Consider if the converter should handle the mapped keys or if the order should be reversed.
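The ordering concern can be illustrated with a minimal sketch. The pattern and the rename rule below are hypothetical stand-ins, since neither the real `_QWEN3_MOE_EXPERT_KEY` regex nor any `_checkpoint_conversion_mapping` entries appear in this review. The sketch only shows that a pattern anchored on the original prefix stops matching once the key has been renamed.

```python
import re

# Hypothetical pattern, anchored on the original "model." prefix.
# The real _QWEN3_MOE_EXPERT_KEY regex is not shown in the review.
EXPERT_KEY = re.compile(
    r"^model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight$"
)


def rename_prefix(name: str) -> str:
    """Hypothetical rename of the kind a key-conversion mapping might apply."""
    return re.sub(r"^model\.", "model.language_model.", name)


original = "model.layers.0.mlp.experts.3.up_proj.weight"
renamed = rename_prefix(original)

# The pattern matches the checkpoint key as stored on disk ...
matches_before = EXPERT_KEY.match(original) is not None
# ... but no longer matches after the prefix has been rewritten.
matches_after = EXPERT_KEY.match(renamed) is not None

print(matches_before, matches_after)
```

Under these assumptions, running the tensor converter on the raw checkpoint key (before renaming) or writing the pattern against the renamed key would both avoid the mismatch.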

gemini-code-assist · 2026-02-24T02:03:22Z

+        match = _QWEN3_MOE_EXPERT_KEY.match(name)
+        if match is None:
+            return HfConvertedCheckpointTensor(name=name, tensor=tensor)


This check is redundant because maybe_convert_hf_checkpoint_tensor already calls can_handle(name) before invoking convert. If can_handle returns False, convert is never called. Removing this redundancy simplifies the method.

match = _QWEN3_MOE_EXPERT_KEY.match(name) # No need for match is None check here as can_handle already verified it
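A rough sketch of the dispatch pattern this comment relies on, assuming the helper only calls `convert` after `can_handle` accepts the key. Everything here is illustrative: `Converted`, `SketchConverter`, `maybe_convert`, the key pattern, and the merged key name are hypothetical stand-ins, not the PR's actual code. Returning `None` for unhandled keys is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical key pattern, used only for this illustration.
_KEY = re.compile(r"^layers\.(\d+)\.experts\.(\d+)\.weight$")


@dataclass
class Converted:
    """Stand-in for the PR's HfConvertedCheckpointTensor."""
    name: str
    tensor: Any


class SketchConverter:
    def can_handle(self, name: str) -> bool:
        return _KEY.match(name) is not None

    def convert(self, name: str, tensor: Any) -> Converted:
        match = _KEY.match(name)
        # can_handle() is the caller's guard; the assert documents that
        # contract instead of silently passing the tensor through.
        assert match is not None, f"convert() called on unhandled key: {name}"
        layer, expert = match.groups()
        return Converted(name=f"layers.{layer}.experts.merged[{expert}]", tensor=tensor)


def maybe_convert(name: str, tensor: Any, converter: Optional[SketchConverter]) -> Optional[Converted]:
    # Assumed dispatch shape: convert() runs only when can_handle() says yes.
    if converter is None or not converter.can_handle(name):
        return None
    return converter.convert(name, tensor)


handled = maybe_convert("layers.2.experts.5.weight", object(), SketchConverter())
skipped = maybe_convert("embed_tokens.weight", object(), SketchConverter())
```

One design consideration: if `convert` may also be called directly, without going through the helper, keeping an explicit guard such as the assert above is cheaper than debugging an `AttributeError` on a `None` match.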

gemini-code-assist · 2026-02-24T02:03:23Z

+    if not hasattr(converter, "can_handle") or not hasattr(converter, "convert"):
+        logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no `can_handle/convert`.")


The validation only checks for the presence of attributes but not if they are actually callable. It is safer to verify that can_handle and convert are indeed methods/functions to avoid runtime errors when they are invoked later.

Suggested change

-    if not hasattr(converter, "can_handle") or not hasattr(converter, "convert"):
-        logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no `can_handle/convert`.")
+    if not callable(getattr(converter, "can_handle", None)) or not callable(getattr(converter, "convert", None)):
+        logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no callable `can_handle/convert`.")
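The difference between the two checks can be seen in a small sketch. The classes and helper functions below are hypothetical and exist only to show a case where `hasattr` passes but the attribute cannot be invoked.

```python
class BrokenConverter:
    # Both attributes exist, but neither can be called.
    can_handle = None
    convert = "not a function"


class GoodConverter:
    def can_handle(self, name):
        return True

    def convert(self, name, tensor):
        return name, tensor


def passes_hasattr(converter) -> bool:
    """Presence-only validation, as in the original diff."""
    return hasattr(converter, "can_handle") and hasattr(converter, "convert")


def passes_callable(converter) -> bool:
    """Validation as in the suggested change: attributes must be callable."""
    return callable(getattr(converter, "can_handle", None)) and callable(
        getattr(converter, "convert", None)
    )


for conv in (BrokenConverter(), GoodConverter()):
    print(type(conv).__name__, passes_hasattr(conv), passes_callable(conv))
```

With the presence-only check, `BrokenConverter` would be accepted and fail later with a `TypeError` when `can_handle` is invoked; the `callable` check rejects it up front.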

initial change

db6da55

github-actions Bot added the hf_v5 (Related for transformers v5) and misc (Every misc) labels Feb 24, 2026

piyifan123 marked this pull request as draft February 24, 2026 01:58

gemini-code-assist Bot reviewed Feb 24, 2026


yifan.pi added 2 commits February 24, 2026 03:37

patch qwen3_moe impl

d60846c

update doc

5b06aad

piyifan123 wants to merge 3 commits into main from piyifan/qwen3-moe-v5-merged-weights
