[models] chore: Change transformers v5 support for qwen3_moe to use HF v5 style expert weight layout and add a converter impl.#500
Conversation
There was a problem hiding this comment.
Code Review
The pull request introduces a robust mechanism for converting HuggingFace Transformers v5 style expert weight layouts into a merged format suitable for fused MoE kernels. The implementation includes a protocol-based converter system integrated into the model loading utilities and a specific implementation for the Qwen3Moe model. The changes are well-tested and follow existing patterns in the repository. I have identified a few high-severity issues related to potential key mapping conflicts and redundant logic that should be addressed to ensure reliability across different model configurations.
| name = _convert_weight_key(name, model) | ||
| converted = maybe_convert_hf_checkpoint_tensor(name, tensor, converter) |
There was a problem hiding this comment.
The weight key conversion _convert_weight_key is performed before the HF checkpoint tensor converter. If a model uses _checkpoint_conversion_mapping to rename the model prefix (e.g., to language_model), the regex in Qwen3MoeV5CheckpointTensorConverter will fail to match the keys. While Qwen3Moe might not currently use this mapping, this order of operations makes the converter fragile for models that do. Consider if the converter should handle the mapped keys or if the order should be reversed.
| match = _QWEN3_MOE_EXPERT_KEY.match(name) | ||
| if match is None: | ||
| return HfConvertedCheckpointTensor(name=name, tensor=tensor) |
There was a problem hiding this comment.
This check is redundant because maybe_convert_hf_checkpoint_tensor already calls can_handle(name) before invoking convert. If can_handle returns False, convert is never called. Removing this redundancy simplifies the method.
match = _QWEN3_MOE_EXPERT_KEY.match(name)
# No need for match is None check here as can_handle already verified it| if not hasattr(converter, "can_handle") or not hasattr(converter, "convert"): | ||
| logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no `can_handle/convert`.") |
There was a problem hiding this comment.
The validation only checks for the presence of attributes but not if they are actually callable. It is safer to verify that can_handle and convert are indeed methods/functions to avoid runtime errors when they are invoked later.
| if not hasattr(converter, "can_handle") or not hasattr(converter, "convert"): | |
| logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no `can_handle/convert`.") | |
| if not callable(getattr(converter, "can_handle", None)) or not callable(getattr(converter, "convert", None)): | |
| logger.warning_rank0("Ignore invalid checkpoint tensor converter because it has no callable `can_handle/convert`.") |
Training Loss check