
Remove unnecessary load_weights methods #44589

Open
hmellor wants to merge 27 commits into vllm-project:main from hmellor:remove-simple-load-weights

@hmellor hmellor commented Jun 4, 2026

Member

This PR adds missing functionality to AutoWeightsLoader, which allows us to delete the boilerplate load_weights methods from 41 architectures in vLLM. Each of these architectures can now automatically load:

  • GPTQ checkpoints with correct bias skipping
  • FP8 checkpoints with various scale formats
  • Checkpoints with fused or sharded qkv_proj/gate_up_proj weights
  • LoRA weights
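"Correct bias skipping" refers to a guard that many of the hand-written loops carried, typically of the form: if name.endswith(".bias") and name not in params_dict: continue. A minimal sketch of that rule, assuming plain Python lists stand in for tensors; load_with_gptq_bias_skip is a hypothetical name and not the code AutoWeightsLoader runs.

```python
def load_with_gptq_bias_skip(params, weights):
    """Copy (name, values) pairs into params, dropping stray GPTQ biases.

    params: dict of parameter name -> mutable list (stand-in for a tensor)
    weights: iterable of (name, list) pairs read from a checkpoint
    Returns (set of names loaded, list of names skipped).
    """
    loaded, skipped = set(), []
    for name, values in weights:
        # Some GPTQ checkpoints carry a .bias for layers the model built
        # without one; that tensor has no destination, so skip it.
        if name.endswith(".bias") and name not in params:
            skipped.append(name)
            continue
        # Any other unknown name is a real mismatch and raises KeyError.
        params[name][:] = values
        loaded.add(name)
    return loaded, skipped
```

Only unexpected biases are tolerated; an unknown weight still fails loudly, which keeps genuine checkpoint/model mismatches visible.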

The specific changes are:

  • Enables MergedColumnParallelLinear and QKVParallelLinear to load themselves from fused or unfused checkpoints without any special logic, provided that the checkpoint weights are mapped correctly
    • The mappings for qkv_proj look like this; each entry maps a checkpoint name to a shard of QKVParallelLinear:
      hf_to_vllm_mapper = WeightsMapper(
          orig_to_new_substr={
              ".q_proj": ".qkv_proj.q",
              ".k_proj": ".qkv_proj.k",
              ".v_proj": ".qkv_proj.v",
          }
      )
    • The mappings for gate_up_proj look like this; each entry maps a checkpoint name to a shard of MergedColumnParallelLinear:
      hf_to_vllm_mapper = WeightsMapper(
          orig_to_new_substr={
              ".gate_proj": ".gate_up_proj.0",
              ".up_proj": ".gate_up_proj.1",
          }
      )
  • Updates ColumnParallelLinearWithLoRA, MergedColumnParallelLinearWithLoRA, QKVParallelLinearWithLoRA and MergedQKVParallelLinearWithLoRA to work when packed_modules_mapping no longer exists as a class variable of the model
  • Updates QuantizationConfig to include the mappings and to skip unexpected weights from maybe_remap_kv_scale_name (this function must stay until all models can use AutoWeightsLoader)
  • Adds skipping of unexpected GPTQ biases to AutoWeightsLoader
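To make the naming convention concrete, here is a minimal pure-Python sketch of how a remapped name could be split back into a fused parameter name plus a shard id ("q"/"k"/"v" for QKVParallelLinear, 0/1 for MergedColumnParallelLinear). remap is a simplified stand-in for WeightsMapper's substring rule, and split_shard and the merged ORIG_TO_NEW_SUBSTR dict are hypothetical; none of this is the code AutoWeightsLoader actually runs.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical merge of the two mappings shown above.
ORIG_TO_NEW_SUBSTR = {
    ".q_proj": ".qkv_proj.q",
    ".k_proj": ".qkv_proj.k",
    ".v_proj": ".qkv_proj.v",
    ".gate_proj": ".gate_up_proj.0",
    ".up_proj": ".gate_up_proj.1",
}

FUSED_MODULES = ("qkv_proj", "gate_up_proj")


def remap(name: str, orig_to_new_substr: Dict[str, str]) -> str:
    """Simplified substring remap (stand-in for WeightsMapper)."""
    for old, new in orig_to_new_substr.items():
        if old in name:
            name = name.replace(old, new)
    return name


def split_shard(name: str) -> Tuple[str, Optional[Union[str, int]]]:
    """Split 'x.qkv_proj.k.weight' into ('x.qkv_proj.weight', 'k').

    An already-fused name such as 'x.qkv_proj.weight' has no shard id,
    so the whole tensor targets the fused parameter (shard id None).
    """
    parts = name.split(".")
    last = len(parts) - 1  # index of the leaf, e.g. 'weight' or 'bias'
    for i, part in enumerate(parts):
        if part in FUSED_MODULES and i + 1 < last:
            shard = parts[i + 1]
            # gate_up_proj shards are integer indices; qkv shards are letters.
            shard_id = int(shard) if shard.isdigit() else shard
            return ".".join(parts[: i + 1] + parts[i + 2 :]), shard_id
    return name, None
```

A fused checkpoint name like model.layers.0.self_attn.qkv_proj.weight matches none of the substrings, so it passes through unchanged with no shard id. That is how a single mapping can serve both fused and unfused checkpoints.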

This change also uncovered a latent bug in the layerwise online-quantization accounting:

  1. Fp8OnlineLinearMethod.create_weights calls initialize_online_processing(layer), which snapshots load_numel_total = get_layer_size(layer) and wraps the weight loaders of the tensors that exist at that moment. However, ColumnParallelLinear.__init__ registers self.bias after create_weights returns, so the bias was excluded from the expected total and its loader was never wrapped.
  2. For OPT (which has qkv biases), the counter therefore reaches the weight-only total at the last weight shard, and _layerwise_process finalizes the layer before the q bias arrives: it materializes the meta weight, replays the buffered shards, quantizes, and runs the Marlin prep, which permutes the bias for the kernel epilogue and replaces the param with a bare Parameter.
  3. The trailing q-bias load then hits that post-processed param (no output_dim, shape [2304] vs. [768]), which triggers the assert in QKVParallelLinear.weight_loader.
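The ordering problem in steps 1-3 can be reproduced with a toy element counter. This is a minimal sketch: ToyOnlineLayer and its methods are hypothetical stand-ins for initialize_online_processing and _layerwise_process, the checkpoint order (all weight shards, then the biases) is assumed, and the sizes assume a hidden size of 768 (as in OPT-125m), which matches the [2304] vs. [768] shapes above.

```python
HIDDEN = 768  # assumed hidden size; each q/k/v weight shard is HIDDEN x HIDDEN


class ToyOnlineLayer:
    """Counts loaded elements and finalizes once the expected total is hit."""

    def __init__(self):
        self.param_numel = {}
        self.expected = 0
        self.loaded = 0
        self.finalized = False

    def register(self, name, numel):
        self.param_numel[name] = numel

    def snapshot_expected(self):
        # Like load_numel_total = get_layer_size(layer): counts only the
        # parameters that exist at the moment of the call.
        self.expected = sum(self.param_numel.values())

    def load(self, shard_name, numel):
        if self.finalized:
            # Stand-in for the assert hit in QKVParallelLinear.weight_loader.
            raise AssertionError(f"{shard_name} arrived after finalization")
        self.loaded += numel
        if self.loaded >= self.expected:
            self.finalized = True  # quantization and Marlin prep would run here


def build_layer(snapshot_before_bias):
    layer = ToyOnlineLayer()
    layer.register("weight", 3 * HIDDEN * HIDDEN)
    if snapshot_before_bias:
        layer.snapshot_expected()           # create_weights runs first...
        layer.register("bias", 3 * HIDDEN)  # ...then __init__ adds the bias
    else:
        layer.register("bias", 3 * HIDDEN)
        layer.snapshot_expected()
    return layer


# Assumed checkpoint order: all weight shards, then the bias shards.
SHARDS = [(f"{s}.weight", HIDDEN * HIDDEN) for s in "qkv"] + [
    (f"{s}.bias", HIDDEN) for s in "qkv"
]


def replay(layer):
    for shard_name, numel in SHARDS:
        layer.load(shard_name, numel)
```

With the snapshot taken before the bias exists, replay(build_layer(True)) finalizes on v.weight and then raises on q.bias. Taking the snapshot after every parameter is registered makes the totals agree; that is one way to avoid the bug, not necessarily the fix this PR applies.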

On main, OPT's old dict-based loader held a stale params_dict snapshot, so the late bias write went into the dead pre-Marlin tensor. This was silent corruption that the test never caught, because it only checks dtypes and explicitly does not check accuracy. The branch's delegation path fetches the live param at load time, which turned the silent bug into a crash.
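The difference between the two loading paths comes down to when the parameter object is looked up. In this minimal sketch, Param, Layer, shard_loader and the two load functions are hypothetical; post_process stands in for the Marlin prep that rebinds the bias attribute to a new, bare object without output_dim.

```python
class Param:
    """Stand-in for a parameter; output_dim marks it as shardable."""

    def __init__(self, data, output_dim=None):
        self.data = list(data)
        if output_dim is not None:
            self.output_dim = output_dim


class Layer:
    def __init__(self, size=6):
        self.bias = Param([0.0] * size, output_dim=0)

    def post_process(self):
        # Like the Marlin prep: permute, then rebind the attribute to a NEW
        # bare Param that no longer carries output_dim.
        self.bias = Param(reversed(self.bias.data))


def shard_loader(param, shard):
    # A sharded loader needs output_dim to know which slice to write.
    if not hasattr(param, "output_dim"):
        raise AssertionError("param has no output_dim")
    param.data[: len(shard)] = shard


def load_via_snapshot(layer, shard):
    params_dict = {"bias": layer.bias}  # captured before post-processing
    layer.post_process()
    shard_loader(params_dict["bias"], shard)  # succeeds, but on a dead object


def load_via_live_lookup(layer, shard):
    layer.post_process()
    shard_loader(getattr(layer, "bias"), shard)  # sees the new bare Param
```

The snapshot path returns without error while the live bias stays all zeros; the live-lookup path raises. This mirrors how the same ordering bug went from silent corruption on main to a crash on this branch.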

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@hmellor hmellor added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 4, 2026
@mergify mergify Bot added deepseek Related to DeepSeek models llama Related to Llama models qwen Related to Qwen models gpt-oss Related to GPT-OSS models speculative-decoding labels Jun 4, 2026
@hmellor hmellor requested a review from jeejeelee as a code owner June 5, 2026 10:39

mergify Bot commented Jun 5, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @hmellor.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 5, 2026
hmellor added 4 commits June 5, 2026 12:29
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor force-pushed the remove-simple-load-weights branch from 9eeec97 to a9788ab June 5, 2026 12:30
@mergify mergify Bot removed the needs-rebase label Jun 5, 2026
@hmellor hmellor changed the title from Remove simple load_weights methods to Remove unnecessary load_weights methods Jun 5, 2026
hmellor added 3 commits June 5, 2026 14:57
…get_cache_scale_mapper`

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

mergify Bot commented Jun 8, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @hmellor.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 8, 2026
hmellor added 3 commits June 11, 2026 10:03
…weights

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@mergify mergify Bot removed the needs-rebase label Jun 11, 2026
hmellor added 6 commits June 11, 2026 12:52
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@mergify mergify Bot added the mistral Related to Mistral models label Jun 11, 2026

mergify Bot commented Jun 13, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @hmellor.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 13, 2026
hmellor added 6 commits June 13, 2026 13:24
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…_to_vllm_mapper` too

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@mergify mergify Bot removed the needs-rebase label Jun 13, 2026
hmellor added 4 commits June 13, 2026 15:12
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Labels

deepseek Related to DeepSeek models gpt-oss Related to GPT-OSS models llama Related to Llama models mistral Related to Mistral models qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed speculative-decoding

Projects

Status: To Triage


1 participant