Remove unnecessary `load_weights` methods #44589
Open
hmellor wants to merge 27 commits into
Contributor
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…get_cache_scale_mapper` Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…weights Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…_to_vllm_mapper` too Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This PR adds missing functionality to `AutoWeightsLoader`, which allows us to delete the `load_weights` method boilerplate from 41 architectures in vLLM. Every one of these architectures can automatically load:

- `qkv_proj`/`gate_up_proj` weights

The specific changes are:

- `MergedColumnParallelLinear` and `QKVParallelLinear` can load themselves from fused or unfused checkpoints without any special logic, provided that the checkpoint weights are mapped correctly
  - The mapping for `qkv_proj` maps each checkpoint name to a shard in `QKVParallelLinear`
  - The mapping for `gate_up_proj` maps each checkpoint name to a shard in `MergedColumnParallelLinear`
- `ColumnParallelLinearWithLoRA`, `MergedColumnParallelLinearWithLoRA`, `QKVParallelLinearWithLoRA` and `MergedQKVParallelLinearWithLoRA` work when `packed_modules_mapping` no longer exists as a class variable of the model
- `QuantizationConfig` includes the mappings and the skips for unexpected names from `maybe_remap_kv_scale_name` (this function must stay until all models can use `AutoWeightsLoader`)
- `AutoWeightsLoader`

This change actually found a latent bug in the layerwise online-quantization accounting:
- `Fp8OnlineLinearMethod.create_weights` calls `initialize_online_processing(layer)`, which snapshots `load_numel_total = get_layer_size(layer)` and wraps the weight loaders of tensors that exist at that moment. But `ColumnParallelLinear.__init__` registers `self.bias` after `create_weights` returns, so the bias was excluded from the expected total and its loader was never wrapped.
- `_layerwise_process` finalizes the layer before the q bias arrives: it materializes the meta weight, replays the buffered shards, quantizes, and runs the Marlin prep, which permutes the bias for the kernel epilogue and replaces the param with a bare `Parameter`.
- (… `output_dim`, shape [2304] vs [768]) → the assert in `QKVParallelLinear.weight_loader`.

On main, OPT's old dict-based loader held a stale `params_dict` snapshot, so the late bias write went into the dead pre-Marlin tensor: silent corruption that the test never caught (it only checks dtypes, explicitly not accuracy). The branch's delegation path fetches the live param at load time, which turned the silent bug into a crash.
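
For reviewers who want the stale-snapshot failure mode in isolation, here is a minimal sketch in plain Python. It does not use vLLM or PyTorch: `Param`, `Layer`, and `finalize` are hypothetical stand-ins invented for illustration, and the sketch does not model the shape assert that the live path hits. It only shows why a write through a dict snapshot taken before finalization never reaches the object the layer ends up using.

```python
# Illustrative only: plain-Python stand-ins, not vLLM or PyTorch classes.


class Param:
    """Hypothetical stand-in for a parameter; `data` holds the loaded value."""

    def __init__(self, label):
        self.label = label
        self.data = None


class Layer:
    """Hypothetical stand-in for a layer whose bias is replaced on finalize."""

    def __init__(self):
        self.params = {"bias": Param("pre-finalize bias")}

    def finalize(self):
        # Mimics post-processing that swaps in a brand-new parameter object.
        self.params["bias"] = Param("post-finalize bias")


layer = Layer()

# Stale path: copy the name -> object mapping once, before loading starts.
snapshot = dict(layer.params)
dead_param = snapshot["bias"]

# Finalization happens before the late bias write arrives.
layer.finalize()

# The late write through the snapshot lands on the dead object.
snapshot["bias"].data = [1.0, 2.0, 3.0]
stale_write_reached_layer = layer.params["bias"].data is not None

# A lookup at load time reaches the object the layer actually owns.
layer.params["bias"].data = [1.0, 2.0, 3.0]
live_write_reached_layer = layer.params["bias"].data is not None
```

In the sketch the stale write succeeds silently and the layer's real bias stays unloaded, which mirrors the silent corruption described above. Looking the parameter up at load time is what exposes the problem instead of hiding it.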