Cannot load the model after pruning #7

I tried loading the pruned model with
`model = AutoModelForCausalLM.from_pretrained(args.model_path, device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True)`
but it raises the following error:
“Traceback (most recent call last):
  File "/home/ubuntu/test_scripts/benchmark_r.py", line 154, in <module>
    main()
  File "/home/ubuntu/test_scripts/benchmark_r.py", line 63, in main
    model = AutoModelForCausalLM.from_pretrained(args.model_path, device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True, 
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 556, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3926, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([2048, 2785]) in "weight" (which has shape torch.Size([2048, 5504])), this look incorrect.
”
I also tried adding the argument `ignore_mismatched_sizes=True` when loading:
`model = AutoModelForCausalLM.from_pretrained(args.model_path, device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True, ignore_mismatched_sizes=True)`

This fails as well:
Some weights of QWenLMHeadModel were not initialized from the model checkpoint at /data/xxxx and are newly initialized because the shapes did not match:
- transformer.h.10.mlp.c_proj.weight: found shape torch.Size([2048, 2785]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.10.mlp.w1.weight: found shape torch.Size([2785, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.10.mlp.w2.weight: found shape torch.Size([2785, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.11.mlp.c_proj.weight: found shape torch.Size([2048, 2518]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.11.mlp.w1.weight: found shape torch.Size([2518, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.11.mlp.w2.weight: found shape torch.Size([2518, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.12.attn.c_attn.weight: found shape torch.Size([3840, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.12.attn.c_proj.weight: found shape torch.Size([2048, 1280]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.12.mlp.c_proj.weight: found shape torch.Size([2048, 2393]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.12.mlp.w1.weight: found shape torch.Size([2393, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.12.mlp.w2.weight: found shape torch.Size([2393, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.13.attn.c_attn.weight: found shape torch.Size([3072, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.13.attn.c_proj.weight: found shape torch.Size([2048, 1024]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.13.mlp.c_proj.weight: found shape torch.Size([2048, 3776]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.13.mlp.w1.weight: found shape torch.Size([3776, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.13.mlp.w2.weight: found shape torch.Size([3776, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.14.attn.c_attn.weight: found shape torch.Size([2688, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.14.attn.c_proj.weight: found shape torch.Size([2048, 896]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.14.mlp.c_proj.weight: found shape torch.Size([2048, 3594]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.14.mlp.w1.weight: found shape torch.Size([3594, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.14.mlp.w2.weight: found shape torch.Size([3594, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.15.attn.c_attn.weight: found shape torch.Size([3072, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.15.attn.c_proj.weight: found shape torch.Size([2048, 1024]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.15.mlp.c_proj.weight: found shape torch.Size([2048, 4113]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.15.mlp.w1.weight: found shape torch.Size([4113, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.15.mlp.w2.weight: found shape torch.Size([4113, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.16.attn.c_attn.weight: found shape torch.Size([3072, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.16.attn.c_proj.weight: found shape torch.Size([2048, 1024]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.17.attn.c_attn.weight: found shape torch.Size([2688, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.17.attn.c_proj.weight: found shape torch.Size([2048, 896]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.17.mlp.c_proj.weight: found shape torch.Size([2048, 3263]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.17.mlp.w1.weight: found shape torch.Size([3263, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.17.mlp.w2.weight: found shape torch.Size([3263, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.18.mlp.c_proj.weight: found shape torch.Size([2048, 3861]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.18.mlp.w2.weight: found shape torch.Size([3861, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.18.attn.c_attn.weight: found shape torch.Size([1536, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.18.attn.c_proj.weight: found shape torch.Size([2048, 512]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.18.mlp.w1.weight: found shape torch.Size([3861, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.19.attn.c_attn.weight: found shape torch.Size([2688, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.19.attn.c_proj.weight: found shape torch.Size([2048, 896]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.20.attn.c_attn.weight: found shape torch.Size([2688, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.20.attn.c_proj.weight: found shape torch.Size([2048, 896]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.20.mlp.c_proj.weight: found shape torch.Size([2048, 3291]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.20.mlp.w1.weight: found shape torch.Size([3291, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.20.mlp.w2.weight: found shape torch.Size([3291, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.21.attn.c_attn.weight: found shape torch.Size([1536, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.21.attn.c_proj.weight: found shape torch.Size([2048, 512]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.22.attn.c_attn.weight: found shape torch.Size([3072, 2048]) in the checkpoint and torch.Size([6144, 2048]) in the model instantiated
- transformer.h.22.attn.c_proj.weight: found shape torch.Size([2048, 1024]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.9.mlp.c_proj.weight: found shape torch.Size([2048, 2630]) in the checkpoint and torch.Size([2048, 5504]) in the model instantiated
- transformer.h.9.mlp.w1.weight: found shape torch.Size([2630, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.9.mlp.w2.weight: found shape torch.Size([2630, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/ubuntu/test_scripts/benchmark_r.py", line 152, in <module>
    main()
  File "/home/ubuntu/test_scripts/benchmark_r.py", line 63, in main
    model = AutoModelForCausalLM.from_pretrained(args.model_path, device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True, ignore_mismatched_sizes=True)
  File "/home/ubuntu/miniconda3/envs/qwen/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 556, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3558, in from_pretrained
    dispatch_model(model, **device_map_kwargs)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/accelerate/big_modeling.py", line 474, in dispatch_model
    model.to(device)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2556, in to
    return super().to(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
    param_applied = fn(param)
  File "/home/ubuntu/miniconda3/envs/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
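
To make the mismatch list above easier to read, the warning lines can be condensed into per-layer sizes. The following is only a minimal sketch: `LINE_RE`, `HEAD_DIM`, `parse_shape` and `summarize` are hypothetical names introduced here, and `HEAD_DIM = 128` is an assumption (hidden size 2048 divided by 16 heads) that should be checked against the model's `config.json`. The sample text is copied from the log above.

```python
import re
from collections import defaultdict

# Matches one line of the "shapes did not match" warning shown in the log above.
LINE_RE = re.compile(
    r"- (?P<name>\S+): found shape torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) in the checkpoint "
    r"and torch\.Size\(\[[\d, ]+\]\) in the model instantiated"
)

HEAD_DIM = 128  # assumption: 2048 hidden size / 16 heads; verify in config.json


def parse_shape(text):
    """Turn '2048, 1280' into (2048, 1280)."""
    return tuple(int(x) for x in text.split(","))


def summarize(log_text):
    """Return {layer_index: {"mlp": kept_neurons, "heads": kept_heads}}."""
    layers = defaultdict(dict)
    for m in LINE_RE.finditer(log_text):
        name = m.group("name")
        ckpt = parse_shape(m.group("ckpt"))
        layer = int(name.split(".")[2])  # transformer.h.<layer>.<...>
        if name.endswith("mlp.w1.weight"):
            # w1 has shape [intermediate, hidden]; dim 0 is the kept MLP width.
            layers[layer]["mlp"] = ckpt[0]
        elif name.endswith("attn.c_proj.weight"):
            # c_proj has shape [hidden, heads * HEAD_DIM]; dim 1 gives kept heads.
            layers[layer]["heads"] = ckpt[1] // HEAD_DIM
    return dict(layers)


sample = """\
- transformer.h.12.attn.c_proj.weight: found shape torch.Size([2048, 1280]) in the checkpoint and torch.Size([2048, 2048]) in the model instantiated
- transformer.h.12.mlp.w1.weight: found shape torch.Size([2393, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
- transformer.h.13.mlp.w1.weight: found shape torch.Size([3776, 2048]) in the checkpoint and torch.Size([5504, 2048]) in the model instantiated
"""

summary = summarize(sample)
for layer in sorted(summary):
    print(layer, summary[layer])
```

Run on the full warning, this shows that each pruned layer keeps a different MLP width and a different number of heads, while the model instantiated from the config expects 5504 and 2048 everywhere. That would explain why neither `from_pretrained` call can place the checkpoint tensors.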

How do you load the model after pruning it?
I am eager to try FLAP and look forward to your reply. Thank you.
