mat1 and mat2 shapes cannot be multiplied (1152x2688 and 256x3584) #51

Hello. After following the file-deletion steps described earlier, I still run into problems when executing "train_video_qwen.sh".

I encountered this error:

```
mat1 and mat2 shapes cannot be multiplied (1152x2688 and 256x3584)
```

When executing "train_image_qwen.sh", the following error occurred, but the model files were still retained.

```
CollectiveFingerPrint
(SequenceNumber=57835, OpType=ALLGATHER, TensorShape=[0], TensorDtypes=Float, 
TensorDeviceTypes=TensorOptions(dtype=float (default), device=cuda,
layout=Strided (default), requires_grad=false (default), 
pinned_memory=false (default), memory_format=(nullopt))). 
Collectives differ in the following aspects: 
Tensor Tensor shapes: 544997376vs 0
```

The dataset I used is the one provided on GitHub. I modified the code so that a sample is skipped if its image file cannot be found. The video dataset I'm using is shenxq/VideoChat2. Could the dataset I downloaded be incorrect?
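
One way to check whether the download is incomplete is to compare the annotation JSON against the files on disk before training. The following is only a sketch under assumptions: it assumes the JSON is a list of records whose media path is stored under a `video` or `image` key, relative to the data folder, which may not match the actual layout of train_video_data.json. `find_missing_media` is a hypothetical helper, not part of LongVU.

```python
import json
import os


def find_missing_media(json_path, media_root, keys=("video", "image")):
    """Return relative paths referenced in the annotation file but absent on disk.

    Assumption: the JSON is a list of dicts, and each media path is stored
    under one of `keys`, relative to `media_root`.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)

    missing = []
    for record in records:
        for key in keys:
            rel_path = record.get(key)
            # Only string values are treated as paths; other records are ignored.
            if isinstance(rel_path, str) and not os.path.exists(
                os.path.join(media_root, rel_path)
            ):
                missing.append(rel_path)
    return missing
```

Calling `find_missing_media("/data/zm_doc/datasets_MLLM/VideoChat2/train_video_data.json", "/data/zm_doc/datasets_MLLM/VideoChat2/")` would list every referenced file that is not present. A long list would suggest the download is incomplete rather than that the annotations are wrong.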

However, the model files were retained. I deleted the safetensors files as described above, but the image model files cannot be used for video inference. Then, when I executed "train_video_qwen.sh", the following errors occurred:

```
  0%|          | 0/165311 [00:00<?, ?it/s]
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/gifs/tumblr_nodpeypt1E1tqwtb6o1_500.gif
WARNING: tokenization mismatch: 85 vs. 87. (ignored)
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/11558840
WARNING: tokenization mismatch: 80 vs. 81. (ignored)
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/Street_View_Indoor/video3_00350_00360.mp4
WARNING: tokenization mismatch: 75 vs. 76. (ignored)
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/didemo/44012166@N05_4174564176_52c774f2ab.mp4
WARNING: tokenization mismatch: 74 vs. 76. (ignored)
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/11592197
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/1010400251
WARNING: tokenization mismatch: 56 vs. 57. (ignored)
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/didemo/7964800@N03_6072995822_212c640508.mp4
WARNING: tokenization mismatch: 64 vs. 65. (ignored)
Video file not found, skipping: /data/zm_doc/datasets_MLLM/VideoChat2/HXshtf_6lmI
WARNING: tokenization mismatch: 71 vs. 73. (ignored)
WARNING: tokenization mismatch: 79 vs. 80. (ignored)
Traceback (most recent call last):

 File "/workspace/LongVU/longvu/cambrian_arch.py", line 1033, in prepare_inputs_labels_for_multimodal
 ) = self.prepare_inputs_labels_for_multimodal(
 image_features = self.get_model().mm_projector(image_features).to(dtype)
 File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
 File "/workspace/LongVU/longvu/cambrian_arch.py", line 1033, in prepare_inputs_labels_for_multimodal
 return self._call_impl(*args, **kwargs)
 File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
 return forward_call(*args, **kwargs)
 File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
 image_features = self.get_model().mm_projector(image_features).to(dtype)
 File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
 return self._call_impl(*args, **kwargs)
 File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
 return forward_call(*args, **kwargs)
 File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
 return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1152x2688 and 256x3584)
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1152x2688 and 256x3584)
```
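
For reading the shape error: `F.linear(input, weight, bias)` computes `input @ weight.T`, so `mat1` is the input and `mat2` is the transposed weight, with shape `(in_features, out_features)`. The sketch below is plain Python with no PyTorch dependency, and `explain_linear_mismatch` is a hypothetical helper. It parses the message and names each number.

```python
import re


def explain_linear_mismatch(message):
    """Parse a 'mat1 and mat2 shapes cannot be multiplied (AxB and CxD)' message.

    For nn.Linear, mat1 is the input (rows x features) and mat2 is weight.T,
    whose shape is (in_features, out_features).
    """
    match = re.search(r"\((\d+)x(\d+) and (\d+)x(\d+)\)", message)
    if match is None:
        raise ValueError("not a mat1/mat2 shape-mismatch message")
    rows, got, in_features, out_features = map(int, match.groups())
    return {
        "input_rows": rows,
        "input_features": got,
        "layer_in_features": in_features,
        "layer_out_features": out_features,
    }


info = explain_linear_mismatch(
    "mat1 and mat2 shapes cannot be multiplied (1152x2688 and 256x3584)"
)
print(info)
```

Read this way, the linear layer reached through `mm_projector` expects 256 input features and produces 3584, but it receives a tensor whose last dimension is 2688. The features arriving at the projector therefore do not have the layout that layer was built for.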

train_image_qwen.sh

```
PREV_STAGE_CHECKPOINT="/workspace/LongVU/checkpoints/cambrian_qwen/checkpoint-500"
PATH_TO_JSON="/data/zm_doc/datasets_MLLM/VideoChat2/train_video_data.json"
PATH_TO_FOLDER="/data/zm_doc/datasets_MLLM/VideoChat2/"
VERSION="qwen"

CUDA_LAUNCH_BLOCKING=1 TORCH_DISTRIBUTED_DEBUG=DETAIL PYTHONPATH=/workspace/LongVU   torchrun --nproc_per_node=2 --nnodes=1 \
longvu/train.py \
--output_dir "/tmp/longvu/" \
--input_model_filename $PREV_STAGE_CHECKPOINT \
--output_model_filename "./checkpoints/cambrian_vl_qwen/" \
--data_path $PATH_TO_JSON \
--image_folder $PATH_TO_FOLDER \
--model_max_length 8192 \
--fp16 False \
--bf16 True \
--log_on_each_node False \
--logging_dir /tmp/llava/test/ \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 1 \
--save_steps 30 \
--eval_steps 30 \
--logging_steps 10 \
--evaluation_strategy "no" \
--save_strategy "steps" \
#--report_to "tensorboard" \
--save_total_limit 1 \
--learning_rate 5e-6 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--tf32 False \
--version $VERSION \
--mm_vision_select_layer "-2" \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--group_by_modality_length True \
--dataloader_num_workers 0 \
--lazy_preprocess False \
--tune_mm_mlp_adapter False \
--freeze_mm_mlp_adapter False \
--freeze_backbone False \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'Qwen2DecoderLayer' \
--gradient_checkpointing True \
--mm_projector_type sva \
--image_token_len 144 \
--query_num_list "[144]" \
--resume True \
--lowres_token 8 \
--video_fps 1 \
--highres True \
--drop_threshold 0.8 \
```
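
One thing that may be worth checking in the script above: the `#--report_to "tensorboard" \` line sits in the middle of a backslash-continued command. In bash, a `#` comment runs to the end of the line and swallows the trailing backslash, so the command ends there and the flags after it (from `--save_total_limit` down to `--drop_threshold`, including `--mm_projector_type sva`) are not passed to `train.py`. Whether this is related to the shape error is not confirmed. The sketch below is a hypothetical helper, not part of LongVU, and it only detects whole-line comments. The excerpt is shortened from the script for illustration.

```python
def find_comments_in_continuations(script_text):
    """Return 1-based line numbers of '#' lines inside a backslash-continued command."""
    flagged = []
    continued = False  # True if the previous line ended with a backslash
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if continued and stripped.startswith("#"):
            flagged.append(number)
        continued = stripped.endswith("\\")
    return flagged


# Shortened excerpt of the launch command, for illustration only.
excerpt = """\
torchrun longvu/train.py \\
--save_strategy "steps" \\
#--report_to "tensorboard" \\
--save_total_limit 1 \\
--mm_projector_type sva
"""
print(find_comments_in_continuations(excerpt))  # [3]
```

If a line is flagged, deleting it or moving the comment above the whole command keeps the remaining flags attached to the command.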
