Thanks for your inspiring work.
The DINOv2Encoder and DINOv2Decoder use pretrained ViT models, which limits the input resolution to 256x256. How can I allow a larger resolution at inference time?
- If I split the input image into 256x256 patches, the reconstruction has block artifacts (a rough sketch of the tiling I use is below).
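This is a minimal sketch of the naive non-overlapping tiling I tried; it assumes the encoder maps an image tile to latent tokens and the decoder maps them back to pixels (the actual forward signatures in the repo may differ):

```python
import torch

@torch.no_grad()
def reconstruct_tiled(image, encoder, decoder, tile=256):
    """Naive non-overlapping tiling; image is (1, 3, H, W) with H, W divisible by `tile`."""
    _, _, H, W = image.shape
    out = torch.zeros_like(image)
    for top in range(0, H, tile):
        for left in range(0, W, tile):
            patch = image[:, :, top:top + tile, left:left + tile]
            latents = encoder(patch)   # assumed: encoder(tile) -> latent tokens
            recon = decoder(latents)   # assumed: decoder(latents) -> reconstructed tile
            out[:, :, top:top + tile, left:left + tile] = recon
    return out  # visible seams (block artifacts) appear at tile boundaries
```

For reference, the encoder and decoder are constructed as follows: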
```python
encoder = DINOv2Encoder(model_name='vit_small_patch14_dinov2.lvd142m',
                        model_kwargs={'img_size': 256, 'patch_size': 16, 'drop_path_rate': 0.0},
                        tuning_method='lat_lora', tuning_kwargs={'r': 8},
                        num_latent_tokens=32)
decoder = DINOv2Decoder(model_name='vit_small_patch14_dinov2.lvd142m',
                        model_kwargs={'img_size': 256, 'patch_size': 16, 'drop_path_rate': 0.0},
                        tuning_method='full', tuning_kwargs={'r': 8},
                        num_latent_tokens=32, use_rope=True)
```
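For concreteness, what I would like to be able to do is something like the following (hypothetical; I am not sure the pretrained weights, positional embeddings, or the fixed number of latent tokens support a larger `img_size`):

```python
# Hypothetical: build the encoder for 512x512 inference by only changing img_size.
# I assume the positional embeddings would need interpolation from the 256 setup,
# and I am unsure how this interacts with the fixed num_latent_tokens.
encoder_512 = DINOv2Encoder(model_name='vit_small_patch14_dinov2.lvd142m',
                            model_kwargs={'img_size': 512, 'patch_size': 16, 'drop_path_rate': 0.0},
                            tuning_method='lat_lora', tuning_kwargs={'r': 8},
                            num_latent_tokens=32)
```

Would something like this be expected to work with the released checkpoints, or is additional fine-tuning at the larger resolution required?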