How to allow img_size greater than 256 for DINOv2Encoder #13

@faymek

Thanks for your inspiring work.

The DINOv2Encoder and DINOv2Decoder use pretrained ViT models, which limits the input resolution to 256x256. How can I run inference at a larger resolution?

  • If I split the input image into 256x256 patches and reconstruct each one separately, the reconstruction has block artifacts. My current setup:
    encoder = DINOv2Encoder(model_name='vit_small_patch14_dinov2.lvd142m',
                            model_kwargs={'img_size': 256, 'patch_size': 16, 'drop_path_rate': 0.0},
                            tuning_method='lat_lora', tuning_kwargs={'r': 8},
                            num_latent_tokens=32)
    decoder = DINOv2Decoder(model_name='vit_small_patch14_dinov2.lvd142m',
                            model_kwargs={'img_size': 256, 'patch_size': 16, 'drop_path_rate': 0.0},
                            tuning_method='full', tuning_kwargs={'r': 8}, num_latent_tokens=32, use_rope=True)
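
For reference, a common way to reduce seams when tiling (not something confirmed by the authors of this repo) is to let the tiles overlap and blend them with a smooth weight window instead of pasting them edge to edge. The sketch below is a minimal NumPy version of that idea. `reconstruct_fn` is a hypothetical placeholder for "run the encoder and decoder on one 256x256 tile", and `tile_starts`, `blend_window`, and `reconstruct_tiled` are illustrative names, not part of the repo.

```python
import numpy as np


def tile_starts(length, tile, stride):
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def blend_window(tile, overlap):
    """2D weight map that ramps up linearly over `overlap` pixels at each edge."""
    ramp = np.ones(tile, dtype=np.float32)
    if overlap > 0:
        # Strictly positive ramp so every covered pixel gets a nonzero weight.
        edge = np.linspace(0.0, 1.0, overlap + 2, dtype=np.float32)[1:-1]
        ramp[:overlap] = edge
        ramp[-overlap:] = edge[::-1]
    return np.outer(ramp, ramp)


def reconstruct_tiled(image, reconstruct_fn, tile=256, overlap=64):
    """Reconstruct an HxWxC float image tile by tile and blend the overlaps.

    `reconstruct_fn` (hypothetical) maps a (tile, tile, C) array to an array
    of the same shape, e.g. a wrapper around decoder(encoder(x)).
    """
    h, w, _ = image.shape
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if h < tile or w < tile:
        raise ValueError("image must be at least one tile in each dimension")

    stride = tile - overlap
    window = blend_window(tile, overlap)[..., None]  # broadcast over channels
    out = np.zeros(image.shape, dtype=np.float32)
    weight = np.zeros((h, w, 1), dtype=np.float32)

    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            recon = reconstruct_fn(image[y:y + tile, x:x + tile])
            out[y:y + tile, x:x + tile] += window * recon
            weight[y:y + tile, x:x + tile] += window

    # Normalize by the accumulated weights so overlapping regions average out.
    return out / weight
```

Blending only hides the seams; each tile is still encoded independently, so this does not give the model any context beyond 256x256.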
