Feature Request: Visual Prompt Tuning Support #307

## Summary

I propose adding **Visual Prompt Tuning** support to the prompt-tuning library. The technique extends prompt tuning to vision and vision-language models, so that visual encoders and multimodal models can be adapted by training a small set of prompt parameters while the backbone stays frozen.

## Motivation

Visual prompt tuning brings the benefits of parameter-efficient fine-tuning to computer vision and multimodal domains. This is particularly valuable for adapting large vision-language models (such as CLIP) to downstream tasks without fine-tuning the entire model.

## Proposed Implementation

The implementation would include:

1. **Pixel-space prompts**: Add trainable prompts as image borders, corners, or patches
2. **Patch-level prompts**: Insert prompts at the patch embedding level for Vision Transformers
3. **Deep visual prompts**: Add prompts at multiple layers of the visual encoder
4. **Vision-language coordination**: Coordinate visual and textual prompts for multimodal models
5. **Adaptive visual prompts**: Condition prompts on image content
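
To make items 2 and 3 concrete, here is a minimal sketch in plain JAX. It is not the prototype's code or the library's API: the function names (`init_prompts`, `prepend_prompts`, `replace_prompts`) are hypothetical, and it assumes a ViT token layout of `[CLS]` followed by patch embeddings. The deep variant assumes each layer overwrites the prompt slots with its own prompts, which is my reading of Jia et al.

```python
import jax
import jax.numpy as jnp


def init_prompts(key, num_prompts, dim, scale=0.02):
    """Hypothetical initializer: small Gaussian prompt tokens."""
    return scale * jax.random.normal(key, (num_prompts, dim))


def prepend_prompts(tokens, prompts):
    """Patch-level prompts: insert prompts between [CLS] and the patch tokens.

    tokens:  [batch, 1 + num_patches, dim], with [CLS] at index 0 (assumed).
    prompts: [num_prompts, dim], shared across the batch.
    """
    batch = tokens.shape[0]
    tiled = jnp.broadcast_to(prompts[None], (batch,) + prompts.shape)
    return jnp.concatenate([tokens[:, :1], tiled, tokens[:, 1:]], axis=1)


def replace_prompts(tokens, prompts):
    """Deep prompts: overwrite the prompt slots with this layer's prompts."""
    batch = tokens.shape[0]
    tiled = jnp.broadcast_to(prompts[None], (batch,) + prompts.shape)
    return tokens.at[:, 1:1 + prompts.shape[0]].set(tiled)


key = jax.random.PRNGKey(0)
tokens = jnp.zeros((2, 1 + 196, 768))   # e.g. ViT-B/16 at 224x224: 196 patches
prompts = init_prompts(key, num_prompts=10, dim=768)
out = prepend_prompts(tokens, prompts)  # sequence length grows by num_prompts
```

In a real integration only `prompts` would be passed to the optimizer; the encoder weights would be left out of the trainable set.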

## Key Features

- Multiple visual prompt insertion strategies
- Support for Vision Transformers (ViT)
- Coordination with textual prompts for VL models
- Adaptive prompts based on image features
- Integration with existing T5X/Flaxformer infrastructure
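
As one illustration of the insertion strategies listed above, the following sketch shows a pixel-space border prompt in plain JAX. The names `border_mask` and `apply_pixel_prompt` are hypothetical, the NHWC image layout is an assumption, and `loss_fn` is a stand-in for a frozen model plus task loss.

```python
import jax
import jax.numpy as jnp


def border_mask(height, width, pad):
    """1.0 on a frame of `pad` pixels around the image, 0.0 inside."""
    mask = jnp.ones((height, width, 1))
    return mask.at[pad:height - pad, pad:width - pad].set(0.0)


def apply_pixel_prompt(images, prompt, mask):
    """Add the trainable prompt to the border pixels only.

    images: [batch, height, width, channels] (NHWC layout, assumed)
    prompt: [height, width, channels], the only trainable array here
    """
    return images + mask * prompt


def loss_fn(prompt, images, mask):
    # Placeholder for "frozen model + task loss" (hypothetical): mean pixel value.
    return jnp.mean(apply_pixel_prompt(images, prompt, mask))


images = jnp.zeros((4, 32, 32, 3))
mask = border_mask(32, 32, pad=4)
prompt = jnp.zeros((32, 32, 3))

# Differentiate with respect to the prompt only; the mask keeps interior
# gradients at zero, so only the border pixels are ever updated.
grads = jax.grad(loss_fn)(prompt, images, mask)
```

Corner or patch prompts would follow the same pattern with a different mask.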

## References

- Jia et al. (2022). "Visual Prompt Tuning." [arXiv:2203.12119](https://arxiv.org/abs/2203.12119)
- Zhou et al. (2022). "Learning to Prompt for Vision-Language Models." [arXiv:2109.01134](https://arxiv.org/abs/2109.01134)

## Additional Context

I have implemented a prototype of this technique that follows the library's design patterns and coding standards. It is available in my fork: https://github.com/hwilner/prompt-tuning

Would the maintainers be interested in this enhancement? I'm happy to discuss the design and implementation details further.
