Proto-Adapter + CoOp Hybrid for Few-Shot Image Classification

A hybrid few-shot classifier built on a frozen CLIP ViT-B/16 backbone. It combines CoOp (Context Optimization — learnable prompt vectors) with a training-free proto-adapter (mean-feature class prototypes), and routes between them at inference time using a confidence-based switch. The goal is to specialize on seen classes without giving up CLIP's zero-shot generalization to unseen ones.

Originally built as a deep-learning course project with Christian Li Sivertsen, instructions how to run below for reproducibilty.

Results

Evaluated on the Flowers102 base/novel split (51 base classes, 51 novel classes, 10-shot). Hybrid numbers are reported as mean ± std over three seeds.

Model	Base Acc	Novel Acc	Harmonic Mean
Zero-shot CLIP (baseline)	71.33%	78.24%	74.63%
CoOp + Proto-Adapter (ours)	91.47% ± 1.61	75.83% ± 0.77	82.92% ± 1.10

A +20.14 point gain on base classes for a −2.41 point cost on novel classes — an 8.29 point improvement on the harmonic mean used by CoOp/CoCoOp to compare base/novel methods.

How it works

At inference, every test image is scored by both branches:

CoOp branch. Replaces the hand-crafted prompt "a photo of a {class}" with a learnable context vector trained on the base classes. The CLIP text encoder produces class embeddings from this learned prompt, which are then matched against image features.
Proto-adapter branch. For each base class, the L2-normalized mean image feature over the few-shot training images becomes the class prototype. Classification is nearest-prototype.

The two branches are mixed by a weight alpha. A confidence-based switch then decides whether to trust the mix or fall back to the CoOp prompt alone: if the prototype softmax confidence is below conf_threshold, the model uses the CoOp branch only. This keeps the proto-adapter — which has no prototypes for novel classes — out of decisions where it is unlikely to help.

Hyperparameter selection

Hyperparameters are picked by a two-stage search on the validation split, not the test split:

Stage 1 is a random search over training-time hyperparameters (n_ctx, lr, epochs). Each draw runs CoOp training once.
Stage 2 is a dense grid sweep over inference-time hyperparameters (alpha, conf_threshold) on top of each Stage-1 artifact. No retraining.

The winner is the configuration with the highest harmonic mean of base and novel val accuracy, then re-trained across multiple seeds for the final test report.

Setup

git clone https://github.com/HunterBaker01/Proto-CoOp-Hybrid.git
cd Proto-CoOp-Hybrid
pip install -r requirements.txt

CUDA is recommended; the pipeline runs on CPU but CoOp training will be slow.

Running

Open proto_coop_hybrid.ipynb and execute top-to-bottom. The notebook will:

Download Flowers102 to ./data/ via torchvision.
Load CLIP ViT-B/16 (downloaded automatically on first run).
Run the two-stage hyperparameter search and the final multi-seed evaluation.
Persist per-run summaries to plots/.

Repository layout

proto_coop_hybrid.ipynb   # full pipeline + report
requirements.txt          # pinned dependency ranges

References

K. Zhou, J. Yang, C. C. Loy, Z. Liu. Learning to Prompt for Vision-Language Models. IJCV 2022. arXiv:2109.01134
R. Zhang et al. Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling. ECCV 2022. arXiv:2111.03930
A. Radford et al. Learning Transferable Visual Models From Natural Language Supervision (CLIP). ICML 2021. arXiv:2103.00020

Acknowledgements

Joint work with Christian Li Sivertsen, originally completed for a deep-learning course.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
proto_coop_hybrid.ipynb		proto_coop_hybrid.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Proto-Adapter + CoOp Hybrid for Few-Shot Image Classification

Results

How it works

Hyperparameter selection

Setup

Running

Repository layout

References

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Proto-Adapter + CoOp Hybrid for Few-Shot Image Classification

Results

How it works

Hyperparameter selection

Setup

Running

Repository layout

References

Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages