VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks
This repository contains the official implementation of the paper: "VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks".
It provides a PyTorch Lightning + Hydra codebase for distilling knowledge from large Vision-Language Models (VLMs), such as CLIP-like models, into smaller, resource-efficient neural networks. By leveraging multi-modal (visual and textual) embeddings from a frozen VLM, VL2Lite significantly boosts performance on fine-grained classification tasks without the overhead of fine-tuning the teacher.
- Frozen VLM Teacher: No teacher fine-tuning required
- Condensation Layers: Reduce dimensionality for both image and text embeddings
- Multi-Loss: Classification loss + Visual KD + Linguistic KD
- Dynamic Weighting: Gradually shifts from KD to classification emphasis
- Configurable: Hydra-based setup for custom data, model, or experiment scripts
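The dynamic weighting idea in the list above can be illustrated with a small sketch. It is not the repository's implementation: the linear schedule, the function names `kd_weight` and `total_loss`, and the way the three loss terms are blended are all assumptions chosen only to show a weight that starts on the KD terms and ends on the classification term.

```python
def kd_weight(epoch: int, max_epochs: int) -> float:
    """Hypothetical linear schedule: 1.0 at the first epoch, 0.0 at the last."""
    if max_epochs <= 1:
        return 0.0
    # Clamp the epoch into [0, max_epochs - 1] before normalising.
    progress = min(max(epoch, 0), max_epochs - 1) / (max_epochs - 1)
    return 1.0 - progress


def total_loss(cls_loss: float, visual_kd: float, linguistic_kd: float,
               epoch: int, max_epochs: int) -> float:
    """Blend the three terms; this particular blend is an assumption."""
    w = kd_weight(epoch, max_epochs)
    # Early epochs emphasise the two KD terms, late epochs the classification term.
    return (1.0 - w) * cls_loss + w * (visual_kd + linguistic_kd)
```

With this schedule, `total_loss(3.0, 1.0, 0.5, epoch=0, max_epochs=20)` uses only the KD terms, while the same call at `epoch=19` uses only the classification loss. The actual schedule used by VL2Lite should be taken from the paper and the model configs.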
- Clone the project:
    git clone https://github.com/jsjangAI/VL2Lite
    cd VL2Lite
- (Optional) Create a conda environment:
    conda create -n vl2lite_env python=3.9
    conda activate vl2lite_env
- Install PyTorch per the official instructions.
- Install the requirements:
    pip install -r requirements.txt
If your dataset is in a different path, create a soft link:
    ln -s ./data/kd_datasets /data/KD_datasets
Update configs/data/ if needed.
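Note that `ln -s` takes the existing directory first and the name of the link second. The same step can be sketched in Python, which also makes it easy to store an absolute target so the link resolves from any working directory. The helper name `link_dataset` and the temporary paths in `demo` are hypothetical; substitute the directory where the datasets actually live and the path the data configs expect.

```python
import tempfile
from pathlib import Path


def link_dataset(actual_dir: Path, expected_path: Path) -> Path:
    """Create expected_path as a symlink to actual_dir (like `ln -s actual expected`)."""
    actual_dir = actual_dir.resolve()  # absolute target, independent of the cwd
    if not actual_dir.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {actual_dir}")
    expected_path.parent.mkdir(parents=True, exist_ok=True)
    if expected_path.is_symlink():
        expected_path.unlink()  # replace a stale link; real directories are not touched
    expected_path.symlink_to(actual_dir, target_is_directory=True)
    return expected_path


def demo() -> bool:
    """Link a throwaway dataset directory and read a file through the link."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        actual = root / "storage" / "kd_datasets"  # hypothetical real location
        actual.mkdir(parents=True)
        (actual / "sample.txt").write_text("ok")
        link = link_dataset(actual, root / "project" / "data" / "kd_datasets")
        return (link / "sample.txt").read_text() == "ok"
```

If a real directory already exists at the expected path, `symlink_to` raises `FileExistsError` rather than overwriting it.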
We use PyTorch Lightning for training loops and Hydra for configuration.
- Train on CPU:
    python src/train.py trainer=cpu
- Train on GPU:
    python src/train.py trainer=gpu
Pick a config from configs/experiment/:
    python src/train.py experiment=experiment_name
You can also override any parameter from the command line:
    python src/train.py trainer.max_epochs=20 data.batch_size=64
(See src/train.sh for an example script.)
- trainer configs in configs/trainer/
- data configs in configs/data/
- model configs in configs/model/
- experiment configs in configs/experiment/
Hydra allows combining or overriding these configs easily.
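To make the override syntax concrete, here is a simplified pure-Python illustration of how dotted `key=value` overrides such as `trainer.max_epochs=20` map onto a nested config. This is not Hydra's code or API; `apply_overrides` and `_parse` are hypothetical helpers, and the sketch covers only value overrides, not config-group selections such as `trainer=gpu`.

```python
from copy import deepcopy


def _parse(raw: str):
    """Very small value parser: try int, then float, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of `config` with `a.b=value` style overrides applied."""
    result = deepcopy(config)  # leave the caller's config untouched
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dict, creating levels that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = _parse(raw_value)
    return result
```

For example, applying `["trainer.max_epochs=20", "data.batch_size=64"]` to `{"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}` yields a new dict with both values replaced. Hydra itself additionally handles config composition, type checking, and interpolation.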
Built upon the Lightning-Hydra-Template.
We thank open-source projects (PyTorch, Lightning, Hydra) that enable this work.
If you use this code or find VL2Lite helpful, please cite:
@misc{jang2025vl2lite,
title={VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks},
author={Jang, Jinseong and Ma, Chunfei and Lee, Byeongwon},
journal={CVPR},
year={2025}
}
This project is licensed under the MIT License. Please see the LICENSE file for details.