
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks

PyTorch Lightning Config: Hydra Template
[![Conference](http://img.shields.io/badge/CVPR%202025-Paper-4b44ce.svg)](#)

Description

This repository contains the official implementation of the paper: "VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks".

The codebase is built on PyTorch Lightning and Hydra and distills knowledge from large Vision-Language Models (VLMs), such as CLIP-like models, into smaller, resource-efficient networks. VL2Lite uses the multi-modal (visual and textual) embeddings of a frozen VLM to boost student performance on fine-grained classification tasks, with no teacher fine-tuning overhead.


Features

  • Frozen VLM Teacher: No teacher fine-tuning required
  • Condensation Layers: Reduce dimensionality for both image and text embeddings
  • Multi-Loss: Classification loss + Visual KD + Linguistic KD
  • Dynamic Weighting: Gradually shifts from KD to classification emphasis
  • Configurable: Hydra-based setup for custom data, model, or experiment scripts
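The multi-loss and dynamic weighting items above can be illustrated with a minimal sketch. It assumes a linear schedule and a single weight shared by the visual and linguistic KD terms; the schedule and weights actually used by VL2Lite are not stated in this README, and the names `kd_weight` and `combine_losses` are hypothetical.

```python
def kd_weight(epoch: int, max_epochs: int, start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate the KD weight from `start` to `end` (assumed schedule)."""
    if max_epochs <= 1:
        return end
    progress = min(max(epoch / (max_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress


def combine_losses(cls_loss: float, visual_kd: float, linguistic_kd: float, w_kd: float) -> float:
    """Weight both KD terms by w_kd and the classification term by (1 - w_kd)."""
    return (1.0 - w_kd) * cls_loss + w_kd * (visual_kd + linguistic_kd)


if __name__ == "__main__":
    # Early epochs emphasise distillation, late epochs emphasise classification.
    for epoch in (0, 5, 9):
        w = kd_weight(epoch, max_epochs=10)
        print(epoch, round(w, 3), round(combine_losses(1.0, 0.5, 0.5, w), 3))
```

In a real training loop the three loss values would be tensors rather than floats; the weighting arithmetic stays the same.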

Installation

  1. Clone project:

    git clone https://github.com/jsjangAI/VL2Lite
    cd VL2Lite
  2. (Optional) Create conda environment:

    conda create -n vl2lite_env python=3.9
    conda activate vl2lite_env
  3. Install PyTorch per official instructions.

  4. Install requirements:

    pip install -r requirements.txt
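After installing, a quick check that the core packages are importable can save a failed first run. The sketch below is only an illustration: the module names `torch`, `lightning`, and `hydra` are assumptions about what requirements.txt installs, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names are assumptions; requirements.txt defines the real dependency list.
ASSUMED_MODULES = ("torch", "lightning", "hydra")


def missing_modules(names=ASSUMED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All modules found." if not missing else f"Missing: {', '.join(missing)}")
```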

Data Setup

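If your dataset is stored elsewhere, create a soft link to it inside the project. ln -s takes the existing dataset path first, then the name of the link to create:

ln -s /data/KD_datasets ./data/kd_datasets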

Update configs/data/ if needed.
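The same link can be created from Python, which is convenient in setup scripts. This is a sketch under assumptions: it supposes the dataset lives in an existing directory such as /data/KD_datasets and that the project reads it from data/kd_datasets; `link_dataset` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def link_dataset(source: Path, link: Path) -> Path:
    """Create `link` as a symbolic link pointing at the existing directory `source`."""
    source = source.resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale link
    # Raises FileExistsError if a real file or directory already occupies `link`.
    link.symlink_to(source, target_is_directory=True)
    return link
```

A typical call would be `link_dataset(Path("/data/KD_datasets"), Path("data/kd_datasets"))`, run from the repository root.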


How to Run

We use PyTorch Lightning for training loops and Hydra for configuration.

Basic Commands

  • Train on CPU:

    python src/train.py trainer=cpu
  • Train on GPU:

    python src/train.py trainer=gpu

Experiment Configs

Pick a config from configs/experiment/:

python src/train.py experiment=experiment_name

You can also override any parameter from the command line:

python src/train.py trainer.max_epochs=20 data.batch_size=64

(See src/train.sh for an example script.)
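When launching several runs from a script, it can help to assemble these commands programmatically. The sketch below builds the argument list for the commands shown above; `build_train_command` is a hypothetical helper, and it assumes the script is run from the repository root.

```python
import shlex
import sys


def build_train_command(overrides=None, experiment=None, script="src/train.py"):
    """Build the argument list for one training run using key=value overrides."""
    cmd = [sys.executable, script]
    if experiment is not None:
        cmd.append(f"experiment={experiment}")
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


if __name__ == "__main__":
    cmd = build_train_command({"trainer.max_epochs": 20, "data.batch_size": 64})
    print(shlex.join(cmd))
    # To launch the run: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems when override values contain spaces or brackets.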


Configuration

  • trainer configs in configs/trainer/
  • data configs in configs/data/
  • model configs in configs/model/
  • experiment configs in configs/experiment/

Hydra allows combining or overriding these configs easily.
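To make "combining or overriding" concrete, the sketch below mimics the idea with plain dictionaries: config groups are merged into one tree, and a dotted key such as trainer.max_epochs replaces a single nested value. It illustrates the concept only and is not Hydra's implementation; the default values and the helpers `merge` and `apply_override` are hypothetical.

```python
from copy import deepcopy


def merge(base: dict, extra: dict) -> dict:
    """Recursively merge `extra` into a copy of `base`; values in `extra` win."""
    out = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


def apply_override(cfg: dict, dotted_key: str, value) -> dict:
    """Set a nested value addressed by a dotted key such as 'trainer.max_epochs'."""
    nested = value
    for name in reversed(dotted_key.split(".")):
        nested = {name: nested}
    return merge(cfg, nested)


if __name__ == "__main__":
    trainer_cfg = {"trainer": {"accelerator": "gpu", "max_epochs": 10}}  # hypothetical defaults
    data_cfg = {"data": {"batch_size": 128}}  # hypothetical defaults
    cfg = merge(trainer_cfg, data_cfg)
    cfg = apply_override(cfg, "trainer.max_epochs", 20)
    print(cfg)
```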


Acknowledgments

Built upon the Lightning-Hydra-Template.
We thank open-source projects (PyTorch, Lightning, Hydra) that enable this work.


Citation

If you use this code or find VL2Lite helpful, please cite:

@inproceedings{jang2025vl2lite,
  title={VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks},
  author={Jang, Jinseong and Ma, Chunfei and Lee, Byeongwon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

License

This project is licensed under the MIT License. Please see the LICENSE file for details.
