Read this in Chinese: README_zh.md
Downstream tasks based on Meta AI's DINOv3 model, focusing on classification and segmentation. This project integrates the full DINOv3 source code, implements multiple architectural variants, and supports both natural image analysis and medical image processing.
- Complete DINOv3 integration: includes the official DINOv3 source code and supports multiple pretrained models (vits16, vitb16, vitl16, vit7b16)
- Multiple architectures: implementations of UNet, DPT, FAPM segmentation architectures and a linear classifier
- Task diversity: supports classification (ImageNette), natural image segmentation (ADE20K), and medical image segmentation
- Unified interface: a unified training and inference framework, with flexible switching driven by config files
├── dinov3/ # Full DINOv3 source code
│   ├── models/ # DINOv3 model implementations
│   ├── data/ # Official data loaders
│   ├── eval/ # Official evaluation scripts
│   └── configs/ # Official DINOv3 configs
├── models/ # Downstream task models
│   ├── backbones.py # Unified DINOv3 backbone loader
│   ├── dinov3_unet.py # DINOv3-UNet segmentation model
│   ├── dinov3_seg_dpt.py # DINOv3-DPT segmentation model
│   ├── dinov3_unet_fapm.py # DINOv3-FAPM advanced segmentation model
│   └── dinov3_linear_cls.py # DINOv3 linear classifier
├── data/ # Dataset loaders
│   ├── Dataset_ADE20k.py # ADE20K segmentation dataset
│   ├── Dataset_Imagenette2.py # ImageNette classification dataset
│   └── dinov3_transforms.py # Official DINOv3 transforms
├── configs/ # Task configs
│   ├── classification_imagenette.yaml
│   └── segmentation_ade20k.yaml
├── train_classifier.py # Classification training script
├── train_segmentor.py # Segmentation training script
├── inference_classifier.py # Classification inference script
└── inference_segmentor.py # Segmentation inference script
pip install -r requirements.txt
Core dependencies:
- torch, torchvision
- timm, PyYAML, tqdm
- einops, scikit-learn
Download official DINOv3 pretrained weights into the checkpoints/ directory:
mkdir checkpoints
# Download the required pretrained weight files into this directory
# dinov3_vits16_pretrain_lvd1689m-08c60483.pth
# dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth
# dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth
# dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth
- ImageNette2: 10-class image classification
- Feature extraction: cls token, patch average, or their combination
- ADE20K: 150-class scene segmentation
- MRI Head 2D: binary medical image segmentation
- Architectures: UNet, DPT, FAPM
ImageNette2:
python data/download_imagenette2.py
ADE20K:
python data/download_ade20k.py
Image classification:
python train_classifier.py --config configs/classification_imagenette.yaml
Image segmentation:
python train_segmentor.py --config configs/segmentation_ade20k.yaml
Classification inference:
python inference_classifier.py --checkpoint output/xxx/checkpoint.pth --image path/to/image.jpg
Segmentation inference:
python inference_segmentor.py --checkpoint output/xxx/checkpoint.pth --image path/to/image.jpg
YAML-based configuration system with flexible model and training hyperparameters.
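The training scripts pick the architecture from the config file. A minimal sketch of that dispatch follows, assuming the YAML has already been parsed into a nested dict (for example with `yaml.safe_load`). The keys `model.name`, `model.backbone` and `model.num_classes`, the `MODEL_REGISTRY` dict and the placeholder head classes are hypothetical illustrations, not this repo's actual config schema.

```python
from dataclasses import dataclass


# Hypothetical placeholders for the real model classes in models/.
@dataclass
class UNetHead:
    backbone: str
    num_classes: int


@dataclass
class DPTHead:
    backbone: str
    num_classes: int


@dataclass
class LinearHead:
    backbone: str
    num_classes: int


# Hypothetical registry mapping the config string to a constructor.
MODEL_REGISTRY = {"unet": UNetHead, "dpt": DPTHead, "linear": LinearHead}


def build_model(cfg: dict):
    """Build the architecture named in cfg['model']['name']."""
    model_cfg = cfg["model"]
    name = model_cfg["name"].lower()
    if name not in MODEL_REGISTRY:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](
        backbone=model_cfg.get("backbone", "vits16"),  # assumed default variant
        num_classes=model_cfg["num_classes"],
    )


# In practice this dict would come from yaml.safe_load on a file in configs/.
cfg = {"model": {"name": "unet", "backbone": "vitb16", "num_classes": 150}}
model = build_model(cfg)
print(model)  # UNetHead(backbone='vitb16', num_classes=150)
```

A registry keeps the training loop unchanged when a new head is added: only the dict and the config file change.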
- Unified pretrained model loading interface (models/backbones.py)
- Supports vits16, vitb16, vitl16, vit7b16 variants
- Local checkpoint management; backbone params frozen by default
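A sketch of local checkpoint lookup, assuming the weights sit under `checkpoints/` with the filenames listed in the download step. `resolve_checkpoint` and `CHECKPOINTS` are hypothetical names for illustration, not the actual interface of models/backbones.py.

```python
from pathlib import Path

# Official weight filenames from the download step, keyed by variant.
CHECKPOINTS = {
    "vits16": "dinov3_vits16_pretrain_lvd1689m-08c60483.pth",
    "vitb16": "dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth",
    "vitl16": "dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth",
    "vit7b16": "dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth",
}


def resolve_checkpoint(variant: str, root: str = "checkpoints", must_exist: bool = True) -> Path:
    """Map a backbone variant to its local weight file (hypothetical helper)."""
    if variant not in CHECKPOINTS:
        raise ValueError(f"unsupported variant {variant!r}; expected one of {list(CHECKPOINTS)}")
    path = Path(root) / CHECKPOINTS[variant]
    if must_exist and not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it into {root}/ first")
    return path


print(resolve_checkpoint("vitb16", must_exist=False))
```

Failing early with a clear message is cheaper than letting a missing file surface deep inside model construction.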
DinoV3_UNet (models/dinov3_unet.py), a custom UNet-like fusion design that achieved the best segmentation accuracy in our tests with a small parameter count:
- Simple UNet architecture with multi-level feature fusion
- Suited for standard segmentation tasks
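Every */16 backbone emits one token per 16×16 patch, so a convolutional decoder first has to reshape patch tokens back into a 2D map. The sketch below shows that reshape plus the simplest multi-level fusion (channel concatenation of several layers). It assumes the cls/register tokens were already removed and uses 768 channels as for a ViT-B; it is a generic illustration, not the exact fusion in models/dinov3_unet.py.

```python
import numpy as np

PATCH_SIZE = 16  # the vits16/vitb16/vitl16/vit7b16 backbones use 16x16 patches


def tokens_to_grid(patch_tokens: np.ndarray, height: int, width: int) -> np.ndarray:
    """(B, N, C) patch tokens in raster order -> (B, C, H/16, W/16) feature map."""
    b, n, c = patch_tokens.shape
    gh, gw = height // PATCH_SIZE, width // PATCH_SIZE
    if n != gh * gw:
        raise ValueError(f"expected {gh * gw} patch tokens, got {n}")
    return patch_tokens.reshape(b, gh, gw, c).transpose(0, 3, 1, 2)


def fuse_levels(levels: list) -> np.ndarray:
    """Concatenate same-resolution maps from several layers along channels."""
    return np.concatenate(levels, axis=1)


# A 224x224 image gives a 14x14 grid, i.e. 196 patch tokens.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((2, 196, 768)) for _ in range(4)]  # e.g. 4 intermediate layers
grids = [tokens_to_grid(t, 224, 224) for t in layers]
fused = fuse_levels(grids)
print(fused.shape)  # (2, 3072, 14, 14)
```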
DinoV3_DPT (models/dinov3_seg_dpt.py). Paper: https://arxiv.org/abs/2509.00833v1
- Dense Prediction Transformer architecture
- Based on feature projection and fusion
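DPT-style decoders project features from several layers to a common width and fuse them from coarse to fine. The sketch below keeps only those two ideas: a 1×1 convolution written as `einsum`, and nearest-neighbour upsample-and-add fusion. It omits the residual convolution units and learned resampling of a real DPT head, and the sizes and random weights are illustrative; it is not the code in models/dinov3_seg_dpt.py.

```python
import numpy as np


def project(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: (B, C_in, H, W) with (C_out, C_in) -> (B, C_out, H, W)."""
    return np.einsum("oc,bchw->bohw", weight, x)


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling along H and W."""
    return x.repeat(2, axis=2).repeat(2, axis=3)


def fuse_coarse_to_fine(levels: list) -> np.ndarray:
    """levels ordered fine -> coarse, each half the resolution of the previous."""
    out = levels[-1]
    for feat in reversed(levels[:-1]):
        out = upsample2x(out) + feat  # a real DPT fusion block adds residual convs here
    return out


rng = np.random.default_rng(0)
c_in, c_out = 768, 256
sizes = [56, 28, 14, 7]  # 4x, 2x, 1x and 0.5x of a 14x14 token grid
levels = [rng.standard_normal((1, c_in, s, s)) for s in sizes]
weights = [rng.standard_normal((c_out, c_in)) * 0.01 for _ in sizes]
projected = [project(x, w) for x, w in zip(levels, weights)]
fused = fuse_coarse_to_fine(projected)
print(fused.shape)  # (1, 256, 56, 56)
```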
DinoV3_FAPM (models/dinov3_unet_fapm.py). Paper: https://arxiv.org/abs/2508.20909v1
- Feature Alignment Pyramid Module
- Supports multi-scale segmentation for complex scenes
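The FAPM specifics are in the linked paper. As a generic illustration of multi-scale features only (not FAPM itself), the sketch derives stride-8, 16 and 32 maps from a single stride-16 ViT map, then resizes them to the finest resolution and concatenates them. Nearest-neighbour upsampling and average pooling stand in for the learned layers a real pyramid module would use.

```python
import numpy as np


def avg_pool2x(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling; H and W must be even."""
    b, c, h, w = x.shape
    return x.reshape(b, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))


def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)


def build_pyramid(feat: np.ndarray) -> dict:
    """From one stride-16 map, derive stride-8, -16 and -32 maps."""
    return {8: upsample(feat, 2), 16: feat, 32: avg_pool2x(feat)}


def align_and_concat(pyramid: dict, stride: int = 8) -> np.ndarray:
    """Resize every level to the finest stride and stack along channels."""
    aligned = []
    for s, x in sorted(pyramid.items()):
        factor = s // stride
        aligned.append(upsample(x, factor) if factor > 1 else x)
    return np.concatenate(aligned, axis=1)


rng = np.random.default_rng(0)
feat = rng.standard_normal((1, 384, 14, 14))  # stride-16 map from a 224x224 input
pyramid = build_pyramid(feat)
fused = align_and_concat(pyramid)
print(fused.shape)  # (1, 1152, 28, 28)
```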
DinoV3LinearClassifier (models/dinov3_linear_cls.py):
- Linear classification head supporting multiple feature extraction modes
- Feature sources: cls token, patch average, or both
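The three feature sources can be sketched as follows. This assumes the backbone output is laid out as one cls token, then four register tokens, then the patch tokens (the `num_prefix=5` default); check the actual token layout of the backbone before relying on it. The mode strings are hypothetical, not the config values used by models/dinov3_linear_cls.py.

```python
import numpy as np


def extract_features(tokens: np.ndarray, mode: str = "cls", num_prefix: int = 5) -> np.ndarray:
    """tokens: (B, num_prefix + N, C) = [cls, register tokens..., N patch tokens]."""
    cls = tokens[:, 0]                                   # (B, C)
    patch_mean = tokens[:, num_prefix:].mean(axis=1)     # (B, C), mean over patch tokens
    if mode == "cls":
        return cls
    if mode == "patch_avg":
        return patch_mean
    if mode == "cls_patch":
        return np.concatenate([cls, patch_mean], axis=1)  # (B, 2C)
    raise ValueError(f"unknown mode {mode!r}")


def linear_head(features: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Logits = features @ W^T + b, matching nn.Linear's convention."""
    return features @ weight.T + bias


rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 5 + 196, 384))  # vits16-sized tokens, 224x224 input
feats = extract_features(tokens, "cls_patch")
W = rng.standard_normal((10, feats.shape[1])) * 0.01  # 10 ImageNette classes
logits = linear_head(feats, W, np.zeros(10))
print(feats.shape, logits.shape)  # (2, 768) (2, 10)
```

Concatenating cls and averaged patch tokens doubles the head's input width, so the linear layer's `in_features` must follow the chosen mode.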
- Config loading: load all training params from YAML
- Data preparation: automatically selects dataset and transforms
- Model building: dynamically choose architecture per config
- Training loop: unified engine with checkpoint save/restore
- Metrics:
- Classification: Top-1/Top-5 accuracy
- Segmentation: mIoU (mean Intersection over Union)
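A minimal NumPy sketch of the two metrics, not the repo's own implementation. `ignore_index=255` is an assumed convention for unlabeled pixels. The per-call mIoU averages over classes that appear in either prediction or ground truth; for a dataset-level score, accumulate the confusion matrix over all images before dividing.

```python
import numpy as np


def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose label is among the k highest-scoring classes."""
    topk = np.argsort(-logits, axis=1)[:, :k]
    return float((topk == labels[:, None]).any(axis=1).mean())


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int, ignore_index: int = 255) -> float:
    """mIoU over classes present in pred or target; ignore_index pixels are skipped."""
    valid = target != ignore_index
    p, t = pred[valid], target[valid]
    # Confusion matrix: rows = ground truth, columns = prediction.
    conf = np.bincount(t * num_classes + p, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0
    return float((inter[present] / union[present]).mean())


logits = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
labels = np.array([2, 0])
print(topk_accuracy(logits, labels, k=1), topk_accuracy(logits, labels, k=2))  # 0.5 1.0

pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, target, num_classes=2))  # IoU 1/2 and 2/3, so about 0.583
```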
Quick inference for a single image with visualized outputs.
Batch multiple images to improve throughput.
Chunked batch inference utilities tailored for medical images.
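The chunking idea behind batched inference can be sketched independently of any model: split the inputs (for example the 2D slices of an MRI volume) into fixed-size batches, run a prediction callable per batch, and concatenate results in input order. `predict_batch` is a hypothetical callable standing in for a model forward pass.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_chunks(
    items: Sequence[T],
    predict_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 8,
) -> List[R]:
    """Run predict_batch on each chunk and return all results in input order."""
    results: List[R] = []
    for batch in chunked(items, batch_size):
        results.extend(predict_batch(batch))
    return results


slices = list(range(10))  # stand-in for 10 MRI slices
out = predict_in_chunks(slices, lambda batch: [x * 2 for x in batch], batch_size=4)
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The batch size bounds peak memory, which matters for large volumes, while the last chunk may simply be smaller.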
- RandomResizedCrop
- RandomHorizontalFlip
- Normalization (ImageNet stats)
- MRI: FixedGamma(0.75) for low-intensity enhancement
- Natural images: standard ImageNet preprocessing
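A sketch of the two intensity steps, assuming `FixedGamma(0.75)` raises intensities scaled to [0, 1] to the power 0.75 (with gamma below 1, dark values are lifted more than bright ones) and that grayscale MRI slices are replicated to three channels before ImageNet normalization. The function names are hypothetical; data/dinov3_transforms.py may differ in detail.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def fixed_gamma(img: np.ndarray, gamma: float = 0.75) -> np.ndarray:
    """img in [0, 1]; gamma < 1 brightens low-intensity regions."""
    return np.clip(img, 0.0, 1.0) ** gamma


def normalize_imagenet(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) in [0, 1] -> per-channel (x - mean) / std."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD


mri = np.full((4, 4), 0.1, dtype=np.float32)                 # dark grayscale slice
rgb = np.repeat(fixed_gamma(mri)[..., None], 3, axis=-1)     # gray -> 3 channels
x = normalize_imagenet(rgb)
print(x.shape)  # (4, 4, 3)
```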
- Memory: call model.eval() and run inference under torch.no_grad() so that no autograd graph is kept
- Batching: run inference in batches to keep the GPU fully utilized
- Frozen params: freeze DINOv3 backbone and train task heads only
- AMP: optional automatic mixed precision training
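The frozen-backbone, AMP and `no_grad` tips combine as in the PyTorch sketch below. The tiny `backbone` and `head` modules are hypothetical stand-ins for the DINOv3 backbone and a task head. It runs on CPU with bfloat16 autocast; on a GPU one would typically use `device_type="cuda"` with float16 plus a gradient scaler.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the DINOv3 backbone and a task head.
backbone = nn.Sequential(nn.Linear(32, 64), nn.GELU())
head = nn.Linear(64, 10)

# Freeze the backbone: its weights get no gradients and are never updated.
backbone.requires_grad_(False)
backbone.eval()

# Only the head's parameters are given to the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


def train_step(x: torch.Tensor, y: torch.Tensor, use_amp: bool = True) -> float:
    head.train()
    # Mixed precision on CPU uses bfloat16.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=use_amp):
        with torch.no_grad():  # frozen backbone: skip building its autograd graph
            feats = backbone(x)
        loss = loss_fn(head(feats), y)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def predict(x: torch.Tensor) -> torch.Tensor:
    backbone.eval()
    head.eval()
    return head(backbone(x)).argmax(dim=1)


x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))
backbone_before = [p.clone() for p in backbone.parameters()]
head_before = [p.clone() for p in head.parameters()]
loss = train_step(x, y)
print(predict(x).shape)  # torch.Size([16])
```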
- vits16: lightest, good for quick validation and constrained environments
- vitb16: balanced performance and efficiency
- vitl16: higher performance, requires more compute
- vit7b16: best performance, requires substantial compute
- UNet: simple and efficient for standard tasks
- DPT: projection + fusion for tasks needing fine features
- FAPM: multi-scale pyramid for complex scenes
- checkpoint.pth: latest checkpoint
- best_checkpoint.pth: best checkpoint
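The latest/best split can be sketched as a small tracker that overwrites `checkpoint.pth` every epoch and writes `best_checkpoint.pth` only when the monitored metric improves. `CheckpointTracker` and the injected `save_fn` are hypothetical; in a real script `save_fn` might be `lambda state, name: torch.save(state, out_dir / name)`.

```python
from typing import Callable, Optional


class CheckpointTracker:
    """Save the latest state every epoch and a separate copy when the metric improves."""

    def __init__(self, save_fn: Callable[[dict, str], None], higher_is_better: bool = True):
        self.save_fn = save_fn
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None

    def update(self, state: dict, metric: float) -> bool:
        self.save_fn(state, "checkpoint.pth")  # always overwrite the latest
        if self.best is None:
            improved = True
        elif self.higher_is_better:
            improved = metric > self.best
        else:
            improved = metric < self.best
        if improved:
            self.best = metric
            self.save_fn(state, "best_checkpoint.pth")  # keep the best so far
        return improved


saved = []
tracker = CheckpointTracker(lambda state, name: saved.append((state["epoch"], name)))
for epoch, miou in enumerate([0.30, 0.42, 0.38]):
    tracker.update({"epoch": epoch}, miou)
print(saved)
```

With higher-is-better metrics such as mIoU or Top-1 accuracy, epoch 2 above updates only the latest file.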
- Realtime training loss and metrics
- Configurable print frequency
- Auto-save training config
- Fork this repo
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit changes (git commit -m 'Add some AmazingFeature')
- Push the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License; see LICENSE for details.
- Meta AI's DINOv3
- The PyTorch community and related open-source projects
For questions or suggestions, please open an Issue.