Skip to content

eVI-group-SCU/V-Zero

Repository files navigation

V-Zero

V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

[Paper]

Overview

V-Zero improves fine-grained visual reasoning without annotated answer labels. The student model samples on-policy reasoning trajectories from the full image, while a teacher model replays the same trajectories with paired positive and negative visual evidence views. By contrasting teacher support under the task-relevant crop and an irrelevant crop, V-Zero estimates how well each trajectory is grounded in visual evidence and uses this signal to gate dense token-level distillation. The resulting training objective keeps standard full-image inference unchanged while providing answer-label-free supervision for localized visual reasoning.

V-Zero Method Overview

Training

The V-Zero training implementation is included under verl/, with the public launcher in scripts/run_vzero_qwen35_vl_fsdp.sh. It configures the on-policy distillation recipe with teacher replay, positive/negative evidence crops, and evidence-gated token distillation.

Install the training package from the repository root:

uv pip install -e .

See scripts/README.md for data schema, environment variables, and launch examples.

TODO

  • Release training code
  • Release data preparation scripts
  • Release evaluation scripts
  • Release model checkpoints
  • Add detailed reproduction instructions

License

This project is released under the Apache License 2.0.

About

Answer-label-free on-policy distillation with contrastive evidence gating for fine-grained visual reasoning.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors