[WACV 2026] CAAC: Confidence-Aware Attention Calibration to Reduce Hallucinations in Large Vision-Language Models
This repository contains the implementation of Confidence-Aware Attention Calibration (CAAC), a training-free hallucination mitigation method for large vision-language models (LVLMs). It provides the code, configuration, and scripts needed to run the experiments and analyze the results in a structured and reproducible manner.
The CAAC framework addresses the hallucination challenge by targeting two key biases: spatial perception bias, which distributes attention disproportionately across image tokens, and modality bias, which shifts focus from visual to textual inputs over time. CAAC employs a two-step approach: Visual-Token Calibration (VTC) balances attention across visual tokens, and Adaptive Attention Re-Scaling (AAR) reinforces visual grounding guided by the model's confidence. This confidence-driven adjustment helps maintain consistent visual alignment during generation.
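The two steps can be illustrated with a toy sketch on a single attention row. This is not the formulation used in the paper or in this repository: the function names, the blending rule for VTC, the boost factor for AAR, and the parameters `alpha` and `lam` are all assumptions made for illustration, including the choice to boost visual tokens more when confidence is low.

```python
def calibrate_visual_tokens(attn, visual_idx, alpha=0.5):
    """Toy VTC-style step: pull visual-token weights toward their mean.

    The total attention mass on visual tokens is unchanged; only its
    distribution across those tokens becomes flatter. (Assumed rule.)
    """
    mean_visual = sum(attn[i] for i in visual_idx) / len(visual_idx)
    out = list(attn)
    for i in visual_idx:
        out[i] = (1.0 - alpha) * attn[i] + alpha * mean_visual
    return out


def rescale_by_confidence(attn, visual_idx, confidence, lam=1.0):
    """Toy AAR-style step: boost visual tokens more when confidence is low.

    Assumption: the boost factor is 1 + lam * (1 - confidence). The row is
    renormalized afterwards so the weights still sum to 1.
    """
    factor = 1.0 + lam * (1.0 - confidence)
    visual = set(visual_idx)
    boosted = [w * factor if i in visual else w for i, w in enumerate(attn)]
    total = sum(boosted)
    return [w / total for w in boosted]


# One query's attention over 3 visual tokens (indices 0-2) and 2 text tokens.
attn = [0.30, 0.05, 0.05, 0.40, 0.20]
visual_idx = [0, 1, 2]

balanced = calibrate_visual_tokens(attn, visual_idx, alpha=0.5)
calibrated = rescale_by_confidence(balanced, visual_idx, confidence=0.2)

visual_share_before = sum(attn[i] for i in visual_idx)
visual_share_after = sum(calibrated[i] for i in visual_idx)
print(f"visual share: {visual_share_before:.3f} -> {visual_share_after:.3f}")
```

In this sketch the first step only redistributes attention among the visual tokens, while the second step moves attention mass from text tokens to visual tokens. Refer to the source code for the actual calibration rules.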
Our implementation uses the Hugging Face versions of LLaVA-1.5 and InstructBLIP and is built on Transformers version 4.47.
To run the experiments, you need to download the following datasets:
- AMBER Dataset: Download the AMBER dataset from the original AMBER repository. Ensure it is extracted and accessible on your system.
- MS COCO 2014 Validation Set: Required for the POPE and CHAIR benchmarks. Download the validation set from the official MS COCO website and prepare it for use in the experiments.
To set up the required environment, follow these steps:
- Install Conda: Ensure you have Miniconda or Anaconda installed on your system. If not, download and install it from the official website.
- Create Conda Environment: Create a new Conda environment named CAAC with Python 3.10 by running:
conda create -n CAAC python=3.10
- Activate Environment: Activate the environment with:
conda activate CAAC
- Install Dependencies: Install the required dependencies listed in the requirements.txt file by running:
pip install -r requirements.txt
Before running the experiments, you must configure the config.json file located in the configs folder. Update the following parameters with the appropriate paths:
- cache_dir: The directory where model checkpoints will be stored.
- amber_path: The path to the downloaded AMBER dataset.
- chair_path: The path to the CHAIR dataset.
- POPE_question_dir: The path to the POPE dataset questions.
- POPE_image_folder: The path to the POPE dataset images.
- log_dir: The directory where experiment results will be saved.
Ensure all paths are absolute or correctly relative to the execution directory to avoid runtime errors.
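A quick check of the configuration before launching a run can catch path mistakes early. The sketch below is not part of this repository. It assumes the six parameters above are top-level string entries in config.json, and it treats cache_dir and log_dir as output locations that may not exist yet; `check_config` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Keys taken from the parameter list above.
REQUIRED_KEYS = [
    "cache_dir", "amber_path", "chair_path",
    "POPE_question_dir", "POPE_image_folder", "log_dir",
]
# Assumption: these are written by the experiments and may not exist yet.
OUTPUT_KEYS = {"cache_dir", "log_dir"}


def check_config(config_path):
    """Return a list of human-readable problems found in the config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: missing or empty")
        elif key not in OUTPUT_KEYS and not Path(value).exists():
            # Relative paths are resolved against the current directory.
            problems.append(f"{key}: path does not exist: {value}")
    return problems
```

Calling `check_config("./configs/config.json")` from the root of the repository returns an empty list when every key is present and every input path exists.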
To execute the experiments, use the provided shell scripts with the configured config.json file. Run the following commands from the root of the repository:
- For CHAIR Experiments:
bash ./experiments/scripts/run_chair.sh ./configs/config.json
- For AMBER Experiments:
bash ./experiments/scripts/run_amber.sh ./configs/config.json
- For POPE Experiments:
bash ./experiments/scripts/run_pope.sh ./configs/config.json
Make sure the config.json file is properly set up before running these commands. The scripts will process the respective benchmarks and save the results to the directory specified in log_dir.
The logs directory contains the CAAC experimental results on the AMBER, CHAIR, and POPE benchmarks. You can use these log files together with the evaluation scripts (./evals) to reproduce our CAAC results on the benchmarks.
The evals directory contains scripts to evaluate model outputs on the AMBER, CHAIR, and POPE benchmarks. Below are the details and usage instructions for each script:
The pope.py script evaluates the model's outputs on the POPE benchmark.
Usage:
python pope.py --gt_files /path/to/POPE/coco_pope_popular.json --gen_files /path/to/POPE_output_popular.json
- --gt_files: Path to the ground truth JSON file for POPE.
- --gen_files: Path to the generated outputs JSON file from your model.
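For orientation, POPE poses yes/no questions about object presence, and results are usually reported as accuracy, precision, recall, F1, and the ratio of "yes" answers. The sketch below computes these from two parallel lists of answers. It is not the repository's pope.py and assumes nothing about the JSON file layout; `pope_metrics` is a hypothetical name, and treating any answer other than "yes" as "no" is a simplifying assumption.

```python
def pope_metrics(gt_labels, predictions):
    """Compute yes/no metrics, treating "yes" as the positive class."""
    if len(gt_labels) != len(predictions):
        raise ValueError("gt_labels and predictions must have equal length")

    # Assumption: anything that is not exactly "yes" counts as "no".
    gt_yes = [g.strip().lower() == "yes" for g in gt_labels]
    pred_yes = [p.strip().lower() == "yes" for p in predictions]

    tp = sum(1 for g, p in zip(gt_yes, pred_yes) if g and p)
    fp = sum(1 for g, p in zip(gt_yes, pred_yes) if not g and p)
    fn = sum(1 for g, p in zip(gt_yes, pred_yes) if g and not p)
    tn = sum(1 for g, p in zip(gt_yes, pred_yes) if not g and not p)

    total = len(gt_yes)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / total if total else 0.0,
    }


example = pope_metrics(["yes", "no", "yes", "no"], ["yes", "yes", "no", "no"])
print(example)
```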
The chair.py script evaluates the model's outputs on the CHAIR benchmark.
Usage:
python chair.py --coco_path /path/to/CHAIR/annotations --cap_file /path/to/CHAIR_output.jsonl
- --coco_path: Path to the COCO annotations directory.
- --cap_file: Path to the generated captions file in JSONL format.
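As background, CHAIR reports two rates: the instance-level score (hallucinated object mentions divided by all object mentions) and the sentence-level score (captions with at least one hallucinated object divided by all captions). The sketch below computes both from objects that have already been extracted. It is not the repository's chair.py: it skips caption parsing and the mapping of words to COCO categories, leaves deduplication of repeated mentions to the caller, and `chair_scores` is a hypothetical name.

```python
def chair_scores(mentioned, ground_truth):
    """Compute instance-level and sentence-level CHAIR rates.

    mentioned[i]    : objects named in caption i (already extracted)
    ground_truth[i] : set of objects actually present in image i
    """
    if len(mentioned) != len(ground_truth):
        raise ValueError("mentioned and ground_truth must have equal length")

    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for objects, truth in zip(mentioned, ground_truth):
        wrong = [obj for obj in objects if obj not in truth]
        total_mentions += len(objects)
        hallucinated_mentions += len(wrong)
        if wrong:
            hallucinated_captions += 1

    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = hallucinated_captions / len(mentioned) if mentioned else 0.0
    return {"CHAIR_i": chair_i, "CHAIR_s": chair_s}


scores = chair_scores(
    mentioned=[["dog", "frisbee"], ["cat", "sofa", "remote"]],
    ground_truth=[{"dog", "frisbee", "person"}, {"cat", "sofa"}],
)
print(scores)
```

In the example, one of five mentions ("remote") is not in the image, and one of two captions contains a hallucinated object.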
The inference.py script evaluates the model's outputs on the AMBER benchmark. It is a slightly modified version of the original AMBER evaluation script.
Usage:
python inference.py --inference_data /path/to/your/inference/file --evaluation_type g --gen_response_tag response_512
- --inference_data: Path to the inference/output file.
- --evaluation_type: Type of evaluation (e.g., g for generative tasks).
- --gen_response_tag: Tag for the generated response (e.g., response_512 for responses with up to 512 tokens).
Note: Update the paths in the commands to match your local file system. For additional options or details, refer to the script documentation or source code.
- Verify that all dataset paths are correctly specified in config.json to prevent issues during execution.
- The repository structure includes folders such as configs/ for configuration files, scripts/ for shell scripts, logs/ for experimental results, and evals/ for evaluation scripts. If your structure differs, adjust the paths in the commands accordingly.
- For further assistance or to report issues, please refer to the repository’s documentation or contact the maintainers.
Happy benchmarking!
