[WACV 2026] CAAC: Confidence-Aware Attention Calibration to Reduce Hallucinations in Large Vision-Language Models

Project Description

This repository contains the implementation of Confidence-Aware Attention Calibration (CAAC), a training-free hallucination mitigation method for large vision-language models (LVLMs). It provides the code, configuration, and evaluation scripts needed to run the experiments and analyze the results in a structured and reproducible manner.

The Confidence-Aware Attention Calibration (CAAC) framework addresses the hallucination challenge by targeting two key biases: spatial perception bias, which distributes attention disproportionately across image tokens, and modality bias, which shifts focus from visual to textual inputs as generation proceeds. CAAC employs a two-step approach: Visual-Token Calibration (VTC) to balance attention across visual tokens, and Adaptive Attention Re-Scaling (AAR) to reinforce visual grounding guided by the model's confidence. This confidence-driven adjustment is intended to keep the output aligned with the image throughout generation.
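To make the two steps described above more concrete, here is a toy sketch on a single attention vector. It is not the implementation in this repository: the function names, the `alpha` and `max_boost` parameters, the interpolation toward the mean, and the rule that lower confidence gives a stronger boost are all assumptions chosen for illustration only.

```python
# Toy illustration only: NOT the authors' implementation of CAAC.
# Function names, parameters (alpha, max_boost) and formulas are assumptions.

def calibrate_visual_tokens(attn, visual_idx, alpha=0.5):
    """VTC-like step: pull each visual token's attention toward the visual mean.

    The total attention mass on visual tokens is preserved; only its
    distribution across the visual tokens is flattened.
    """
    out = list(attn)
    mean = sum(attn[i] for i in visual_idx) / len(visual_idx)
    for i in visual_idx:
        out[i] = (1 - alpha) * attn[i] + alpha * mean
    return out


def rescale_by_confidence(attn, visual_idx, confidence, max_boost=2.0):
    """AAR-like step: boost visual attention more when confidence is low.

    Assumed rule: scale = 1 + (max_boost - 1) * (1 - confidence),
    followed by renormalisation so the weights sum to 1.
    """
    scale = 1.0 + (max_boost - 1.0) * (1.0 - confidence)
    visual = set(visual_idx)
    out = [w * scale if i in visual else w for i, w in enumerate(attn)]
    total = sum(out)
    return [w / total for w in out]


attn = [0.05, 0.30, 0.05, 0.60]  # tokens 0-2 are visual, token 3 is text
visual_idx = [0, 1, 2]
balanced = calibrate_visual_tokens(attn, visual_idx, alpha=1.0)
boosted = rescale_by_confidence(balanced, visual_idx, confidence=0.4)
print([round(w, 3) for w in boosted])
```

In this toy example the first step evens out the attention among the three visual tokens without changing their combined share, and the second step raises that combined share relative to the text token. For the actual method, refer to the code under src.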

CAAC Framework

Implementation Details

Our implementation uses the Hugging Face versions of LLaVA-1.5 and InstructBLIP, built on Transformers version 4.47.

Dataset Requirements

To run the experiments, you need to download the following datasets:

AMBER Dataset: Download the AMBER dataset from its original repository (AMBER). Ensure it is extracted and accessible on your system.
MS COCO 2014 Validation Set: Required for the POPE and CHAIR benchmarks. Download the validation set from the official MS COCO website and prepare it for use in the experiments.

Environment Setup

To set up the required environment, follow these steps:

Install Conda: Ensure you have Miniconda or Anaconda installed on your system. If not, download and install it from the official website.
Create Conda Environment: Create a new Conda environment named CAAC with Python 3.10 by running:
```
conda create -n CAAC python=3.10
```
Activate Environment: Activate the environment with:
```
conda activate CAAC
```
Install Dependencies: Install the required dependencies listed in the requirements.txt file by running:
```
pip install -r requirements.txt
```
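After installation, it can help to confirm that the environment matches the Transformers 4.47 series mentioned under Implementation Details. The snippet below is a minimal sketch using only the standard library; the helper names `in_release_series` and `check_transformers` are hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def in_release_series(installed: str, series: str) -> bool:
    """Return True if `installed` (e.g. '4.47.1') belongs to `series` (e.g. '4.47')."""
    wanted = series.split(".")
    return installed.split(".")[: len(wanted)] == wanted


def check_transformers(series: str = "4.47") -> str:
    """Describe whether the installed transformers package matches `series`."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed in this environment"
    if in_release_series(installed, series):
        return f"transformers {installed} matches the {series} series"
    return f"transformers {installed} differs from the expected {series} series"


print(check_transformers())
```

Run it inside the activated CAAC environment. A mismatch is only a hint; requirements.txt remains the authoritative list of pinned versions.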

Configuration

Before running the experiments, you must configure the config.json file located in the configs folder. Update the following parameters with the appropriate paths:

cache_dir: The directory where model checkpoints will be stored.
amber_path: The path to the downloaded AMBER dataset.
chair_path: The path to the CHAIR dataset.
POPE_question_dir: The path to the POPE dataset questions.
POPE_image_folder: The path to the POPE dataset images.
log_dir: The directory where experiment results will be saved.

Ensure all paths are absolute or correctly relative to the execution directory to avoid runtime errors.
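Path mistakes are easier to catch before launching a long run. The following sketch checks the keys listed above; it assumes config.json is a flat JSON object whose values are path strings, and that cache_dir and log_dir may be created at run time. Both assumptions should be verified against the actual file. The `check_config` helper is hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Keys listed in the Configuration section above.
REQUIRED_KEYS = [
    "cache_dir", "amber_path", "chair_path",
    "POPE_question_dir", "POPE_image_folder", "log_dir",
]


def check_config(config: dict) -> list:
    """Return a list of human-readable problems found in `config`."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not value:
            problems.append(f"{key}: missing or empty")
        elif key not in ("cache_dir", "log_dir") and not Path(str(value)).exists():
            # Assumption: cache_dir and log_dir may be created at run time,
            # so only the dataset paths are required to exist already.
            problems.append(f"{key}: path does not exist: {value}")
    return problems


if __name__ == "__main__":
    config_path = Path("configs/config.json")
    if config_path.is_file():
        with config_path.open() as f:
            for problem in check_config(json.load(f)):
                print(problem)
    else:
        print(f"{config_path} not found; run this from the repository root")
```

An empty result means every key is present and every dataset path resolves from the current working directory.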

Running Experiments

To execute the experiments, use the provided shell scripts with the configured config.json file. Run the following commands from the root of the repository:

For CHAIR Experiments:

bash ./experiments/scripts/run_chair.sh ./configs/config.json

For AMBER Experiments:

bash ./experiments/scripts/run_amber.sh ./configs/config.json

For POPE Experiments:

bash ./experiments/scripts/run_pope.sh ./configs/config.json

Make sure the config.json file is properly set up before running these commands. The scripts will process the respective benchmarks and save the results to the directory specified in log_dir.
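To run the three benchmarks back to back, the commands above can be wrapped in a small driver. This is a convenience sketch, not a script shipped with the repository; `build_command` and `run_all` are hypothetical names, and the script paths are copied from the commands shown above.

```python
import subprocess

BENCHMARKS = ("chair", "amber", "pope")


def build_command(benchmark: str, config: str = "./configs/config.json") -> list:
    """Build the argument list for one of the run scripts shown above."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark}")
    return ["bash", f"./experiments/scripts/run_{benchmark}.sh", config]


def run_all(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it, stopping on failure."""
    for name in BENCHMARKS:
        cmd = build_command(name)
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, check=True)


run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the repository root to actually launch the scripts.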

Logs Directory

The logs directory contains experimental results for the CAAC framework's AMBER, CHAIR, and POPE benchmarks. You can use the log files along with the evaluation scripts (./evals) to reproduce our CAAC results on the benchmarks.

Evals Directory

The evals directory contains scripts to evaluate model outputs on the AMBER, CHAIR, and POPE benchmarks. Below are the details and usage instructions for each script:

1. `pope.py`

This script evaluates the model's outputs on the POPE benchmark.

Usage:

python pope.py --gt_files /path/to/POPE/coco_pope_popular.json --gen_files /path/to/POPE_output_popular.json

--gt_files: Path to the ground truth JSON file for POPE.
--gen_files: Path to the generated outputs JSON file from your model.
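For orientation, POPE is a yes/no object-probing benchmark that is conventionally scored with accuracy, precision, recall, F1, and the proportion of "yes" answers. The sketch below computes those quantities from two lists of answers. It does not read the JSON files above (their layout is not described here), and whether `pope.py` reports exactly these numbers is an assumption; `pope_metrics` is a hypothetical helper.

```python
def pope_metrics(labels, predictions):
    """Compute standard binary metrics for yes/no answers ("yes" is positive)."""
    pairs = [(l.strip().lower(), p.strip().lower()) for l, p in zip(labels, predictions)]
    tp = sum(1 for l, p in pairs if l == "yes" and p == "yes")
    fp = sum(1 for l, p in pairs if l == "no" and p == "yes")
    tn = sum(1 for l, p in pairs if l == "no" and p == "no")
    fn = sum(1 for l, p in pairs if l == "yes" and p == "no")
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / n if n else 0.0,
    }


labels = ["yes", "yes", "no", "no"]
predictions = ["yes", "no", "no", "yes"]
print(pope_metrics(labels, predictions))
```

A high yes-ratio combined with low precision usually indicates that the model affirms objects that are not in the image, which is the behaviour POPE is designed to expose.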

2. `chair.py`

This script evaluates the model's outputs on the CHAIR benchmark.

Usage:

python chair.py --coco_path /path/to/CHAIR/annotations --cap_file /path/to/CHAIR_output.jsonl

--coco_path: Path to the COCO annotations directory.
--cap_file: Path to the generated captions file in JSONL format.
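As background, CHAIR reports two ratios: CHAIR_i, the fraction of mentioned objects that do not appear in the image, and CHAIR_s, the fraction of captions that contain at least one such object. The sketch below computes both from objects that have already been extracted. It is a simplification: the real metric also maps caption words and synonyms onto COCO categories, which is omitted here, and `chair_scores` is a hypothetical helper rather than a function from `chair.py`.

```python
def chair_scores(mentioned, ground_truth):
    """Compute (CHAIR_i, CHAIR_s) from per-caption sets of object names.

    mentioned:    list of sets, objects named in each generated caption
    ground_truth: list of sets, objects actually present in each image
    """
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for objs, gt in zip(mentioned, ground_truth):
        hallucinated = objs - gt  # named in the caption but absent from the image
        total_mentions += len(objs)
        hallucinated_mentions += len(hallucinated)
        hallucinated_captions += 1 if hallucinated else 0
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = hallucinated_captions / len(mentioned) if mentioned else 0.0
    return chair_i, chair_s


mentioned = [{"dog", "frisbee"}, {"cat", "sofa"}]
ground_truth = [{"dog", "frisbee", "person"}, {"cat"}]
print(chair_scores(mentioned, ground_truth))
```

Here one of the four mentioned objects ("sofa") is absent from its image, and one of the two captions contains a hallucinated object. Lower values are better for both scores.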

3. `inference.py`

This script evaluates the model's outputs on the AMBER benchmark. It is a slightly modified version of the original AMBER evaluation script.

Usage:

python inference.py --inference_data /path/to/your/inference/file --evaluation_type g --gen_response_tag response_512

--inference_data: Path to the inference/output file.
--evaluation_type: Type of evaluation (e.g., g for generative tasks).
--gen_response_tag: Tag for the generated response (e.g., response_512 for responses with up to 512 tokens).

Note: Update the paths in the commands to match your local file system. For additional options or details, refer to the script documentation or source code.

Additional Notes

Verify that all dataset paths are correctly specified in config.json to prevent issues during execution.
The repository structure includes folders such as configs/ for configuration files, experiments/scripts/ for shell scripts, logs/ for experimental results, and evals/ for evaluation scripts. If your structure differs, adjust the paths in the commands accordingly.
For further assistance or to report issues, please refer to the repository’s documentation or contact the maintainers.

Happy benchmarking!

