amirkasaei/eval-the-evals

Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation

📖 Overview

This repository contains the code and resources for our paper:

Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation
Seyed Amir Kasaei, Ali Aghayari, Arash Marioriyad, Niki Sepasian, MohammadAmin Fazli,
Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban

Paper (arXiv) | Project Page


🚀 Introduction

Text-to-image generation has made impressive progress, but evaluating compositional alignment remains a fundamental challenge.
This project systematically analyzes widely used evaluation metrics to understand their reliability, limitations, and correspondence to human judgment.

We study 12 metrics across 8 compositional categories using T2I-CompBench++. Our main findings:

  • No single metric is universally reliable.
  • Embedding-based metrics (e.g., ImageReward, HPS) and VQA-based metrics (e.g., DA Score, VQA Score) each have strengths and weaknesses.
  • Image-only metrics (e.g., CLIP-IQA, Aesthetic Score) contribute little to compositional evaluation.

📊 Results

  • Correlation analysis (Pearson, Kendall, Spearman) across compositional categories.
  • Regression analysis to examine combined contributions of metrics.
  • Distributional study highlighting biases and limitations.
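To make the correlation analysis concrete, here is a minimal sketch of how the three coefficients can be computed between human ratings and a metric's scores. It is illustrative only and is not taken from the repository's code: the function names and the toy data (`human`, `metric`) are hypothetical, and the Kendall variant shown is tau-b, which is an assumption about the variant used. In practice, `scipy.stats.pearsonr`, `spearmanr`, and `kendalltau` provide the same quantities.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b: concordant minus discordant pairs, with a tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: ignored
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


# Hypothetical toy data: human ratings and one metric's scores for five images.
human = [1, 2, 3, 4, 5]
metric = [0.10, 0.35, 0.30, 0.70, 0.90]

print("Pearson :", round(pearson(human, metric), 3))
print("Spearman:", round(spearman(human, metric), 3))
print("Kendall :", round(kendall_tau_b(human, metric), 3))
```

Pearson measures linear agreement, while Spearman and Kendall depend only on the ordering of the scores, so a metric that ranks images correctly but on a nonlinear scale can score well on the latter two and lower on the former.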

Key finding: A combination of complementary metrics is essential for trustworthy evaluation.
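As a rough illustration of how a regression can expose the combined contribution of several metrics, the sketch below fits an ordinary least-squares model of human ratings on one metric and then on two, and compares the explained variance (R²). This assumes a plain linear regression, which may differ from the analysis in the paper; the helper names (`fit_ols`, `r_squared`) and the synthetic scores are hypothetical and chosen so that the true coefficients are known.

```python
def fit_ols(rows, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(rows, targets, coefs):
    """Fraction of the variance in targets explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - q) ** 2 for t, q in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1 - ss_res / ss_tot


# Synthetic scores from two hypothetical metrics (a, b) for five images.
scores = [(0.1, 0.5), (0.4, 0.2), (0.6, 0.9), (0.8, 0.3), (0.3, 0.7)]
# Synthetic "human" ratings built as 1 + 2*a + 3*b, so both metrics matter.
human = [1 + 2 * a + 3 * b for a, b in scores]

only_a = [(a,) for a, _ in scores]
coefs_a = fit_ols(only_a, human)
coefs_ab = fit_ols(scores, human)

r2_a = r_squared(only_a, human, coefs_a)
r2_ab = r_squared(scores, human, coefs_ab)
print("R^2 with metric a only:", round(r2_a, 3))
print("R^2 with metrics a + b:", round(r2_ab, 3))
```

In this toy setup the two-metric model explains the ratings far better than either metric alone, which is the kind of pattern a regression over complementary metrics is meant to detect.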


📂 Repository Structure
