This Python package implements the Kappa-IoU measure of inter-annotator reliability from Rogers (2026):
Rogers, Trevor F. (2026). Kappa-IoU: Inter-Rater Reliability for Spatial Annotation
Large-scale annotation projects that cannot rely on automated annotation need a metric for how consistently different annotators label the same data. Intersection-over-Union (IoU) is the usual measure for spatial data, but it requires a ground truth. Cohen's Kappa is a psychometric measure of agreement between raters, but it is traditionally applied to categorical data.
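As a rough illustration of the two ingredients named above, the sketch below computes IoU for a pair of boxes and Cohen's Kappa for two raters' categorical labels. It is not taken from the package and does not show how the paper combines the two. The function names are hypothetical, and the xywh convention used here (x and y as the top-left corner) is an assumption.

```python
from collections import Counter


def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner of the box.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def cohens_kappa(r1, r2):
    """Cohen's Kappa for two equal-length sequences of categorical labels."""
    n = len(r1)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0


overlap = iou_xywh((0, 0, 10, 10), (5, 0, 10, 10))
kappa = cohens_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
```

IoU needs two boxes to compare and Kappa needs discrete categories, which is the gap the text describes between the two measures.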
This repository contains a Python package implementing the Kappa-IoU measure from the paper, which indexes IoU with Cohen's Kappa and handles differences in object cardinality between raters by treating object matching as a linear assignment problem solved with Kuhn-Munkres assignment. Also included is an adjusted version of Kappa-IoU that uses a variant of the BLEU measure from machine translation that instead scores N-objects for multi-object annotation. The repository also contains Section4.py, which recreates the examples from Section 4 of the paper.
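To make the linear assignment step concrete, here is a minimal sketch that pairs the boxes of two raters so that the summed IoU is as large as possible, leaving surplus boxes unmatched when the raters drew different numbers of objects. It deliberately swaps in a brute-force search over permutations for Kuhn-Munkres so that it needs only the standard library. The optimum is the same, but this is practical only for a handful of boxes. For real data a polynomial-time solver such as scipy.optimize.linear_sum_assignment would be the usual choice. The function names and the top-left xywh convention are assumptions, not the package's API.

```python
from itertools import permutations


def iou_xywh(a, b):
    """IoU of two (x, y, w, h) boxes; (x, y) assumed to be the top-left corner."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def best_assignment(boxes_a, boxes_b):
    """Return (pairs, total_iou) for the one-to-one matching with maximal summed IoU.

    Each pair is (index into boxes_a, index into boxes_b). Boxes of the
    longer list that are not matched are simply left out.
    Brute force stand-in for Kuhn-Munkres: exponential time, small inputs only.
    """
    # Iterate over the shorter list so each of its boxes gets a distinct partner.
    swapped = len(boxes_a) > len(boxes_b)
    small, large = (boxes_b, boxes_a) if swapped else (boxes_a, boxes_b)
    best_pairs, best_total = [], -1.0
    for perm in permutations(range(len(large)), len(small)):
        total = sum(iou_xywh(small[i], large[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total = total
            best_pairs = [(j, i) if swapped else (i, j) for i, j in enumerate(perm)]
    return best_pairs, best_total


rater_a = [(0, 0, 10, 10), (20, 20, 10, 10)]
rater_b = [(21, 21, 10, 10), (1, 1, 10, 10), (50, 50, 5, 5)]
pairs, total = best_assignment(rater_a, rater_b)
```

In this example the third box from rater_b has no counterpart and stays unmatched. How the package scores such unmatched objects is not covered by this sketch.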
To install this package, run the following from inside the folder downloaded from GitHub:
python -m pip install -e .
To use the package, run the analysis from a terminal on your dataset directories using the console entry point:
kappaiou-run --baseline /data/author_baseline.csv --dir /data/annotator_tranches/ --tau 0.70
The author baseline and annotator tranche CSV files are expected to have the filename in the first column, the label in the second column, and the xywh-format annotation in the following four columns.
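The column layout described above can be read with the standard csv module, as in the hedged sketch below. It is only an illustration of the expected layout, not the package's own loader. The function name, the sample filenames and labels, and the assumption that the files have no header row are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_annotations(fileobj):
    """Group (label, (x, y, w, h)) tuples by image filename.

    Assumes no header row and numeric values in columns 3 to 6.
    """
    by_file = defaultdict(list)
    for row in csv.reader(fileobj):
        if len(row) < 6:
            continue  # skip blank or malformed rows
        filename, label = row[0], row[1]
        x, y, w, h = (float(v) for v in row[2:6])
        by_file[filename].append((label, (x, y, w, h)))
    return dict(by_file)


# Made-up rows in the documented order: filename, label, x, y, w, h.
sample = io.StringIO(
    "img_001.png,car,10,20,50,40\n"
    "img_001.png,person,70,15,20,60\n"
    "img_002.png,car,5,5,30,30\n"
)
annotations = read_annotations(sample)
```

With a real file, the same function can be called on an open file handle, for example one opened with open(path, newline="").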
MIT License