Replication package for the paper "How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions". Read the full paper on arXiv.
Due to copyright considerations, raw chat traces are not redistributed; only episodes from repositories whose licenses explicitly permit redistribution (e.g., MIT, Apache-2.0) are released, while those from non-permissively licensed repositories are used only in aggregate analysis.
We provide an interactive viewer for the identified misalignment cases, as well as the annotation labels, available at Coding Agent Misalignment Atlas. The viewer includes only cases from permissively licensed repositories, with all personally identifiable information removed.
- session-formatting/: preprocessing step. Formats parsed sessions into LLM-ready text files for extraction.
- batch-runner/: reusable OpenAI Batch toolkit (build, submit, check, download, retry, postprocess).
- misalignment-extraction/: extracts candidate misalignment episodes from formatted sessions.
- misalignment-validation/: validates extracted episodes and filters unsupported cases.
- misalignment-annotation/: multi-axial annotation of validated episodes.
- data-aggregation/: aggregates intermediate outputs into downstream analysis tables; see data specs in this folder.
- distribution-analysis/: notebooks and utilities for paper figures/tables and analysis-ready outputs.
- misalignment-viewer/: static viewer for browsing the misalignment corpus.
- workspace/: expected data layout (not distributed here) for repository/session-level inputs.
- misalignments.json: aggregated list of all identified misalignment episodes with metadata and annotations, filtered to include only those from permissively licensed repositories.
- Session preprocessing: session-formatting/
- Extraction: misalignment-extraction/
- Validation: misalignment-validation/
- Annotation: misalignment-annotation/
- Aggregation: data-aggregation/
- Distribution analysis: distribution-analysis/
- Misalignment viewer: misalignment-viewer/
Typical structure under workspace/:
workspace/
└── {repo_id}/
├── session_parsed/ # Per-session parsed chat records
│ ├── session_001.json
│ └── ...
├── session_formatted/ # Per-session formatted chat records for LLM analysis
│ ├── session_001.txt
│ └── ...
└── meta.json # Repository metadata (e.g., name, language, session count)
@article{tang2026coding,
title={How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions},
author={Tang, Ningzhi and Chen, Chaoran and Xu, Gelei and Shi, Yiyu and Huang, Yu and McMillan, Collin and Dong, Tao and Li, Toby Jia-Jun},
journal={arXiv preprint arXiv:2605.29442},
year={2026}
}