Decomposing and measuring evaluation awareness in existing benchmarks and our proposed EvalAwareBench.
Updated Jun 1, 2026 · Python
Synthetic-document fine-tuning on Qwen2.5-7B: a controlled study of whether SDF installs sandbagging, finding a layered recognition/generation/behavior dissociation.
Repository for the arXiv 2026 preprint "Models That Know How Evaluations Are Designed Score Safer"
[WIP] Code and datasets for the thesis "Sensitivity Analysis of Evaluation Awareness in Large Language Models" · Licenciatura en Ciencia de Datos, Universidad de Buenos Aires (UBA).