Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage
This directory contains demo materials for our recent work on cognitive collusion attacks against LLM-based agents. We demonstrate how coordinated agents can manipulate victim beliefs using only truthful evidence fragments through strategic narrative construction.
.
├── README.md
├── dataset/
│ └── CoPHEME/ # Complete CoPHEME dataset
├── Montage/
│ └── attack_plans/ # Pre-generated attack sequences using GPT-4.1-mini
│ ├── charlie_hebdo/
│ ├── sydney_siege/
│ ├── ferguson/
│ ├── ottawa_shooting/
│ ├── germanwings/
│ └── putin_missing/
| └──src/ Demo code for our framework
├── requirements.txt # Python dependencies
└── environment.yml # Conda environment file
Option 1: Using pip
pip install -r requirements.txtOption 2: Using conda
conda env create -f environment.yml
conda activate colusive_agent./run_demo.shExample output:
[Event: Charlie Hebdo]
Target Hypothesis: Ahmed Merabet was the first victim
Attack Success Rate: 81.7%
Average Confidence: 0.83
High-Confidence ASR: 67.4%
The CoPHEME dataset extends PHEME for cognitive collusion research:
- 6 rumor events: Charlie Hebdo, Sydney Siege, Ferguson, Ottawa Shooting, Germanwings, Putin Missing
- evidence fragments: Verified truthful tweets satisfying Local Truth (LT=1)
- target fabrications: High-impact false narratives for attack simulation
See CoPHEME/ for complete statistics.
To facilitate reproducibility without requiring extensive API calls, we provide pre-generated attack sequences produced by our Generative Montage framework using GPT-4.1-mini:
- Writer-Editor-Director optimized narratives
- Validated montage sequences (passed Director acceptance threshold τ=7.0)
- Ready-to-use for easy victim evaluation across different LLM families
Set your API keys as environment variables:
# OpenAI
export OPENAI_API_KEY="your-key-here"
# Anthropic
export ANTHROPIC_API_KEY="your-key-here"
# For open-weights models (if using hosted APIs)
export HUGGINGFACE_API_KEY="your-key-here"Full code release: The complete implementation and dataset will be released upon paper acceptance.