ACM-ICL: Autonomy-Calibrated Multi-Agent In-Context Learning

Epistemic Robustness Under Adversarial Social Pressure

ACM-ICL is a four-stage inference pipeline that equips LLMs with structured mechanisms to resist epistemic herding — abandoning correct reasoning under pressure from adversarial or unreliable peers. Unlike existing multi-agent debate methods that assume well-intentioned collaborators, ACM-ICL treats every peer message as potentially adversarial, verifies claims against evidence, and weights peer contributions by demonstrated reliability.

Key result: ACM-ICL-Trained achieves 73.9% average accuracy across five benchmarks, outperforming the strongest baseline (MAD, 60.2%) by 13.7 percentage points, with near-zero miscalibrated-trust errors.

Model Architecture

Input: Question q, Context c, Peer Messages {m₁, ..., mₚ}
                        │
        ┌───────────────▼────────────────┐
        │  Stage 1: SOLVER               │
        │  Generate initial answer â     │
        │  with structured reasoning R   │
        └───────────────┬────────────────┘
                        │
        ┌───────────────▼────────────────┐
        │  Stage 2: SKEPTIC (DD-CoT)     │
        │  Parse peer claims             │
        │  Generate counter-arguments    │
        │  Verify against evidence       │
        └───────────────┬────────────────┘
                        │
        ┌───────────────▼────────────────┐
        │  Stage 3: VERIFIER             │
        │  Assign grounded verdicts      │
        │  {support, refute, uncertain}  │
        │  Multi-level matching          │
        └───────────────┬────────────────┘
                        │
        ┌───────────────▼────────────────┐
        │  Stage 4: CALIBRATED JUDGE     │
        │  Per-peer reliability (EMA)    │
        │  Temperature-scaled softmax    │
        │  Safety override               │
        └───────────────┬────────────────┘
                        │
                        ▼
              Output: Answer a*

Core Modules

Module	File	Description
ACMPolicy	`acm_icl/acm_policy.py`	Multi-role LLM wrapper with solver/skeptic/verifier/judge roles via system prompts and optional LoRA adapters, sharing all base weights
DD-CoT	`acm_icl/dd_cot.py`	Discriminative Decomposed Chain-of-Thought — structured 7-field JSON schema that architecturally separates independent judgment from social influence
CalibratedJudge	`acm_icl/judge.py`	Per-peer reliability scoring with EMA updates, temperature-scaled softmax aggregation, and safety override mechanism
TrajectoryBuilder	`acm_icl/trajectory_builder.py`	Builds contrastive trajectory pairs (autonomous vs. herding) with 5 peer-pressure protocols for training
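
As a rough illustration of the CalibratedJudge row, the sketch below combines the three mechanisms it names: an EMA reliability update, a temperature-scaled softmax over peer reliabilities, and a safety override. The function names, the smoothing factor `alpha`, the default `tau`, and the override rule (fall back to the model's own answer when no peer-backed answer gathers enough weight) are assumptions made for illustration, not the contents of `acm_icl/judge.py`.

```python
import math
from collections import defaultdict


def ema_update(reliability: float, was_correct: bool, alpha: float = 0.2) -> float:
    """Exponential moving average of a peer's verified correctness (alpha is assumed)."""
    return (1.0 - alpha) * reliability + alpha * (1.0 if was_correct else 0.0)


def softmax_weights(reliabilities: dict, tau: float = 0.5) -> dict:
    """Temperature-scaled softmax: a lower tau concentrates weight on reliable peers."""
    peak = max(reliabilities.values())  # subtract the max for numerical stability
    exps = {peer: math.exp((r - peak) / tau) for peer, r in reliabilities.items()}
    total = sum(exps.values())
    return {peer: e / total for peer, e in exps.items()}


def judge(self_answer, peer_answers, reliabilities, tau=0.5, override_threshold=0.6):
    """Pick the answer with the most peer weight, unless the override fires.

    Hypothetical safety override: if the best-supported answer holds less than
    `override_threshold` of the total weight, keep the model's own answer.
    """
    weights = softmax_weights({p: reliabilities[p] for p in peer_answers}, tau)
    mass = defaultdict(float)
    for peer, answer in peer_answers.items():
        mass[answer] += weights[peer]
    best_answer, best_mass = max(mass.items(), key=lambda kv: kv[1])
    return best_answer if best_mass >= override_threshold else self_answer
```

With a small `tau`, two unreliable peers who agree on a wrong answer are outweighed by a single reliable peer, which is the behavior the reliability weighting is meant to produce.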

DD-CoT Schema

Each ACM-ICL response follows this structured JSON format. Field ordering enforces independent reasoning before social consideration:

{
  "peer_claim_parse": "Semantic interpretation of each peer's claims",
  "self_answer": "Model's answer BEFORE any peer consideration",
  "counter_argument": "Explicit argument against peer consensus",
  "verification_plan": "Steps to verify peer claims",
  "verified_evidence": "Evidence gathered for/against claims",
  "final_decision": "Trust-weighted final answer",
  "peer_reliability_update": {"peer_0": 0.3, "peer_1": 0.8}
}
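
Because the field order carries meaning, a consumer may want to check both presence and order before trusting a response. The sketch below is one way to do that with the standard library. `DD_COT_FIELDS` and `validate_dd_cot` are hypothetical names, not part of `acm_icl/dd_cot.py`, and the range check assumes reliability scores lie in [0, 1].

```python
import json

# Field order copied from the schema above.
DD_COT_FIELDS = [
    "peer_claim_parse",
    "self_answer",
    "counter_argument",
    "verification_plan",
    "verified_evidence",
    "final_decision",
    "peer_reliability_update",
]


def validate_dd_cot(raw: str) -> dict:
    """Parse a DD-CoT response and check that all seven fields appear in schema order."""
    record = json.loads(raw)  # Python dicts keep the key order of the JSON text
    if not isinstance(record, dict):
        raise ValueError("DD-CoT response must be a JSON object")
    keys = list(record)
    if keys != DD_COT_FIELDS:
        raise ValueError(f"expected fields {DD_COT_FIELDS}, got {keys}")
    updates = record["peer_reliability_update"]
    if not isinstance(updates, dict):
        raise ValueError("peer_reliability_update must map peer ids to scores")
    for peer, score in updates.items():
        # Assumption: reliability scores are numbers in [0, 1].
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            raise ValueError(f"reliability for {peer} must be a number in [0, 1]")
    return record
```

A response that puts `final_decision` before `self_answer` is rejected here even though it contains every field, since it no longer demonstrates that the independent answer came first.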

Training Pipeline

Stage 1: SFT                    Stage 2: DPO                  Stage 3: Calibration
┌──────────────────┐     ┌───────────────────────┐     ┌──────────────────────┐
│ QLoRA (r=64)     │     │ Contrastive DPO       │     │ Optimize τ* via      │
│ on autonomous    │ ──> │ chosen=autonomous     │ ──> │ scipy.optimize on    │
│ DD-CoT traces    │     │ rejected=herding      │     │ held-out calibration │
│                  │     │ β=0.1, lr=0.1×SFT     │     │ data                 │
└──────────────────┘     └───────────────────────┘     └──────────────────────┘
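
The Stage 2 objective can be written out for a single preference pair. This sketch uses the standard DPO loss with β = 0.1 from the diagram. The four log-probability arguments are hypothetical inputs (summed token log-probabilities of each trace under the policy and under a frozen reference model), and the repository's trainer may compute the loss differently.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # Implicit rewards: how much more the policy likes each trace than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp        # autonomous trace
    rejected_reward = policy_rejected_logp - ref_rejected_logp  # herding trace
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow for large |x|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss is log 2 when the policy matches the reference and falls as the policy shifts probability toward the autonomous trace and away from the herding one.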

Project Structure

acm-icl-release/
├── acm_icl/                     # Core package
│   ├── acm_policy.py            # ACMPolicy: multi-role LLM wrapper
│   ├── dd_cot.py                # DD-CoT: structured reasoning schema
│   ├── judge.py                 # CalibratedJudge: trust scoring + safety
│   ├── trajectory_builder.py    # Contrastive trajectory pair generation
│   ├── cli.py                   # CLI entry point (train/evaluate/prepare-data/tables)
│   ├── config.py                # Configuration dataclasses
│   ├── types.py                 # Shared types and enums
│   ├── training/
│   │   ├── trainer.py           # SFT + DPO + calibration pipeline
│   │   └── data_prep.py         # Trajectory → SFT/DPO sample conversion
│   ├── evaluation/
│   │   ├── runner.py            # EvalRunner: metrics, social metrics, safety
│   │   └── tables.py            # LaTeX table generation
│   ├── datasets/
│   │   ├── base.py              # BenchmarkAdapter ABC
│   │   ├── kairos.py            # KAIROS (TruthfulQA + peer pressure)
│   │   ├── benchform.py         # BenchForm (MMLU + conformity protocols)
│   │   ├── agentharm.py         # AgentHarm (safety + adversarial refusal)
│   │   ├── gsm8k.py             # GSM8K (math reasoning + peer pressure)
│   │   └── arc.py               # ARC-Challenge (science reasoning + pressure)
│   ├── serving/
│   │   └── vllm_server.py       # vLLM multi-LoRA batched inference
│   └── utils/
│       ├── logging.py           # Logging setup
│       └── seed.py              # Seed management for reproducibility
├── configs/
│   ├── sft_qwen.yaml            # SFT config for Qwen2.5-7B
│   ├── sft_llama.yaml           # SFT config for Llama-3.1-8B
│   ├── sft_mistral.yaml         # SFT config for Mistral-7B
│   ├── dpo_qwen.yaml            # DPO config
│   ├── eval_full.yaml           # Evaluation config
│   └── deepspeed_zero2.json     # DeepSpeed ZeRO-2 config
├── scripts/
│   ├── train.py                 # Training entry point
│   ├── evaluate.py              # Evaluation entry point
│   ├── prepare_data.py          # Data preparation
│   └── generate_tables.py       # Publication table generation
├── tests/
│   ├── test_datasets.py         # Dataset adapter tests
│   ├── test_dd_cot.py           # DD-CoT module tests
│   ├── test_evaluation.py       # Evaluation runner tests
│   ├── test_judge.py            # Judge module tests
│   └── test_trajectory_builder.py # Trajectory builder tests
├── data/
│   ├── trajectories_train.jsonl # Training trajectory pairs
│   ├── trajectories_eval.jsonl  # Evaluation trajectory pairs
│   └── calibration_holdout.jsonl # Judge calibration data
├── paper/
│   ├── acm_icl_colm2026.tex     # Full paper (COLM 2026 format)
│   ├── acm_icl_references.bib   # Bibliography
│   ├── math_commands.tex        # LaTeX macros
│   ├── colm2026_conference.sty  # Conference style file
│   └── colm2026_conference.bst  # Bibliography style
├── pyproject.toml               # Package configuration
├── environment.yml              # Conda environment spec
└── README.md                    # This file

Quick Start

Installation

# Option 1: Conda (recommended)
conda env create -f environment.yml
conda activate acm-icl

# Option 2: pip
pip install -e ".[dev,agentharm]"

Run Tests

python -m pytest tests/ -v

Prepare Training Data

Generate contrastive trajectory pairs from benchmark datasets:

python scripts/prepare_data.py --benchmark all --output-dir data --num-perturbations 5
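
For orientation, a contrastive pair can be flattened into the prompt/chosen/rejected layout commonly used for preference data, with the autonomous trace as chosen and the herding trace as rejected. This is a sketch only: the input keys (`question`, `peer_messages`, `autonomous`, `herding`) and the prompt template are assumptions, not the format written by `scripts/prepare_data.py`.

```python
def pair_to_dpo_sample(pair: dict) -> dict:
    """Map one contrastive trajectory pair to a prompt/chosen/rejected record."""
    # Hypothetical prompt template: the question followed by one line per peer.
    peers = "\n".join(f"Peer {i}: {msg}" for i, msg in enumerate(pair["peer_messages"]))
    prompt = f"Question: {pair['question']}\n{peers}"
    return {
        "prompt": prompt,
        "chosen": pair["autonomous"],  # trace that keeps the correct answer
        "rejected": pair["herding"],   # trace that gives in to the peers
    }
```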

Train

Stage 1 — SFT (supervised fine-tuning on autonomous trajectories):

python scripts/train.py --config configs/sft_qwen.yaml --stage sft

Stage 2 — DPO (contrastive preference optimization):

python scripts/train.py --config configs/dpo_qwen.yaml --stage dpo

Stage 3 — Judge Calibration (optimize temperature τ):

python scripts/train.py --config configs/sft_qwen.yaml --stage calibration

Full pipeline (all stages):

python scripts/train.py --config configs/sft_qwen.yaml --stage all

Evaluate

# Full evaluation on all benchmarks
python scripts/evaluate.py --config configs/eval_full.yaml

# Single benchmark with sample limit
python scripts/evaluate.py --config configs/eval_full.yaml --benchmark kairos --max-samples 50

# Single method
python scripts/evaluate.py --config configs/eval_full.yaml --method acm_icl

Generate Tables

python scripts/generate_tables.py --results-dir outputs/eval --output-dir outputs/tables

CLI Interface

# All commands available via CLI
acm-icl train --config configs/sft_qwen.yaml --stage all
acm-icl evaluate --config configs/eval_full.yaml
acm-icl prepare-data --benchmark all
acm-icl tables --results-dir outputs/eval

Benchmarks

Benchmark	Source Dataset	N	Evaluation Focus
KAIROS	TruthfulQA	500	Peer-pressure robustness on counterintuitive truths
BenchForm	MMLU	500	Conformity under 5 social influence protocols
AgentHarm	AgentHarm	176	Safety / adversarial refusal in agentic settings
GSM8K	openai/gsm8k	500	Mathematical reasoning under peer pressure
ARC	ARC-Challenge	500	Commonsense science reasoning + pressure

Each benchmark applies five peer-pressure protocols:

Majority Pressure: roughly 70% or more of the peers give the same incorrect answer (confidence 0.75–0.95)
Confident Minority: about one third of the peers give an incorrect answer with very high confidence (0.90–0.99)
Sequential Reveal: incorrect peers appear first in the message sequence
Simultaneous Vote: peers are split roughly 50/50 at random between correct and incorrect answers
Adversarial Debate: one extremely confident adversary (0.97) faces several low-confidence correct peers
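
To make the protocol descriptions concrete, here is a minimal sketch that builds peer panels for two of them, Majority Pressure and Adversarial Debate. The `Peer` type, the function names, the default panel sizes, and the confidence ranges for the correct peers are assumptions; only the fractions and confidence values quoted in the list come from this README.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Peer:
    answer: str
    confidence: float


def majority_pressure(correct, wrong, n_peers=5, seed=0):
    """At least ~70% of peers share one wrong answer, confidence 0.75 to 0.95."""
    rng = random.Random(seed)
    n_wrong = math.ceil(0.7 * n_peers)
    peers = [Peer(wrong, rng.uniform(0.75, 0.95)) for _ in range(n_wrong)]
    # The confidence of the remaining correct peers is not specified; assumed here.
    peers += [Peer(correct, rng.uniform(0.5, 0.7)) for _ in range(n_peers - n_wrong)]
    rng.shuffle(peers)
    return peers


def adversarial_debate(correct, wrong, n_correct=3, seed=0):
    """One adversary at confidence 0.97 against several low-confidence correct peers."""
    rng = random.Random(seed)
    peers = [Peer(wrong, 0.97)]
    # "Low confidence" is interpreted as 0.3 to 0.5; this range is an assumption.
    peers += [Peer(correct, rng.uniform(0.3, 0.5)) for _ in range(n_correct)]
    rng.shuffle(peers)
    return peers
```

Seeding a local `random.Random` keeps each panel reproducible without touching global random state.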

Supported Backbones

Model	Parameters	HuggingFace ID
Qwen2.5-7B-Instruct	7.6B	`Qwen/Qwen2.5-7B-Instruct`
Llama-3.1-8B-Instruct	8.0B	`meta-llama/Llama-3.1-8B-Instruct`
Mistral-7B-Instruct-v0.3	7.2B	`mistralai/Mistral-7B-Instruct-v0.3`

All models are loaded with 4-bit QLoRA quantization (NF4). To change backbone, update the config YAML or pass an override:

python scripts/train.py --config configs/sft_qwen.yaml \
    --overrides "model.backbone=meta-llama/Llama-3.1-8B-Instruct"
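
A dotted override such as `model.backbone=...` maps naturally onto a nested config dictionary. The helper below is a hypothetical sketch of that mapping, not the parser used by `scripts/train.py`; it treats every value as a string and does no type conversion.

```python
def apply_override(config: dict, override: str) -> dict:
    """Set a nested key from a 'dotted.path=value' string, creating levels as needed."""
    path, sep, value = override.partition("=")
    if not sep or not path:
        raise ValueError(f"expected 'key.path=value', got {override!r}")
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # descend, creating missing levels
    node[leaf] = value
    return config  # the same dict, modified in place


config = {"model": {"backbone": "Qwen/Qwen2.5-7B-Instruct"}}
apply_override(config, "model.backbone=meta-llama/Llama-3.1-8B-Instruct")
print(config["model"]["backbone"])  # meta-llama/Llama-3.1-8B-Instruct
```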

Hardware Requirements

Task	Minimum	Recommended
Inference (quantized)	1× GPU, 24 GB VRAM	1× GPU, 48+ GB VRAM
QLoRA Training (SFT/DPO)	1× GPU, 24 GB VRAM	4× GPU, 48+ GB VRAM
Full Evaluation (125 experiments)	1× GPU, 48 GB VRAM	8× GPU, 96 GB VRAM

Tested on: 8× NVIDIA RTX PRO 6000 Blackwell (96 GB each).

Citation

@inproceedings{acmicl2026,
  title={ACM-ICL: Autonomy-Calibrated Multi-Agent In-Context Learning for Epistemic Robustness Under Social Pressure},
  author={Anonymous},
  booktitle={Conference on Language Modeling (COLM)},
  year={2026}
}

License

Apache 2.0
