YuvMilo/MechanisticAccountofSinks

Code for reproducing the experiments in the paper "A Mechanistic Account of Attention Sinks in GPT-2: One Circuit, Broader Implications for Mitigation".

Setup

conda create -n sinks python=3.11 -y
conda activate sinks
pip install -r requirements.txt
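
Once the environment is set up, the reproduction commands listed in the sections below can be run in one pass. The following is a minimal sketch of such a driver, not part of the repository: the names `RUNS`, `build_commands`, and `run_all` are hypothetical, and the script/mode pairs are copied from the commands in this README. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
import sys

# (script, mode) pairs copied from the commands in this README.
RUNS = [
    ("experiments_statistical.py", "bias-term"),             # Fig 1, Fig 6
    ("experiments_single_input.py", "epe-bias-proj"),        # Fig 2
    ("experiments_statistical.py", "epe-validation"),        # Fig 3
    ("experiments_statistical.py", "coord-alignment"),       # Fig 4, Fig 8
    ("intervention_analysis.py", "sentence"),                # Fig 5
    ("intervention_analysis.py", "dataset"),                 # Table 1
    ("experiments_single_input.py", "massive-activations"),  # Fig 7
]


def build_commands(output_dir="results", python=sys.executable):
    """Return one argument list per README command."""
    return [
        [python, script, "--mode", mode, "--output-dir", output_dir]
        for script, mode in RUNS
    ]


def run_all(output_dir="results", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    commands = build_commands(output_dir)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing experiment.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(dry_run=False)` from the repository root would execute the seven commands in order; runtime and hardware requirements are not stated in this README, so a dry run first may be a sensible check.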

Reproducing the Figures and Tables

Figure 1 — Source-Agnostic Shift Histogram (truncated) + Appendix Full Histogram

python experiments_statistical.py --mode bias-term --output-dir results

Outputs:

  • results/bias_term_statistical/bq_k_aggregate_plot_truncated.png: Fig 1
  • results/bias_term_statistical/bq_k_aggregate_plot.png: Fig 6 (appendix, full histogram)

Figure 2 — EPE-Bias Projection Alignment

python experiments_single_input.py --mode epe-bias-proj --output-dir results

Output:

  • results/epe_bias_proj/epe_alignment.png: Fig 2

Figure 3 — EPE Captures the Net Positional Contribution

python experiments_statistical.py --mode epe-validation --output-dir results

Outputs:

  • results/epe_validation_statistical/epe_validation_plot.png: Fig 3
  • results/epe_validation_statistical/epe_validation_precentiles (numerical values for experiments)

Figure 4 — Coordinate-Level Alignment Histogram (truncated) + Appendix Full Histogram

python experiments_statistical.py --mode coord-alignment --output-dir results

Outputs:

  • results/coord_alignment_statistical/coord_alignment_histogram_truncated.png: Fig 4
  • results/coord_alignment_statistical/coord_alignment_histogram.png: Fig 8 (appendix, full histogram)

Figure 5 — Intervention Attention Maps

python intervention_analysis.py --mode sentence --output-dir results

Outputs:

  • results/sentence_analysis/layer_04_avg.png through layer_11_avg.png: Fig 5 (layers 4 to 11)

Table 1 — BOS Attention Statistics

python intervention_analysis.py --mode dataset --output-dir results

Outputs:

  • results/dataset_analysis/bos_attention_summary_mid_layers.txt: Table 1
  • results/dataset_analysis/bos_attention_summary_mid_layers.csv

Figure 7 (appendix) — Massive Activations in EPE_1

python experiments_single_input.py --mode massive-activations --output-dir results

Output:

  • results/massive_activations/massive_activations_in_ppe.png: Fig 7
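
After running the commands above, it can be useful to confirm that every listed output file was written. The helper below is a sketch under assumptions, not part of the repository: the function names `expected_outputs` and `missing_outputs` are hypothetical, and the paths are copied from the output lists in this README. The `epe_validation_precentiles` entry is left out because its file extension is not stated here.

```python
from pathlib import Path


def expected_outputs(output_dir="results"):
    """Return the output paths listed in this README under output_dir."""
    root = Path(output_dir)
    files = [
        "bias_term_statistical/bq_k_aggregate_plot_truncated.png",            # Fig 1
        "bias_term_statistical/bq_k_aggregate_plot.png",                      # Fig 6
        "epe_bias_proj/epe_alignment.png",                                    # Fig 2
        "epe_validation_statistical/epe_validation_plot.png",                 # Fig 3
        "coord_alignment_statistical/coord_alignment_histogram_truncated.png",  # Fig 4
        "coord_alignment_statistical/coord_alignment_histogram.png",          # Fig 8
        "dataset_analysis/bos_attention_summary_mid_layers.txt",              # Table 1
        "dataset_analysis/bos_attention_summary_mid_layers.csv",
        "massive_activations/massive_activations_in_ppe.png",                 # Fig 7
    ]
    # Fig 5: layer_04_avg.png through layer_11_avg.png.
    files += [f"sentence_analysis/layer_{i:02d}_avg.png" for i in range(4, 12)]
    return [root / name for name in files]


def missing_outputs(output_dir="results"):
    """Return the expected paths that do not exist as files."""
    return [p for p in expected_outputs(output_dir) if not p.is_file()]
```

For example, `missing_outputs("results")` returns an empty list when all listed figures and tables are present, and otherwise the paths still to be generated.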
