Source code for MechaRule: neuron-anchored rule extraction for large language models via contrastive hierarchical ablation.
python eap arithmetic reproducibility jailbreaking natural-language-inference rule-extraction hans explainable-ai large-language-models llm model-editing mechanistic-interpretability kdd2026 neuron-ablation contrastive-ablation hierarchical-ablation causal-interpretability mecharule
Updated Jun 6, 2026 - Python