ToxiShield — Replication Package

Replication package for "ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering" (FSE 2026).

ToxiShield is a Chrome extension for real-time toxicity detection and detoxification in GitHub pull request reviews. It is built around three ML modules and was evaluated end to end in a user study with 10 professional developers.

Quick Navigation

Paper Section	Folder	What you can reproduce
§2 – Toxicity Filter	`toxicity-filter/`	Train BERT binary classifier; verify 98% accuracy / F1=0.97
§3 – Communication Coach	`communication-coach/`	Verify Claude 3.5 Sonnet Macro F1=0.42, MCC=0.39
§4 – The Reframer	`reframer/`	Fine-tune Llama 3.2 3B; evaluate J-Score=84%
§5 – Browser Extension	`browser-extension/`	Install and run the full system
§5.1 – User Study	`survey/`	Inspect TAM survey responses
Validation	`manual-validation/`	Reproduce Cohen's κ for both annotation tasks

Architecture Overview

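ToxiShield runs as a pipeline triggered when a developer types a GitHub PR comment. The Toxicity Filter runs first, in the browser, and only comments it flags as toxic are passed on to the Communication Coach and the Reframer: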

Developer types PR comment
        │
        ▼
┌──────────────────────┐
│   Module 1           │  BERT-base-uncased (INT8 ONNX, runs in-browser)
│   Toxicity Filter    │  Binary: toxic / non-toxic
└──────┬───────────────┘
       │ if toxic
       ▼
┌──────────────────────┐
│   Module 2           │  Claude 3.5 Sonnet (LLM, zero-shot)
│   Communication      │  12-class subcategory + explanation
│   Coach              │
└──────┬───────────────┘
       │
       ▼
┌──────────────────────┐
│   Module 3           │  Llama 3.2 3B (LoRA fine-tuned, served via Ollama)
│   The Reframer       │  Generates detoxified alternative + rationale
└──────────────────────┘
       │
       ▼
Inline suggestion shown to developer (accept / discard / rate)
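
The control flow in the diagram can be sketched in a few lines of Python. This is an illustration only, not the extension's code: `run_pipeline`, `Suggestion`, and the three callables are hypothetical names standing in for the BERT classifier, the Claude prompt, and the fine-tuned Llama model, and the `"insult"` label is a placeholder rather than one of the paper's subcategories.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    """What the developer would see inline when a comment is flagged."""
    subcategory: str   # label returned by the Communication Coach
    explanation: str   # short reason the comment was flagged
    rewrite: str       # detoxified alternative from the Reframer


def run_pipeline(
    comment: str,
    is_toxic: Callable[[str], bool],
    coach: Callable[[str], tuple[str, str]],
    reframe: Callable[[str], str],
) -> Optional[Suggestion]:
    """Run Module 1 first; call Modules 2 and 3 only if the comment is flagged."""
    if not is_toxic(comment):
        return None  # non-toxic comments pass through untouched
    subcategory, explanation = coach(comment)
    return Suggestion(subcategory, explanation, reframe(comment))


# Toy stand-ins so the sketch runs without any model or API key.
demo = run_pipeline(
    "this code is garbage",
    is_toxic=lambda text: "garbage" in text,
    coach=lambda text: ("insult", "Attacks the work instead of describing the problem."),
    reframe=lambda text: "This code has some issues we should address.",
)
print(demo)
```

Passing the modules in as callables keeps the gating logic testable on its own; the real system splits the same steps between the browser and the backend.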

Key Results at a Glance

Module	Model	Primary Metric	Result
Toxicity Filter	BERT-base-uncased (fine-tuned)	F1 (toxic class)	0.97
Communication Coach	Claude 3.5 Sonnet	Macro F1 / MCC	0.42 / 0.39
The Reframer	Llama 3.2 3B (LoRA)	J-Score	84.00%

Module 1: Toxicity Filter (§2)

What it does: Binary classification of PR comments (toxic vs non-toxic). Fine-tunes bert-base-uncased on a curated dataset of 38,761 labelled PR comments drawn from 15M GitHub PRs. The best model is exported to ONNX INT8 for in-browser inference.

Dataset: 10,120 toxic samples (stratified sampling across ToxiCR probability bins) + 28,641 non-toxic samples. Available at toxishield/38k-dataset-labelled.
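
This README does not spell out the sampling procedure, so the following is only a sketch of what stratified sampling across probability bins could look like. It assumes ten equal-width bins and an equal quota per bin; `stratified_sample` and its parameters are hypothetical names, and the scores are synthetic.

```python
import random
from collections import defaultdict


def stratified_sample(scored, per_bin, n_bins=10, seed=0):
    """Draw up to `per_bin` items from each equal-width probability bin.

    `scored` is a list of (comment, probability) pairs with probability in [0, 1].
    """
    bins = defaultdict(list)
    for comment, prob in scored:
        # Clamp so that prob == 1.0 lands in the last bin instead of a new one.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((comment, prob))

    rng = random.Random(seed)  # fixed seed keeps the draw repeatable
    sample = []
    for index in sorted(bins):
        bucket = bins[index]
        sample.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sample


# Synthetic scores only; these are not comments from the dataset.
scored = [(f"comment-{i}", i / 100) for i in range(100)]
picked = stratified_sample(scored, per_bin=3)
print(len(picked))
```

Sampling per bin rather than from the whole pool keeps borderline scores represented alongside the clearly toxic ones.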

To verify results (no training needed):

cd toxicity-filter
pip install -r requirements.txt
cat results/kfold-metrics/cross_validation_results.csv

To re-run training (GPU required, ~10 hours):

jupyter notebook notebooks/train-classifier.ipynb
# Navigate to Section 2 (10-fold CV block) — do not Run All

Key files:

Path	Description
`data/38k-detection-dataset-full.csv`	Full labelled dataset (38,761 samples)
`notebooks/train-classifier.ipynb`	Training + 10-fold CV + ONNX INT8 export
`results/kfold-metrics/cross_validation_results.csv`	Per-fold TP/TN/FP/FN, accuracy, F1
`results/kfold-misclassifications/misclassification_{1-10}.csv`	All 701 misclassified instances
`comparison/openai-detection-inference/`	GPT-4o zero-shot baseline (Table 2)
`comparison/openai-detection-inference/compute_table2_metrics.py`	Script to compute Table 2 metrics from JSONL
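
Since the per-fold CSV stores raw TP/TN/FP/FN counts, accuracy and F1 can be recomputed by hand. The sketch below uses the standard definitions with the toxic class as the positive class; the counts in the example are made up and `fold_metrics` is a hypothetical helper, not a function from the notebook.

```python
def fold_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1 for the positive (toxic) class."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for a single fold, for illustration only.
metrics = fold_metrics(tp=90, tn=880, fp=10, fn=20)
print(metrics)

# Consistency check on the figures quoted in this README: if every one of the
# 38,761 samples is tested exactly once across the 10 folds, 701 errors imply
# an overall accuracy of about 98.2%.
overall_accuracy = 1 - 701 / 38761
print(round(overall_accuracy, 3))
```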

Module 2: Communication Coach (§3)

What it does: Classifies toxic PR comments into 12 subcategories using prompt-engineered LLMs. There is no fine-tuning: the module relies on in-context learning, with prompts refined over 14 iterations (iter_1 to iter_14) that map to the paper stages below.

Prompt stages and paper mapping:

Iterations	Paper Stage	Key change
iter_1	Stage 1 (zero-shot baseline)	Class names only, no guidance
iter_2–iter_5	Stages 2–5	Added behavior-based definitions, sarcasm handling, lexical cues, rare-category examples
iter_6–iter_8	Stage 2.1–2.3	Sub-refinements of Stage 2
iter_9–iter_11	Stage 3.1–3.3	Sub-refinements of Stage 3
iter_12–iter_14	Stage 4.1–4.3	Sub-refinements of Stage 4

Best prompt for cross-model comparison: iterations/iter_4/prompt.py (Table 4 results)

To verify results (no API key needed):

cd communication-coach
pip install -r requirements.txt
python scripts/evaluate.py
# Prints: Macro F1=0.42, Macro MCC=0.39 — matches Table 4

To re-run inference (API key required):

cp .env.example .env    # add OPENAI_API_KEY or ANTHROPIC_API_KEY

# 1. Set model and iteration in scripts/config.py (default: gpt-4o, iter_4)
# 2. Run inference:
python scripts/openai-inference.py
# Results saved to iterations/iter_4/results.csv

# To evaluate results:
python scripts/evaluate.py
# Writes mcc_by_prompt.json and per_label_confusion_matrices.pdf

Note: Table 4's best result (Claude 3.5 Sonnet) requires the Anthropic API client — openai-inference.py runs GPT models only. Stored Claude results are at iterations/iter_4/results-claude-3.5-sonnet.csv and can be evaluated directly without re-running inference.
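
For readers who want to sanity-check the reported numbers by hand, here is a minimal sketch of macro-averaged F1 and MCC. It assumes each comment carries a set of labels and that both metrics are computed per label and then averaged with equal weight; that reading is inferred from the per-label confusion matrices and may not match `scripts/evaluate.py` exactly. The helper names and the labels `a`, `b`, `c` are placeholders.

```python
import math


def binary_counts(gold, pred, label):
    """Count TP, TN, FP, FN for one label, treated as present/absent per sample."""
    tp = tn = fp = fn = 0
    for gold_set, pred_set in zip(gold, pred):
        in_gold, in_pred = label in gold_set, label in pred_set
        if in_gold and in_pred:
            tp += 1
        elif in_gold:
            fn += 1
        elif in_pred:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc_score(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_average(gold, pred, labels, metric):
    """Average a per-label metric, giving every label equal weight."""
    return sum(metric(*binary_counts(gold, pred, lab)) for lab in labels) / len(labels)


# Toy label sets, not data from the 1,200-sample dataset.
gold = [{"a"}, {"b"}, {"a", "b"}, {"c"}]
pred = [{"a"}, {"a"}, {"a", "b"}, {"c"}]
labels = ["a", "b", "c"]
macro_f1 = macro_average(gold, pred, labels, f1_score)
macro_mcc = macro_average(gold, pred, labels, mcc_score)
print(round(macro_f1, 3), round(macro_mcc, 3))
```

Macro averaging weights rare subcategories as heavily as common ones, which is one reason macro scores on a 12-class task can sit well below overall accuracy.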

Key files:

Path	Description
`scripts/config.py`	Central config: model, iteration, dataset path, output path, labels
`scripts/openai-inference.py`	Runs inference for the configured model/iteration
`scripts/evaluate.py`	Computes Macro F1, MCC, exact match; writes confusion matrix PDF
`notebooks/evaluate.ipynb`	Notebook-based evaluation (set `FILE_PATH` to any results CSV)
`data/multiclass-dataset-full.csv`	1,200-sample labelled multiclass dataset
`iterations/iter_4/results-claude-3.5-sonnet.csv`	Claude 3.5 Sonnet results (Table 4, best)
`iterations/iter_4/results.csv`	GPT-4o results (Table 4)
`iterations/iter_4/mcc_by_prompt.json`	MCC scores across iterations

Module 3: The Reframer (§4)

What it does: Text style transfer — rewrites toxic PR comments into professional alternatives while preserving technical meaning. Uses teacher-student knowledge distillation: GPT-4o generates training pairs, Llama 3.2 3B learns from them via LoRA.

Parallel dataset: 10,117 (toxic, detoxified) pairs generated by gpt-4o-2024-05-13. Available at toxishield/20k-dataset-parallel.

Evaluation metrics:

Metric	What it measures	Notebook
DETOX	% reduction in toxicity score (via ToxiCR)	`metric-incivility-decrease.ipynb`
FL	Fluency via CoLA acceptability classifier	`metric-style-acc-flu-sim.ipynb`
PRESERVE	Semantic similarity via sentence-transformers	`metric-style-acc-flu-sim.ipynb`
J-Score	Harmonic mean of DETOX × FL × PRESERVE	`metric-style-acc-flu-sim.ipynb`
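
The J-Score row can be read in more than one way, so the sketch below shows two candidate aggregations side by side. Both are assumptions: a harmonic mean over the three component scores, and a plain product of the three (the "×" in the table may point to the latter). The notebook is the authoritative definition; the function names and input values here are illustrative only.

```python
from statistics import harmonic_mean


def j_harmonic(detox, fluency, preserve):
    """Harmonic mean of the three component scores, each expected in [0, 1]."""
    return harmonic_mean([detox, fluency, preserve])


def j_product(detox, fluency, preserve):
    """Product of the three component scores, each expected in [0, 1]."""
    return detox * fluency * preserve


# Illustrative values, not results from the paper.
scores = (0.90, 0.80, 0.85)
print(round(j_harmonic(*scores), 4))
print(round(j_product(*scores), 4))
```

Both aggregations penalise a model that does well on two components but poorly on the third, which a simple arithmetic mean would not.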

To evaluate stored results (no GPU needed):

cd reframer/detoxifier-fine-tuning
pip install -r ../requirements.txt
jupyter notebook notebooks/metric-style-acc-flu-sim.ipynb
# Change INPUT_FILE to evaluate different models

To re-run fine-tuning (GPU ≥16 GB VRAM required, ~6 hours):

jupyter notebook notebooks/fine-tune-huggingface.ipynb
# Base model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
# Uses num_train_epochs=10, LoRA rank=16, lr=2e-4

Result files in detoxifier-fine-tuning/data/toxicr-outputs/:

File	Table 6 model
`toxicr-teacher-gpt-4o-05-13-llama-3.2-3b-input-output-cleaned.xlsx`	Llama 3.2 3B (best, J=84%)
`toxicr-teacher-gpt-4o-05-13-llama-3.1-8b-input-output-cleaned.xlsx`	Llama 3.1 8B
`toxicr-phi-3.5-10k-test-teacher-gpt-4o-05-13-input-output.xlsx`	Phi 3.5
`toxicr-gemma-2b-10k-test-teacher-gpt-4o-05-13-input-output.xlsx`	Gemma 2B
`toxicr-ft-gpt-4o-mini-toxishield-test-inference-10k-input-output.csv`	GPT-4o mini FT
`toxicr-ft-gpt-35-baseline-test-inference-10k-input-output.csv`	GPT-3.5 FT (baseline)

Note: Qwen 2.5 Instruct 7B output file is not included in this package.

Sub-directory overview:

reframer/
├── parallel-dataset/         # Teacher model data generation scripts and raw outputs
├── openai-fine-tuning/       # GPT-4o-mini and GPT-3.5 fine-tuning data (comparison baselines)
├── detoxifier-fine-tuning/   # Llama 3.2 3B LoRA fine-tuning (primary model)
│   ├── notebooks/            # Training + all evaluation notebooks
│   └── data/toxicr-outputs/  # ToxiCR-scored input/output pairs for all 6 models
└── baseline-comparison/      # Prior-work baseline dataset and comparison results

Module 4: Browser Extension + Backend (§5)

What it does: Chrome extension (Manifest V3, React/Vite) that integrates all three modules into the developer workflow. Runs the BERT classifier locally via ONNX Runtime Web; calls the backend API for detoxification.

# 1. Start the backend
cd browser-extension/detoxifier-backend
cp .env.example .env   # fill in OPENAI_API_KEY, DB credentials, OLLAMA_BASE_URL
npm install
npm start              # starts on http://localhost:3000

# 2. Load the extension (pre-built artifact)
# Chrome → chrome://extensions → Developer mode → Load unpacked
# → select browser-extension/toxishield/

Extension source layout:

Path	Description
`browser-extension/src/`	React side-panel UI, inference logic, TypeScript types
`browser-extension/public/content.js`	Injected content script: intercepts GitHub PR comment forms
`browser-extension/public/service_worker.js`	Background worker: routes messages between content script and panel
`browser-extension/public/static/vocab.json`	BERT WordPiece vocabulary for in-browser tokenisation
`browser-extension/toxishield/`	Pre-built unpacked extension (load this in Chrome)
`browser-extension/detoxifier-backend/`	Node.js/Express API: detoxification inference + usage logging
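
For context on what `vocab.json` is used for: WordPiece tokenisation splits each word into the longest vocabulary pieces it can find, left to right, marking continuation pieces with `##`. The Python sketch below illustrates that greedy longest-match idea on a toy vocabulary. It is not the extension's tokenizer (which lives in the TypeScript/JavaScript sources), and it omits lower-casing, punctuation splitting, and special tokens.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into WordPiece tokens by greedy longest match."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches, so the whole word maps to unknown
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary; the real file maps many thousands of tokens to integer ids.
vocab = {"un", "##friend", "##ly", "friend", "[UNK]"}
print(wordpiece("unfriendly", vocab))  # ['un', '##friend', '##ly']
print(wordpiece("xyz", vocab))         # ['[UNK]']
```

Because the lookup only uses `in`, the same function works whether the vocabulary is loaded as a set or as a token-to-id dictionary.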

ONNX model: classifier_int8.onnx is not included (binary, ~80 MB). Generate it by running the ONNX export section of toxicity-filter/notebooks/train-classifier.ipynb, then place the file at browser-extension/public/bert-base/classifier_int8.onnx.

See [browser-extension/README.md](browser-extension/README.md) for full setup and development build instructions.

User Study (§5.1)

Ten professional software developers (US and Bangladesh) used ToxiShield on real GitHub repositories for two weeks. The study was IRB-approved. Survey materials and anonymised responses are in [survey/](survey/).

File	Description
`survey.xlsx`	Anonymised responses (10 participants, 9 Likert items + open-ended)
`ToxiShield_Developer_Survey_Guide.pdf`	Study protocol and task instructions given to participants
`ToxiShield _ Post-Study-Feedback-Form.pdf`	Post-study TAM questionnaire
`test-repository.txt`	GitHub repository used during the two-week study

Inter-Annotator Agreement (§3.3, §4.4)

cd manual-validation
pip install -r requirements.txt
jupyter notebook notebooks/kappa.ipynb

Task	κ	Interpretation
Multiclass subcategory (100 samples)	0.67	Substantial
Detox quality — minimal change	0.82	Almost perfect
Detox quality — context preservation	0.72	Substantial
Detox quality — communication style	0.77	Substantial
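
As a reference for what the notebook reports, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance. The sketch below is a from-scratch illustration on toy labels, with the Landis and Koch bands that match the wording in the Interpretation column. It is not the code in `kappa.ipynb`, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "Poor"
    for upper, name in [(0.20, "Slight"), (0.40, "Fair"),
                        (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return name
    return "Almost perfect"


# Toy annotations, not the study's data.
rater_a = ["y", "y", "n", "n", "y", "n"]
rater_b = ["y", "n", "n", "n", "y", "n"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3), interpret(kappa))
```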

Environment Setup

Python 3.10+ required. Each module has its own requirements.txt.

pip install -r toxicity-filter/requirements.txt      # Module 1
pip install -r communication-coach/requirements.txt  # Module 2
pip install -r reframer/requirements.txt             # Module 3
pip install -r manual-validation/requirements.txt    # Validation

Browser extension and backend: Node.js 18+.

API keys (set in environment or in a module-level .env):

OPENAI_API_KEY=...       # Module 2 (GPT runs), Module 3 (parallel dataset generation)
ANTHROPIC_API_KEY=...    # Module 2 (Claude runs — best result in Table 4)
HF_TOKEN=...             # Optional: only needed to push models to HuggingFace Hub
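
Before starting a paid inference run, it can help to confirm that the keys are actually visible to Python. The sketch below is a hypothetical helper, not part of the package: it only inspects the process environment and does not read `.env` files, so keys kept in a module-level `.env` must be loaded first by whatever mechanism the scripts use.

```python
import os

# Key names and purposes as listed in this README.
REQUIRED_KEYS = {
    "OPENAI_API_KEY": "Module 2 (GPT runs), Module 3 (parallel dataset generation)",
    "ANTHROPIC_API_KEY": "Module 2 (Claude runs)",
}


def missing_keys(env=None):
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


for name in missing_keys():
    print(f"{name} is not set; needed for {REQUIRED_KEYS[name]}")
```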

Reproducibility Summary

Finding	How to verify (stored artifacts)	How to re-run
BERT F1=0.97 (Table 2)	`toxicity-filter/results/kfold-metrics/cross_validation_results.csv`	~10 GPU-hours, A100
Claude MCC=0.39 (Table 4)	`communication-coach/iterations/iter_4/results-claude-3.5-sonnet.csv` → run `evaluate.py`	Anthropic API
Llama J=84% (Table 6)	`reframer/detoxifier-fine-tuning/data/toxicr-outputs/` → run metric notebooks	~6 GPU-hours, A100

All LLM inference used temperature=0 for determinism. Exact API model versions are recorded in the response_full column of each results CSV.
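
To check which model versions produced a given results file, the `response_full` column can be tallied. The sketch below assumes that column holds the raw API response serialised as JSON with a top-level `model` field; the storage format is an assumption and has not been verified against the CSVs. `model_versions` is a hypothetical helper, and the demo rows are synthetic.

```python
import csv
import io
import json
from collections import Counter


def model_versions(csv_file, column="response_full"):
    """Count the `model` field per row; rows that cannot be parsed are tallied apart."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        try:
            payload = json.loads(row.get(column) or "")
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict) and "model" in payload:
            counts[payload["model"]] += 1
        else:
            counts["<unparsed>"] += 1
    return counts


# Build a two-row synthetic CSV in memory so the sketch runs without the package.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "response_full"])
writer.writeheader()
writer.writerow({"text": "example", "response_full": json.dumps({"model": "gpt-4o-2024-05-13"})})
writer.writerow({"text": "example", "response_full": "not json"})
buffer.seek(0)

versions = model_versions(buffer)
print(versions)
```

For a real file, pass an open handle instead of the in-memory buffer, for example `open(path, newline="", encoding="utf-8")`.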

Copyright Information

Authors: Md Awsaf Alam Anindya (awsafalam@gmail.com), Showvik Biswas (showvikdbz@gmail.com), Jaydeb Sarker (jsarker@unomaha.edu), and Amiangshu Bosu (abosu@wayne.edu)

This program is free software; you can redistribute it and/or modify it under the terms of the Apache License, Version 2.0, as published by the Apache Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache License, Version 2.0 (the LICENSE file in this repository) for more details.

Citation for our papers

If you use our work, please cite the following papers:

FSE 2026 (Research Track): "ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering"

@article{anindya2026toxishield,
  title={ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering},
  author={Anindya, Md Awsaf Alam and Biswas, Showvik and Iqbal, Anindya and Sarker, Jaydeb and Bosu, Amiangshu},
  journal={Proceedings of the ACM on Software Engineering},
  volume={},
  number={FSE},
  pages={TBD},
  year={2026},
  publisher={ACM New York, NY, USA}
}

FSE 2026 (Poster Track): "Real-Time Toxicity Filtering for Open-Source Code Reviews"

@inproceedings{poster2026toxishield,
  title={Real-Time Toxicity Filtering for Open-Source Code Reviews},
  author={Anindya, Md Awsaf Alam and Biswas, Showvik and Iqbal, Anindya and Sarker, Jaydeb and Bosu, Amiangshu},
  booktitle={Proceedings of the 34th ACM International Conference on the Foundations of Software Engineering},
  pages={},
  year={2026},
  location={Montreal, Canada},
  series={FSE Companion '26},
  publisher={ACM New York, NY, USA}
}
