This repository introduces H-VLI (Hate via Vision-Language Interplay), a benchmark dataset specifically curated to decipher "semantic intent shifts" in multimodal hate speech, where toxicity often emerges from the subtle interplay between benign modalities. Alongside the dataset, we provide the official implementation of ARCADE (Asymmetric Reasoning via Courtroom Agent DEbate), a hierarchical courtroom framework designed to scrutinize these complex cross-modal interactions and uncover latent hateful intents.
- [2026-04-06]: Our paper has been accepted by ACL 2026 Findings! 🎉
To run the experiments, download the images for each dataset:
- FHM Images: Download from Kaggle.
- MMHS150K Images: Download from the official site.
- H-VLI Images: Download from Google Drive.
The standard train/test splits for the H-VLI benchmark are provided in the data/ directory (train_set.json and test_set.json).
Organize the downloaded images as follows:
imgs/
├── FHM/ # .jpg files from Facebook Hateful Meme
├── MMHS150K/ # .jpg files from MMHS150K
└── H-VLI_images/ # .jpg files from H-VLI
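Before launching a run, it can help to confirm that the images landed where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name `check_image_dirs` is hypothetical, and it assumes only the three folder names and the .jpg extension shown in the tree.

```python
from pathlib import Path

# Sub-directory names taken from the layout above.
EXPECTED_DIRS = ("FHM", "MMHS150K", "H-VLI_images")


def check_image_dirs(root="imgs"):
    """Return {dir_name: number of .jpg files}; a missing directory maps to -1."""
    counts = {}
    for name in EXPECTED_DIRS:
        folder = Path(root) / name
        counts[name] = len(list(folder.glob("*.jpg"))) if folder.is_dir() else -1
    return counts


# Report the state of each expected folder under ./imgs
for name, n in check_image_dirs().items():
    status = "missing" if n < 0 else f"{n} .jpg files"
    print(f"imgs/{name}: {status}")
```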
Hate speech has evolved from plain text to complex multimodal expressions, and traditional binary detection often fails to identify implicit attacks. To address this, the H-VLI (Hate via Vision-Language Interplay) mechanism focuses on capturing "semantic intent shifts." Within this framework, each modality is first annotated in isolation, where an individual text or image may appear entirely benign or overtly toxic. Through inter-modality interaction, however, the two modalities can combine to produce a semantic shift: either constructing implicit hate from benign unimodal cues or neutralizing apparent toxicity through semantic inversion.
Figure 1: Typical examples of "Implicit Hate" in H-VLI, where toxicity emerges solely from the interplay between benign-looking text and images.
To match the H-VLI mechanism, the benchmark dataset is constructed with a hybrid pipeline of consensus filtering, generative injection, and human-in-the-loop annotation. The pipeline is designed to yield a high density of challenging multimodal samples whose true intent hinges on the interplay of modalities rather than on isolated visual or textual slurs.
We introduce the H-VLI (Hate via Vision-Language Interplay) benchmark, specifically curated to challenge models with subtle cross-modal interactions.
Figure 2: The construction pipeline of the H-VLI dataset, combining real-world sampling with generative injection.
To capture the complexity of multimodal hate, particularly when modalities conflict, we introduce the Stratified Multimodal Interaction (SMI) paradigm. For each sample, we annotate a five-tuple that explicitly labels the unimodal sentiments alongside the final multimodal annotation.
Taxonomy of Multimodal Interaction:
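Under the SMI paradigm, the interplay between the two unimodal signals and the final multimodal label falls into three levels of complexity. Each pattern below is written as a triple whose first two entries are the unimodal labels and whose last entry is the final multimodal label: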
- (1) Low Complexity (Aligned/Dominant): Covers explicit cases requiring minimal cross-modal deduction. This includes purely benign (0,0,0), redundant hate (1,1,1), and unimodal dominance (1,0,1 or 0,1,1).
- (2) Medium Complexity (Contextual Neutralization): Includes patterns where toxicity in one modality is mitigated by the benign context of the other (1,0,0 or 0,1,0), requiring the model to recognize how context neutralizes apparent slurs.
- (3) High Complexity (Emergent Semantic Shift): Directly tests the H-VLI mechanism via synergistic hate (0,0,1) and dual-inversion (1,1,0). These demand deep inferential reasoning to resolve cases where the final label contradicts both unimodal signals: either detecting implicit attacks emerging from benign cues (0,0,1) or recognizing how apparent toxicity is neutralized through complex cross-modal irony or counter-speech (1,1,0).
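The three tiers above can be expressed as a small lookup on the label triple. This is a sketch for illustration only: the function name `smi_complexity` is hypothetical, and it assumes binary labels (0 = benign, 1 = hateful) as in the patterns listed, which may not match the field names or label encoding used in the released JSON files.

```python
def smi_complexity(u1: int, u2: int, final: int) -> str:
    """Map a binary (unimodal, unimodal, multimodal) label triple to its tier."""
    pattern = (u1, u2, final)
    if any(v not in (0, 1) for v in pattern):
        raise ValueError(f"labels must be 0 or 1, got {pattern}")
    if u1 == u2:
        # Both unimodal labels agree: the final label either follows them
        # (aligned) or contradicts both (emergent semantic shift).
        return "low" if final == u1 else "high"
    # Unimodal labels disagree: the hateful modality either dominates
    # (final = 1) or is neutralized by the benign context (final = 0).
    return "low" if final == 1 else "medium"


# Enumerate all eight patterns and print the tier for each.
for a in (0, 1):
    for b in (0, 1):
        for m in (0, 1):
            print((a, b, m), smi_complexity(a, b, m))
```

Because the rule only compares the two unimodal labels with each other and with the final label, it does not depend on which of the first two entries is the text label and which is the image label.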
Figure 3: Showcase of different interaction patterns in H-VLI.
Figure 4: Statistical breakdown of the H-VLI dataset.
ARCADE (Asymmetric Reasoning via Courtroom Agent DEbate) simulates a judicial process to decipher multimodal intent shifts.
Figure 5: The architecture of the ARCADE framework, featuring a Gated Dual-Track mechanism for explicit and implicit hate detection.
ARCADE employs a Gated Dual-Track Mechanism to efficiently process multimodal samples:
- Rapid Scan: Every sample first undergoes a preliminary screening by the Prosecutor agent.
- Track I: Fast-Track Trial (Explicit Hate): If the initial scan detects overt hateful cues, the sample is routed to Track I. It undergoes a single-round adversarial exchange before the Judge renders a verdict.
- Track II: Deep-Dive Trial (Implicit Hate): If no explicit hate is found but latent risks are suspected, the sample enters Track II. It undergoes $K$ rounds of intensive debate between the Prosecutor and Defender to uncover subtle intent shifts before final adjudication.
- Summary Dismissal: If the Prosecutor finds no evidence of hate in its assessment, a Summary Dismissal can be triggered, ruling the sample as non-hateful without further debate.
Core Roles:
- Prosecutor (Risk Discovery): Operates under a "presumption of guilt," actively hypothesizing malice and uncovering latent hate in metaphors and symbols.
- Defender (Contextual Safety): Operates under a "presumption of innocence," scrutinizing evidence for benign motivations like satire or counter-speech.
- Judge (Final Arbiter): Evaluates the adversarial exchange to render a final verdict and provide a natural language explanation.
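The routing and roles described above can be sketched as a single function. This is not the code in court_system.py: the function `run_trial`, the scan outcome constants, and the callable signatures are all assumptions made for illustration, with each agent passed in as a plain Python callable so the control flow can be read (and tested) without any API calls.

```python
from typing import Callable, List, Tuple

# Hypothetical scan outcomes; the real system's labels may differ.
EXPLICIT, SUSPECTED, CLEAR = "explicit", "suspected", "clear"


def run_trial(
    sample: dict,
    scan: Callable[[dict], str],
    prosecute: Callable[[dict, List[str]], str],
    defend: Callable[[dict, List[str]], str],
    judge: Callable[[dict, List[str]], Tuple[int, str]],
    rounds: int = 3,
) -> Tuple[int, str, str]:
    """Route one sample and return (verdict, explanation, track)."""
    outcome = scan(sample)  # Rapid Scan by the Prosecutor
    if outcome == CLEAR:
        # Summary Dismissal: no debate, sample ruled non-hateful.
        return 0, "No evidence of hate found in the rapid scan.", "summary_dismissal"
    # Track I uses a single exchange; Track II uses `rounds` exchanges.
    n_rounds, track = (1, "track_1") if outcome == EXPLICIT else (rounds, "track_2")
    transcript: List[str] = []
    for _ in range(n_rounds):
        transcript.append("PROSECUTOR: " + prosecute(sample, transcript))
        transcript.append("DEFENDER: " + defend(sample, transcript))
    verdict, explanation = judge(sample, transcript)
    return verdict, explanation, track


# Stub agents stand in for the LLM-backed Prosecutor, Defender, and Judge.
result = run_trial(
    {"text": "example", "image": "example.jpg"},
    scan=lambda s: SUSPECTED,
    prosecute=lambda s, t: "possible coded reference",
    defend=lambda s, t: "plausibly satire",
    judge=lambda s, t: (0, f"benign after {len(t) // 2} rounds"),
)
print(result)
```

Keeping the gate ahead of the debate means the number of model calls scales with how ambiguous a sample is, which is the efficiency argument for the dual-track design.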
- Install Dependencies:
  pip install -r requirements.txt
- Configure API Keys: Rename .env.example to .env and fill in your API keys.
  - Priority: GPT and Gemini models will prioritize official APIs if OPENAI_API_KEY or GEMINI_API_KEY is provided.
  - Auto-Fallback: If official keys are missing, the system automatically attempts to use alternative providers (e.g., API_YI_API_KEY).
  - Key Polling: For DashScope (Qwen), GLM, and API_YI, you can configure multiple keys (e.g., KEY_1, KEY_2) to balance rate limits.
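To make the priority, fallback, and polling rules concrete, here is a minimal sketch of how such key handling could look. It is not the logic in llm_client.py. The helper names are hypothetical, and so is the numbered variable prefix `DASHSCOPE_API_KEY` used in the demo; check .env.example for the actual variable names.

```python
import itertools
import os
from typing import Iterator, List, Mapping, Optional


def collect_keys(prefix: str, env: Mapping[str, str]) -> List[str]:
    """Gather PREFIX_1, PREFIX_2, ... until the first missing index."""
    keys = []
    for i in itertools.count(1):
        value = env.get(f"{prefix}_{i}")
        if not value:
            break
        keys.append(value)
    return keys


def key_rotation(keys: List[str]) -> Iterator[str]:
    """Cycle through keys so consecutive requests use different ones."""
    if not keys:
        raise ValueError("no API keys configured")
    return itertools.cycle(keys)


def pick_provider(official_var: str, env: Mapping[str, str],
                  fallback_var: str = "API_YI_API_KEY") -> Optional[str]:
    """Prefer the official key; otherwise fall back to the alternative provider."""
    if env.get(official_var):
        return "official"
    if env.get(fallback_var):
        return "fallback"
    return None


# Demo against the current environment and a toy key list.
print(pick_provider("OPENAI_API_KEY", os.environ))
rotation = key_rotation(["key-a", "key-b"])
print([next(rotation) for _ in range(4)])
```

If a rotation like this is shared across many request threads, guarding `next(rotation)` with a lock is a reasonable precaution.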
Before running the experiments, you can customize the execution by modifying the following parameters in main.py:
# Select the models for inference
MODELS_TO_RUN = ['qwen3-vl-plus'] # Model(s) acting as the Judge
AUX_MODEL = "qwen3-vl-plus" # Model acting as the Prosecutor and Defender
# Specify the input dataset path
BASE_PATH = os.path.dirname(os.path.abspath(__file__))
INPUT_DATA_PATH = os.path.join(BASE_PATH, "data/test_set.json")

# 1. Run ARCADE hierarchical debate system (Default)
python main.py --run_mode ARCADE --samples 100
# 2. Run direct classification baseline (Baseline None)
python main.py --run_mode none --samples 100 --class_mode binary

| Argument | Options | Default | Description |
|---|---|---|---|
| --run_mode | ARCADE, none | ARCADE | Experiment Mode. ARCADE: Hierarchical debate; none: Direct inference. |
| --class_mode | multiclass, binary | multiclass | Classification Standard. multiclass: 0-5 labels; binary: 0-1 labels. |
| --samples (-s) | Integer | 10 | Number of samples to test. Set to 0 for the full dataset. |
| --threads | Integer | 16 | Number of concurrent threads for API requests. |
| --rounds | Integer | 3 | Number of debate rounds for the implicit detection track. |
| --seed | Integer | 2024 | Random seed for data sampling. |
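For readers who want to script around these options, the argument table maps directly onto an argparse parser. The sketch below mirrors the documented names, choices, and defaults; it is an assumption about how main.py might declare them, not a copy of it, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented command-line options."""
    parser = argparse.ArgumentParser(description="ARCADE experiment runner (sketch)")
    parser.add_argument("--run_mode", choices=["ARCADE", "none"], default="ARCADE",
                        help="ARCADE: hierarchical debate; none: direct inference")
    parser.add_argument("--class_mode", choices=["multiclass", "binary"],
                        default="multiclass",
                        help="multiclass: 0-5 labels; binary: 0-1 labels")
    parser.add_argument("--samples", "-s", type=int, default=10,
                        help="number of samples to test; 0 means the full dataset")
    parser.add_argument("--threads", type=int, default=16,
                        help="concurrent threads for API requests")
    parser.add_argument("--rounds", type=int, default=3,
                        help="debate rounds for the implicit detection track")
    parser.add_argument("--seed", type=int, default=2024,
                        help="random seed for data sampling")
    return parser


# Parse the baseline command from the examples above.
args = build_parser().parse_args(
    ["--run_mode", "none", "--samples", "100", "--class_mode", "binary"]
)
print(args.run_mode, args.class_mode, args.samples, args.rounds)
```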
- data/: Directory containing dataset splits (train_set.json, test_set.json) and all sample metadata including tweet text and labels.
- imgs/: Directory containing source images for FHM, MMHS150K, and H-VLI.
- main.py: Main entry point for data sampling, concurrent scheduling, and evaluation.
- court_system.py: Core system logic implementing the ARCADE hierarchical routing.
- court_prompts.py: Agent prompt templates for the multi-class categorization task.
- court_prompts_binary.py: Agent prompt templates for the binary detection task.
- llm_client.py: API client supporting official direct connections and provider-based fallbacks.
- evaluator.py: Logic for calculating Accuracy, Macro-F1, and other performance metrics.
- utils.py: Utility functions for data loading, sampling, image encoding, and file operations.
- Results are stored in answers_system/{class_mode}/{run_mode}/{timestamp}/{model}/.
- results_{model}.json: Detailed inference logs for every sample.
- report.txt: Summary report including global metrics and difficulty-wise performance.
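For reference, the two headline metrics in the report can be computed from gold and predicted labels as follows. This is a plain-Python sketch of the standard definitions, not the code in evaluator.py; how the repository handles edge cases (for example, classes that never appear) is not documented here, so the choices below are assumptions.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either input."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Difficulty-wise performance can be obtained by applying the same two functions to the subset of samples in each complexity tier.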
The H-VLI dataset is released under the CC BY 4.0 license. Users must adhere to the terms of source datasets (MMHS150K, FHM).
If you find our work helpful, please cite us:
@article{sun2026sumpartsdecipheringintent,
title={More Than Sum of Its Parts: Deciphering Intent Shifts in Multimodal Hate Speech Detection},
author={Runze Sun and Yu Zheng and Zexuan Xiong and Zhongjin Qu and Lei Chen and Jie Zhou and Jiwen Lu},
journal={arXiv preprint arXiv:2603.21298},
year={2026}
}

This repository contains examples of hateful or offensive content. These materials are provided strictly for academic research purposes, specifically for the development and evaluation of multimodal hate speech detection systems. The authors of this work do not condone, support, or agree with any hateful sentiments, stereotypes, or offensive views expressed in these samples.