ICTMCG/Phantomhunter

PhantomHunter: AI-Generated Text Detection with Multi-Task MoE Framework

With the popularity of large language models (LLMs), societal problems such as misinformation production and academic misconduct have become more severe, making LLM-generated text detection more important than ever. Although existing methods have made remarkable progress, they are mostly evaluated on publicly known LLMs, and the new challenge posed by text from privately tuned LLMs remains largely underexplored.

Owing to the rapid development of open-source models such as the LLaMA and Qwen series and of efficient LLM training methods, even ordinary users can now easily obtain a private LLM by fine-tuning an open-source one on a private corpus. In practice, this could cause a significant performance drop in existing detectors, which are poor at capturing the essential LLM traits that are robust to fine-tuning.

Our preliminary examination reveals that fine-tuning an LLM with 11M tokens can make a detector's accuracy drop from 100% to as low as 59%. To address this issue, we propose PhantomHunter, an LLM-generated text detector specialized for text from unseen privately tuned LLMs. Its family-aware learning framework captures family-level traits shared across base models and their derivatives, instead of memorizing individual characteristics.

Specifically, PhantomHunter first extracts base model features and enhances the family-shared information using a contrastive family-aware learning module. The enhanced features are then fed into a mixture-of-experts module containing multiple experts for corresponding families for final predictions. Experiments on data from four widely-adopted LLM families (LLaMA, Gemma, Mistral, and Qwen) show PhantomHunter's superiority over 8 baselines and 11 industrial services.


Here is the official implementation of "PhantomHunter: A Multi-Task Framework with Mixture of Experts for Generalized Generated Text Detection".

Overview

PhantomHunter is a unified framework for detecting AI-generated text. It combines a Mixture-of-Experts (MoE) architecture, Contrastive Learning (CL), and Low-Rank Adaptation (LoRA) to achieve state-of-the-art performance across multiple AI model families.
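
To make the contrastive component concrete, here is a minimal sketch of a generic supervised contrastive loss that pulls together samples from the same model family within a mini-batch. It is not taken from this repository: the function names, the cosine similarity, and the temperature value are all assumptions chosen for illustration, and the actual training loss may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def family_contrastive_loss(embeddings, families, temperature=0.1):
    """Generic supervised contrastive loss over one mini-batch (illustrative).

    embeddings: list of feature vectors, one per text sample.
    families:   list of family labels (hypothetical, e.g. "llama", "qwen").
    Samples sharing a family label are treated as positives for each other.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and families[j] == families[i]]
        if not positives:
            continue  # this anchor has no same-family partner in the batch
        sims = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all other samples in the batch.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With this formulation, a batch whose embeddings already cluster by family yields a lower loss than the same embeddings with mismatched family labels, which is the behavior a family-aware objective is meant to reward.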

Architecture

[Figure: PhantomHunter architecture]

PhantomHunter and its training process. Given a text sample $\mathbf{x}$, it 1) extracts probability features from $M$ base models and encodes them with CNN and transformer blocks; 2) predicts the family of $\mathbf{x}$ to determine the family gating weights; and 3) feeds the representation $\mathbf{R}_{F}$ to a mixture-of-experts network controlled by the gating weights from Step 2 for the final prediction of whether $\mathbf{x}$ is LLM-generated. During training, contrastive learning is applied within each mini-batch to better model family relationships. The red terms in the figure are loss functions.
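
Steps 2 and 3 can be sketched as a family-gated mixture of experts. The snippet below is only an illustration under stated assumptions: it assumes one expert per family, that each expert emits a single logit, and that the gating weights are a softmax over family logits. The function names are hypothetical, and the repository may combine hidden representations rather than probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_prediction(family_logits, expert_logits):
    """Weight each family expert's output by the predicted family probability.

    family_logits: scores from the family classifier (Step 2).
    expert_logits: one "LLM-generated" logit per family expert (Step 3).
    Returns the gated probability that the text is LLM-generated.
    """
    if len(family_logits) != len(expert_logits):
        raise ValueError("this sketch assumes exactly one expert per family")
    gates = softmax(family_logits)
    expert_probs = [sigmoid(z) for z in expert_logits]
    return sum(g * p for g, p in zip(gates, expert_probs))
```

When the family classifier is confident, the output is dominated by that family's expert. When it is uncertain, the experts are averaged, which is one plausible reason a gated design could generalize to unseen derivatives of a known family.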

Data

We simulate two common LLM usage scenarios: writing (69,297 arXiv paper abstracts) and question-answering (3,062 Q&A pairs from ELI5, finance, and medicine domains). We select four open-source models (LLaMA-2-7B-Chat, Gemma-7B-it, Mistral-7B-Instruct-v0.1, Qwen2.5-7B-Instruct) and fine-tune each with full-parameter and LoRA methods on domain-specific corpora, resulting in 48 derivative models for evaluation.

Some test data is available in ./data/

Quick Start

Installation

pip install -r requirements.txt

Generate features with four white-box models

  1. Load the models
# cd ./genfeatures/
# You can set your own model paths in ./genfeatures/backend_api.py
python backend_api.py --port 6009 --timeout 30000 --debug --model=llama --gpu=0
python backend_api.py --port 6010 --timeout 30000 --debug --model=gemma --gpu=1
python backend_api.py --port 6011 --timeout 30000 --debug --model=mistral --gpu=2
python backend_api.py --port 6012 --timeout 30000 --debug --model=qwen2.5 --gpu=4
  2. Generate features
# Set the en_input_files and en_outfiles paths in ./genfeatures/gen_features.py before running
python ./genfeatures/gen_features.py --get_en_features_multithreading
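
Feature generation depends on all four backends being up, so a quick reachability check before running gen_features.py may save time. The sketch below only tests whether a TCP connection can be opened on each port used in the commands above. It assumes the backends run on the local machine (the host is not stated in this README) and makes no assumption about the request format that backend_api.py expects. The helper names are hypothetical.

```python
import socket

# Ports come from the backend_api.py commands above; the host is an assumption.
BACKENDS = {"llama": 6009, "gemma": 6010, "mistral": 6011, "qwen2.5": 6012}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_backends(host="127.0.0.1", backends=BACKENDS):
    """List the backend names whose port does not accept connections."""
    return [name for name, port in backends.items() if not port_open(host, port)]


if __name__ == "__main__":
    missing = unreachable_backends()
    if missing:
        print(f"not reachable: {missing}")
    else:
        print("all backends reachable")
```

An open port only shows that a process is listening. It does not confirm that the model has finished loading.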

Train

python main.py \
    --cuda  \
    --seed 2024 \
    --exp-name moe+logits+cl_arxiv-lora_5e-4 \
    --train-path /feature/arxiv_new/lora/train.jsonl \
    --val-path /feature/arxiv_new/lora/val.jsonl \
    --test-path /feature/arxiv_new/lora/test_ood.jsonl \
    --batch-size 64 \
    --lr 5e-4 \
    --train
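
Before a long training run, it can help to confirm that the files passed to --train-path, --val-path, and --test-path parse as JSON Lines. The helper below is a hypothetical convenience, not part of this repository. It assumes nothing about the field names in the feature files and simply reports how many records were read and which top-level keys they contain.

```python
import json
from collections import Counter


def summarize_jsonl(path, max_lines=None):
    """Count records in a JSON Lines file and how often each top-level key occurs.

    Raises json.JSONDecodeError on the first malformed line.
    """
    keys = Counter()
    records = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if max_lines is not None and line_no > max_lines:
                break
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            obj = json.loads(line)
            records += 1
            if isinstance(obj, dict):
                keys.update(obj.keys())
    return records, dict(keys)
```

For example, `summarize_jsonl("/feature/arxiv_new/lora/train.jsonl", max_lines=1000)` inspects only the first 1000 lines. A key that appears in fewer records than the total hints at incomplete feature extraction.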

Evaluation

python main.py \
    --cuda  \
    --seed 2024 \
    --exp-name moe+logits+cl_arxiv-lora_5e-4 \
    --train-path /feature/arxiv_new/lora/train.jsonl \
    --val-path /feature/arxiv_new/lora/val.jsonl \
    --test-path /feature/arxiv_new/lora/test_ood.jsonl \
    --batch-size 64 \
    --lr 5e-4 \
    --test

License

MIT License
