With the popularity of large language models (LLMs), undesirable societal problems such as misinformation production and academic misconduct have become more severe, making LLM-generated text detection more important than ever. Although existing methods have made remarkable progress, they are mostly evaluated on publicly known LLMs, and the new challenge posed by text from privately tuned LLMs remains largely underexplored.
Thanks to the rapid development of open-source models such as the LLaMA and Qwen series, and of efficient LLM training methods, even ordinary users can now easily obtain a private LLM by fine-tuning an open-source model on a private corpus. This can cause a significant performance drop for existing detectors in practice, because they poorly capture the essential LLM traits that are robust to fine-tuning.
Our preliminary examination reveals that fine-tuning an LLM with 11M tokens could make a detector's accuracy drop from 100% to only 59% at most. To address this issue, we propose PhantomHunter, an LLM-generated text detector specialized for detecting text from unseen privately tuned LLMs. Its family-aware learning framework captures family-level traits shared across the base models and their derivatives, instead of memorizing individual characteristics.
Specifically, PhantomHunter first extracts base model features and enhances the family-shared information using a contrastive family-aware learning module. The enhanced features are then fed into a mixture-of-experts module containing multiple experts for corresponding families for final predictions. Experiments on data from four widely-adopted LLM families (LLaMA, Gemma, Mistral, and Qwen) show PhantomHunter's superiority over 8 baselines and 11 industrial services.
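To make the contrastive family-aware idea above more concrete, here is a minimal sketch of a supervised contrastive loss in which samples from the same model family act as positives. This is an assumption about one common way to realize such a module, not the repository's actual loss; the function name `family_contrastive_loss`, the `temperature` value, and the toy features are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def family_contrastive_loss(features, families, temperature=0.1):
    """Supervised contrastive loss (illustrative, not the repo's code).

    Samples sharing a family label are pulled together; all other
    samples in the batch serve as negatives.
    """
    z = [normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and families[j] == families[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: dot(z[i], z[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m)
                                     for v in logits.values()))
        total += -sum(logits[j] - log_denom
                      for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy features: two tight clusters (hypothetical values).
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = family_contrastive_loss(
    toy_features, ["llama", "llama", "qwen", "qwen"])
shuffled = family_contrastive_loss(
    toy_features, ["llama", "qwen", "llama", "qwen"])
print(aligned < shuffled)
```

When the family labels agree with the feature clusters, the loss is lower than when they are mismatched, which is the pressure that would encourage family-shared representations.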
Here is the official implementation of "PhantomHunter: A Multi-Task Framework with Mixture of Experts for Generalized Generated Text Detection".
PhantomHunter is a unified framework for detecting AI-generated text. It combines a Mixture of Experts (MoE) architecture, Contrastive Learning (CL), and Low-Rank Adaptation (LoRA) to achieve state-of-the-art performance across multiple AI model families.
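As a rough illustration of how a mixture of family experts can produce a single prediction, the sketch below routes a feature vector through a softmax gate over the four families and mixes per-family binary classifiers. The linear gate and experts, the function `moe_predict`, and all parameter values are assumptions made for illustration; they are not taken from this repository.

```python
import math

FAMILIES = ["llama", "gemma", "mistral", "qwen"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Affine map w . x + b for a single output unit."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def moe_predict(x, gate_params, expert_params):
    """Return (P(LLM-generated), gate weights per family).

    gate_params and expert_params each hold one (weights, bias) pair
    per family, in the order given by FAMILIES.
    """
    gate_probs = softmax([linear(w, b, x) for w, b in gate_params])
    expert_probs = [sigmoid(linear(w, b, x)) for w, b in expert_params]
    mixed = sum(g * e for g, e in zip(gate_probs, expert_probs))
    return mixed, dict(zip(FAMILIES, gate_probs))


# Hypothetical parameters for a 3-dimensional feature vector.
gate_params = [([1.0, 0.0, 0.0], 0.0), ([0.0, 1.0, 0.0], 0.0),
               ([0.0, 0.0, 1.0], 0.0), ([0.5, 0.5, 0.5], 0.0)]
expert_params = [([0.3, -0.2, 0.1], 0.0)] * 4

prob, gates = moe_predict([0.2, -1.0, 0.5], gate_params, expert_params)
print(round(sum(gates.values()), 6), 0.0 < prob < 1.0)
```

A soft gate like this one lets text from an unseen derivative model draw on the expert of its most likely base family while still hedging across the others.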
Figure: PhantomHunter and the training process.
We simulate two common LLM usage scenarios: writing (69,297 arXiv paper abstracts) and question-answering (3,062 Q&A pairs from ELI5, finance, and medicine domains). We select four open-source models (LLaMA-2-7B-Chat, Gemma-7B-it, Mistral-7B-Instruct-v0.1, Qwen2.5-7B-Instruct) and fine-tune each with full-parameter and LoRA methods on domain-specific corpora, resulting in 48 derivative models for evaluation.
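For readers unfamiliar with the difference between the two fine-tuning methods mentioned above, the following sketch shows the standard LoRA formulation: the frozen weight matrix W is augmented with a low-rank update, W' = W + (alpha / r) * B A. The matrix sizes, rank, and alpha below are illustrative assumptions, not the settings used to build the derivative models.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. a LoRA adapter."""
    full = d_out * d_in            # every entry of W is trained
    lora = rank * (d_out + d_in)   # only B (d_out x r) and A (r x d_in)
    return full, lora


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(W, A, B, alpha, rank):
    """Merge a LoRA adapter into the base weight: W + (alpha / rank) * B A."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Illustrative sizes only: one 4096 x 4096 projection with rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)

# Tiny worked example with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape 2 x 1
A = [[3.0, 4.0]]     # shape 1 x 2
print(lora_merge(W, A, B, alpha=2.0, rank=1))
```

Because a LoRA adapter changes far fewer parameters than full-parameter fine-tuning, the two methods may shift a model's output distribution differently, which is one reason to evaluate a detector against both.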
Some test data is available in ./data/
pip install -r requirements.txt
- loading models
# cd ./genfeatures/
# you can set your own model paths in ./genfeatures/backend_api.py
python backend_api.py --port 6009 --timeout 30000 --debug --model=llama --gpu=0
python backend_api.py --port 6010 --timeout 30000 --debug --model=gemma --gpu=1
python backend_api.py --port 6011 --timeout 30000 --debug --model=mistral --gpu=2
python backend_api.py --port 6012 --timeout 30000 --debug --model=qwen2.5 --gpu=4

- genfeatures
# you should modify the en_input_files and en_outfiles path in ./genfeatures/gen_features.py
python ./genfeatures/gen_features.py --get_en_features_multithreading

- train
python main.py \
--cuda \
--seed 2024 \
--exp-name moe+logits+cl_arxiv-lora_5e-4 \
--train-path /feature/arxiv_new/lora/train.jsonl \
--val-path /feature/arxiv_new/lora/val.jsonl \
--test-path /feature/arxiv_new/lora/test_ood.jsonl \
--batch-size 64 \
--lr 5e-4 \
--train

- test
python main.py \
--cuda \
--seed 2024 \
--exp-name moe+logits+cl_arxiv-lora_5e-4 \
--train-path /feature/arxiv_new/lora/train.jsonl \
--val-path /feature/arxiv_new/lora/val.jsonl \
--test-path /feature/arxiv_new/lora/test_ood.jsonl \
--batch-size 64 \
--lr 5e-4 \
--test

MIT License

