Deepfakes Detection

🇰🇷 Korean version | 🇯🇵 Japanese version | 📈 Model Evaluation | 🔮 Try Demo

💡 Install & Requirements

To install requirements:

pip install -r requirements.txt

🛠 Setup

Clone the repository and move into it:

git clone https://github.com/HanMoonSub/DeepGuard.git

cd DeepGuard

📚 DeepFake Video Benchmark Datasets

To evaluate the generalization and robustness of our deepfake detection model, we utilize three large-scale, widely recognized benchmark datasets. Each dataset presents unique challenges and covers different types of forgery methods.

| Dataset | Real Videos | Fake Videos | Year | Participants | Description (Paper Title) | Details |
|---|---|---|---|---|---|---|
| Celeb-DF-v2 | 890 | 5,639 | 2019 | 59 | A Large-scale Challenging Dataset for DeepFake Forensics | 🔗 Readme |
| FaceForensics++ | 1,000 | 6,000 | 2019 | 1,000 | Learning to Detect Manipulated Facial Images | 🔗 Readme |
| KoDF | 62,166 | 175,776 | 2020 | 400 | Large-Scale Korean Deepfake Detection Dataset | 🔗 Readme |

⚙️ Data Preparation

Our preprocessing pipeline is designed to efficiently extract facial features from videos and prepare them for high-accuracy deepfake detection.

Detect Original Face

To maximize preprocessing efficiency, face detection is performed only on original (real) videos. Since manipulated videos in the benchmark datasets share the same spatial coordinates as their sources, these bounding boxes are reused for the corresponding deepfake versions.

🚀 Efficiency Optimizations

  • Lightweight Model: Uses yolov8n-face for high-speed inference without sacrificing accuracy.

  • Targeted Processing: By detecting faces only in original videos, the total detection workload is reduced by approximately 80%.

  • Dynamic Rescaling: To maintain consistent inference speed across different resolutions, frames are automatically resized based on their dimensions:

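| Frame Size (Longest Side) | Scale Factor | Action |
|---|---|---|
| < 300px | 2.0 | Upscale 2× |
| 300px - 700px | 1.0 | Unchanged |
| 700px - 1500px | 0.5 | Downscale to 1/2 |
| > 1500px | 0.33 | Downscale to 1/3 |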
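
The rescaling rule in the table above can be sketched as a small lookup. This is a minimal illustration, not the repository's code: the function names are hypothetical, and the handling of the exact boundary values (300, 700, 1500) is an assumption, since the table does not say which bucket they belong to.

```python
def scale_factor(width: int, height: int) -> float:
    """Return the resize factor for a frame based on its longest side."""
    longest = max(width, height)
    if longest < 300:
        return 2.0   # upscale very small frames
    if longest < 700:
        return 1.0   # leave mid-sized frames unchanged
    if longest < 1500:
        return 0.5   # halve large frames
    return 0.33      # shrink very large frames to roughly one third


def rescaled_size(width: int, height: int) -> tuple[int, int]:
    """Frame size after applying the scale factor."""
    s = scale_factor(width, height)
    return round(width * s), round(height * s)


def to_original_coords(box, scale):
    """Map a box detected on the resized frame back to source-frame coordinates."""
    return tuple(v / scale for v in box)
```

Boxes detected on the resized frame need to be divided by the same factor before they are used to crop the full-resolution frame, which is what `to_original_coords` illustrates.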

Face Cropping & Landmark Extraction

This module extracts face crops from both original and deepfake videos using the bounding boxes generated in the previous step. It also performs landmark detection to enable advanced augmentations such as Landmark-based Cutout.

🛠 Key Features

  • Dynamic Margin with Jitter: Adds a configurable margin around the face. The margin_jitter parameter introduces random variance to the crop size, making the model more robust to different face scales.

  • Landmark Localization: Detects 5 primary facial landmarks (eyes, nose, mouth corners) and saves them as .npy files.
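
As a rough sketch of the dynamic margin with jitter, the function below expands a face box by a relative margin and clamps it to the frame. The function name, the uniform jitter scheme, and the default values are assumptions made for illustration; only the `margin_jitter` parameter name comes from the description above.

```python
import random


def expand_box(box, frame_w, frame_h, margin=0.2, margin_jitter=0.0, rng=random):
    """Expand an (x1, y1, x2, y2) face box by a relative margin, clamped to the frame.

    Assumed jitter scheme: the effective margin is drawn uniformly from
    [margin - margin_jitter, margin + margin_jitter] and floored at 0.
    """
    x1, y1, x2, y2 = box
    m = max(0.0, margin + rng.uniform(-margin_jitter, margin_jitter))
    pad_x = (x2 - x1) * m
    pad_y = (y2 - y1) * m
    return (
        max(0, int(x1 - pad_x)),
        max(0, int(y1 - pad_y)),
        min(frame_w, int(x2 + pad_x)),
        min(frame_h, int(y2 + pad_y)),
    )
```

With `margin_jitter=0.0` the crop is deterministic; a non-zero value varies the crop size from sample to sample, so the model sees faces at slightly different scales.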

DATA_ROOT/
├── crops/
│   └── {video_id}/
│       ├── 12.png
│       └── ...
├── landmarks/
│   └── {video_id}/
│       ├── 12.npy
│       └── ...
└── train_frame_metadata.csv
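
Given the layout above, crops and landmark files can be paired by file name. The helper below is a hypothetical sketch: it assumes each file stem is an integer frame index (as in `12.png` / `12.npy`) and skips crops that have no matching landmark file.

```python
from pathlib import Path


def frame_pairs(data_root, video_id):
    """Yield (frame_index, crop_path, landmark_path) for frames that have both files."""
    root = Path(data_root)
    crop_dir = root / "crops" / video_id
    landmark_dir = root / "landmarks" / video_id
    # Sort numerically so that frame 3 comes before frame 12.
    for crop in sorted(crop_dir.glob("*.png"), key=lambda p: int(p.stem)):
        landmark = landmark_dir / f"{crop.stem}.npy"
        if landmark.exists():
            yield int(crop.stem), crop, landmark
```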

Dataset-Specific Pipelines

Click the links below to view the specific preprocessing details for each dataset:

🏗 Model Architecture

The Multi-Scale Efficient Global Context Vision Transformer is an optimized multi-scale hybrid architecture that combines CNN-driven spatial inductive bias with hierarchical attention mechanisms to identify both subtle (local) and macro (global) artifacts for robust deepfake forensics.

The model uses two distinct types of self-attention to capture both long-range and short-range information across feature maps.

  • Local Window Attention: This module captures local textures and precise spatial details while keeping computational complexity linear in the image size.

  • Global Window Attention: Unlike Swin Transformer, this module uses global queries that interact with local window keys and values. This lets each local region incorporate global context, capturing long-range dependencies across the entire spatial structure.
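
To make the linear-complexity claim concrete, the sketch below counts query-key pairs for full self-attention versus attention restricted to non-overlapping windows. It is a counting exercise only, not the model's implementation; the 14×14 feature map and window size 7 used in the note are illustrative assumptions.

```python
def full_attention_pairs(h: int, w: int) -> int:
    """Query-key pairs when every token attends to every token."""
    n = h * w
    return n * n


def window_attention_pairs(h: int, w: int, window: int) -> int:
    """Query-key pairs when attention is restricted to non-overlapping windows."""
    if h % window or w % window:
        raise ValueError("feature map must tile evenly into windows")
    num_windows = (h // window) * (w // window)
    tokens_per_window = window * window
    # Each window costs tokens_per_window ** 2, so the total is h * w * window ** 2.
    return num_windows * tokens_per_window ** 2
```

For a 14×14 map with window size 7, windowed attention needs 9,604 pairs against 38,416 for full attention. Going to 28×28 (4× the tokens) multiplies the windowed cost by 4 but the full cost by 16.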

🧬 Model Zoo

| Model | Resolution | Total Params (M) | Backbone (M) | L-ViT (M) | H-ViT (M) | FLOPs (G) | Model Config |
|---|---|---|---|---|---|---|---|
| ⚡ ms_eff_gcvit_b0 | 224 × 224 | 8.7 | 3.6 (41.4%) | 1.7 (19.5%) | 3.3 (37.9%) | 0.87 | spec |
| 🔥 ms_eff_gcvit_b5 | 384 × 384 | 50.3 | 27.3 (54.3%) | 6.6 (13.1%) | 16.1 (32.0%) | 13.64 | spec |

🚀 Training

We provide training scripts for both ms_eff_vit and ms_eff_gcvit. We recommend using Google Colab for free GPU access and Weights & Biases (W&B) for experiment tracking.

📊 Weights & Biases Experiments

# Module:          train_eff_vit or train_eff_gcvit
# --model-ver:     ms_eff_vit_b0, ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
# --dataset:       ff++, celeb_df_v2, kodf
# --seed:          fixed for reproducibility
# --wandb-api-key: replace with your own API key
!python -m train_eff_vit \
    --root-dir DATA_ROOT \
    --model-ver "ms_eff_vit_b5" \
    --dataset "ff++" \
    --seed 2025 \
    --wandb-api-key "your-api-key"

📈 Model Evaluation

# --model-name:    ms_eff_vit_b0, ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
# --model-dataset: ff++, celeb_df_v2, kodf
!python -m inference.predict_video \
    --root-dir DATA_ROOT \
    --margin-ratio 0.2 \
    --conf-thres 0.5 \
    --min-face-ratio 0.01 \
    --model-name "ms_eff_gcvit_b0" \
    --model-dataset "kodf" \
    --num-frames 20 \
    --tta-hflip 0.0 \
    --agg-mode "conf"
Celeb-DF (v2) Pretrained Models

| Model Variant | Test Acc | Test AUC | Test Log Loss | Download | Train Config |
|---|---|---|---|---|---|
| ms_eff_gcvit_b0 | 0.9842 | 0.9965 | 0.0283 | model | recipe |
| ms_eff_gcvit_b5 | 0.9981 | 0.9984 | 0.0089 | model | recipe |

FaceForensics++ Pretrained Models

| Model Variant | Test Acc | Test AUC | Test Log Loss | Download | Train Config |
|---|---|---|---|---|---|
| ms_eff_gcvit_b0 | 0.9808 | 0.9969 | 0.0637 | model | recipe |
| ms_eff_gcvit_b5 | 0.9850 | 0.9974 | 0.0492 | model | recipe |

KoDF Pretrained Models

| Model Variant | Test Acc | Test AUC | Test Log Loss | Download | Train Config |
|---|---|---|---|---|---|
| ms_eff_gcvit_b0 | 0.9655 | 0.9792 | 0.1237 | model | recipe |
| ms_eff_gcvit_b5 | 0.9850 | 0.9974 | 0.0492 | model | recipe |
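
For readers who want to reproduce the three reported metrics from their own predictions, here is a dependency-free sketch. It assumes label 1 means fake and that accuracy uses a 0.5 decision threshold; the repository's evaluation script may differ on both points.

```python
import math


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def roc_auc(labels, probs):
    """Probability that a random fake outscores a random real (ties count 0.5)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a quick check; for large test sets a rank-based implementation such as `sklearn.metrics.roc_auc_score` is the practical choice.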

💻 Model Usage

Quick Start

You can load the models directly via the DeepGuard package or through the timm interface.

Available Datasets: celeb_df_v2, ff++, kodf

Installation

# pip install -U git+https://github.com/HanMoonSub/DeepGuard.git
pip install deepguard

Option A: Direct Import (via DeepGuard)

from deepguard import ms_eff_gcvit_b0, ms_eff_gcvit_b5

model = ms_eff_gcvit_b0(pretrained=True, dataset="celeb_df_v2")
model = ms_eff_gcvit_b5(pretrained=True, dataset="ff++")

Option B: Using timm Interface (via timm)

import timm
import deepguard

model = timm.create_model("ms_eff_gcvit_b0", pretrained=True, dataset="ff++")
model = timm.create_model("ms_eff_gcvit_b5", pretrained=True, dataset="kodf")

Option C: Hugging Face Hub

import torch
from huggingface_hub import hf_hub_download
from deepguard import ms_eff_gcvit_b0  # or ms_eff_gcvit_b5

REPO_ID = "KoreaPeter/ms-eff-gcvit-deepfake"

ckpt = hf_hub_download(REPO_ID, "ms_eff_gcvit_b0_kodf.bin")  # celeb_df_v2 | ff++ | kodf
model = ms_eff_gcvit_b0(pretrained=False)
model.load_state_dict(torch.load(ckpt, map_location="cpu"))
model.eval()

🔮 Predict Image & Video

Predict DeepFake Image

from inference.image_predictor import ImagePredictor

# Initialize the predictor
predictor = ImagePredictor(
    margin_ratio=0.2,            # Margin ratio around the detected face crop
    conf_thres=0.5,              # Confidence threshold for face detection
    min_face_ratio=0.01,         # Minimum face-to-frame size ratio to process
    model_name="ms_eff_vit_b0",  # or ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
    dataset="celeb_df_v2",       # or ff++, kodf
)

# Run inference
result = predictor.predict_img(
    img_path="path/to/image.jpg",
    tta_hflip=0.0,  # Horizontal flip for test-time augmentation
)

print(f"Deepfake Probability: {result:.4f}")
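
The `conf_thres` and `min_face_ratio` parameters act as filters on detected faces before classification. The sketch below shows one plausible reading of them; the function name is hypothetical, and treating the "face-to-frame size ratio" as box area divided by frame area is an assumption, not something the README specifies.

```python
def keep_face(box, conf, frame_w, frame_h, conf_thres=0.5, min_face_ratio=0.01):
    """Return True if a detection is confident enough and large enough to classify.

    Assumption: the face-to-frame ratio is box area / frame area.
    """
    x1, y1, x2, y2 = box
    face_area = max(0, x2 - x1) * max(0, y2 - y1)
    ratio = face_area / (frame_w * frame_h)
    return conf >= conf_thres and ratio >= min_face_ratio
```

Under this reading, a 100×100 face in a 640×480 frame covers about 3% of the frame and passes the default 1% threshold, while a 30×30 face (about 0.3%) is skipped.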

Predict DeepFake Video

from inference.video_predictor import VideoPredictor

# Initialize the predictor
predictor = VideoPredictor(
    margin_ratio=0.2,            # Margin ratio around the detected face crop
    conf_thres=0.5,              # Confidence threshold for face detection
    min_face_ratio=0.01,         # Minimum face-to-frame size ratio to process
    model_name="ms_eff_vit_b0",  # or ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
    dataset="celeb_df_v2",       # or ff++, kodf
)

# Run inference
result = predictor.predict_video(
    video_path="path/to/video.mp4",
    num_frames=20,    # Number of frames to sample per video
    agg_mode="conf",  # Aggregation method: 'conf', 'mean', 'vote'
    tta_hflip=0.0,    # Horizontal flip for test-time augmentation
)

print(f"Deepfake Probability: {result:.4f}")
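
The `agg_mode` option turns per-frame probabilities into a single video-level score. The sketch below illustrates how the three modes might behave. The `mean` and `vote` branches follow their usual meaning; the `conf` branch is purely hypothetical (a mean weighted by each frame's distance from 0.5), since the README does not define it.

```python
from statistics import mean


def aggregate(frame_probs, mode="mean", threshold=0.5):
    """Combine per-frame fake probabilities into one video-level score."""
    if not frame_probs:
        raise ValueError("no frame predictions to aggregate")
    if mode == "mean":
        return mean(frame_probs)
    if mode == "vote":
        # Fraction of frames classified as fake.
        return sum(p >= threshold for p in frame_probs) / len(frame_probs)
    if mode == "conf":
        # Hypothetical scheme: confident frames (far from 0.5) count more.
        weights = [abs(p - 0.5) for p in frame_probs]
        total = sum(weights)
        if total == 0:
            return 0.5
        return sum(w * p for w, p in zip(weights, frame_probs)) / total
    raise ValueError(f"unknown mode: {mode}")
```

For frame scores `[0.9, 0.8, 0.5, 0.1]`, `mean` gives 0.575 and `vote` gives 0.75, which shows how much the choice of aggregation can move the final score.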

🎨 DeepFake AI Explainability

Deepfake detection is only as trustworthy as its explanations. DeepGuard integrates a production-ready XAI Toolkit that visualizes where and why the model flags a face as manipulated — turning a black-box score into actionable forensic evidence.

⭐ Validated on hybrid CNN-ViT architectures, specifically MS-EffViT and MS-EffGCViT.
⭐ Dual-Branch Analysis: the two-branch XAI design mirrors the model's own multi-scale reasoning.

🧠 How Dual-Branch XAI Works

| Branch | Feature Map | Focus | Best For |
|---|---|---|---|
| High Resolution | Local | Forgery artifacts | Skin texture, boundary blending, compression traces |
| Low Resolution | Global | Semantic structure | Lighting inconsistency, facial geometry, shadow artifacts |

📐 XAI Methods

Each method is assigned to the branch where it performs best empirically.

| Branch | Method | 🎯 Core Idea |
|---|---|---|
| low level | HiResCAM | Like GradCAM, but multiplies the activations element-wise with the gradients; provably guaranteed faithfulness for certain models |
| low level | GradCAMElementWise | Like GradCAM, but multiplies the activations element-wise with the gradients, then applies a ReLU before summing |
| low level | LayerCAM | Spatially weights the activations by positive gradients; works better especially in lower layers |
| high level | EigenGradCAM | Like EigenCAM but with class discrimination: first principal component of Activations*Grad. Looks like GradCAM, but cleaner |
| high level | GradCAM++ | Like GradCAM, but uses second-order gradients |
| high level | XGradCAM | Like GradCAM, but scales the gradients by the normalized activations |

  • aug_smooth applies TTA (horizontal flips) before averaging CAMs → smoother, more object-aligned maps
  • eigen_smooth applies PCA noise reduction → retains dominant forgery pattern only
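
The difference between the low-level methods and plain GradCAM comes down to where the gradients are averaged or rectified. The NumPy sketch below follows the one-line descriptions in the table, for a single layer with activations and gradients of shape `(C, H, W)`. It is an illustration of the formulas only; the explainer classes in this repository wrap a full CAM library and handle hooks, resizing, and normalization.

```python
import numpy as np


def grad_cam(activations, gradients):
    """GradCAM: weight each channel by its spatially averaged gradient."""
    weights = gradients.mean(axis=(1, 2), keepdims=True)  # shape (C, 1, 1)
    return np.maximum((weights * activations).sum(axis=0), 0)


def hires_cam(activations, gradients):
    """HiResCAM: element-wise product, summed over channels, then ReLU."""
    return np.maximum((gradients * activations).sum(axis=0), 0)


def grad_cam_elementwise(activations, gradients):
    """GradCAMElementWise: element-wise product, ReLU, then sum over channels."""
    return np.maximum(gradients * activations, 0).sum(axis=0)


def layer_cam(activations, gradients):
    """LayerCAM: weight activations by the positive part of the gradients."""
    weighted = np.maximum(gradients, 0) * activations
    return np.maximum(weighted.sum(axis=0), 0)
```

Because GradCAM averages gradients over the whole map, positive and negative gradients inside one channel can cancel out. The element-wise variants keep the per-pixel sign information, which fits the table's note that they suit the local, high-resolution branch.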

💡 DeepFake XAI Usage

Low-Level Branch — Local Artifact Detection

from explainability import HiResCAMExplainer, GradCAMElementWiseExplainer, LayerCAMExplainer

explainer = HiResCAMExplainer(
    model_name   = "ms_eff_gcvit_b0",  # or ms_eff_vit_b0, ms_eff_gcvit_b5, ms_eff_vit_b5
    dataset      = "celeb_df_v2",       # or ff++, kodf
    branch_level = "low",
)

High-Level Branch — Global Semantic Detection

from explainability import EigenGradCAMExplainer, GradCAMPlusPlusExplainer, XGradCAMExplainer

explainer = EigenGradCAMExplainer(
    model_name   = "ms_eff_gcvit_b0",
    dataset      = "celeb_df_v2",
    branch_level = "high",
)

🎨 Visualization Modes

1. Heatmap — Continuous activation distribution

result = explainer.display_heatmap_on_image(
    img_path     = "path/to/image.jpg",
    category     = 1,      # 0: Real, 1: Fake
    threshold    = 0.5,    # binarization cutoff (0.5~1.0), or "auto" for Otsu
    image_weight = 0.5,    # 0.0: heatmap only ← → 1.0: original only
    aug_smooth   = False,  # TTA smoothing (not supported on 'pro' models)
    eigen_smooth = False,  # PCA noise reduction
)
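
The `image_weight` parameter can be read as a linear blend between the original image and the colored heatmap. A minimal sketch, assuming both inputs are float arrays in `[0, 1]` with the same shape (the function name is hypothetical):

```python
import numpy as np


def blend(image, heatmap, image_weight=0.5):
    """Blend an RGB image with a colored heatmap; 0.0 = heatmap only, 1.0 = image only."""
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be in [0, 1]")
    mixed = image_weight * image + (1.0 - image_weight) * heatmap
    return np.clip(mixed, 0.0, 1.0)
```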

2. Bounding Box — Discrete forgery region localization

result = explainer.display_bbox_on_image(
    img_path     = "path/to/image.jpg",
    category     = 1,
    threshold    = 0.5,
    thickness    = 1,
    aug_smooth   = False,
    eigen_smooth = False,
)
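
One simple way to turn a thresholded CAM into a box is to enclose every pixel at or above the cutoff. The sketch below does exactly that and nothing more; it is an assumption about the general idea, and the actual method may instead draw one box per connected region.

```python
import numpy as np


def cam_to_bbox(cam, threshold=0.5):
    """Return (x1, y1, x2, y2) enclosing all CAM values >= threshold, or None."""
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None  # nothing exceeds the threshold
    # The +1 makes x2 / y2 exclusive, so the box can be used directly for slicing.
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```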

3. Heatmap + BBox — Full overlay (recommended for reporting)

result = explainer.display_heatmap_bbox_on_image(
    img_path     = "path/to/image.jpg",
    category     = 1,
    threshold    = 0.5,
    image_weight = 0.5,
    aug_smooth   = False,
    eigen_smooth = False,
)

📊 Visual Results

MS-EFF-VIT — Low-Level Branch

[Image grid: input image with HiResCAM, GradCAMElementWise, and LayerCAM heatmaps for ⚡ ms-eff-vit-b0 and 🔥 ms-eff-vit-b5]

MS-Eff-ViT — High-Level Branch

[Image grid: input image with EigenGradCAM, GradCAM++, and XGradCAM heatmaps for ⚡ ms-eff-vit-b0 and 🔥 ms-eff-vit-b5]

MS-EFF-GCVIT — Low-Level Branch

[Image grid: input image with HiResCAM, GradCAMElementWise, and LayerCAM heatmaps for ⚡ ms-eff-gcvit-b0 and 🔥 ms-eff-gcvit-b5]

MS-Eff-GCViT — High-Level Branch

[Image grid: input image with EigenGradCAM, GradCAM++, and XGradCAM heatmaps for ⚡ ms-eff-gcvit-b0 and 🔥 ms-eff-gcvit-b5]

📓 Tutorials

The Jupyter notebooks can be found under the tutorials folder in the Git repository.

📬 Authors

This project was developed as a Senior Graduation Project by the Department of Software at Chungbuk National University (CBNU), Republic of Korea.

  • 한문섭: Data & Backend Engineering (Data Preprocessing Pipeline, DB Schema Design) — hanmoon3054@gmail.com
  • 이예솔: UI/UX & Frontend Engineering (UI/UX Design, User Dashboard, Model Visualization) — yesol4138@chungbuk.ac.kr
  • 서윤제: AI Engineering (AI Model Architecture, Inference API Design, Model Serving) — seoyunje2001@gmail.com

📝 Reference

  1. facenet-pytorch - Pretrained Face Detection(MTCNN) and Recognition(InceptionResNet) Models by Tim Esler
  2. face-cutout - Face Cutout Library by Sowmen
  3. Celeb-DF++ - Celeb-DF++ Dataset by OUC-VAS Group
  4. DeeperForensics-1.0 - DeeperForensics-1.0 Dataset by Endless Sora
  5. Deepfake Detection - Detection of Video Deepfake using ResNext and LSTM by Abhijith Jadhav
  6. deepfake-detection-project-v4 - Multiple Deep Learning Models by Ameen Caslam
  7. Awesome-Deepfake-Detection - A curated list of tools, papers and code by Daisy Zhang
  8. Pytorch-Grad-Cam - Advanced Visual Explanations for PyTorch Models

⚖️ License

This project is licensed under the terms of the MIT license.