RISHIT7/object-detection-thermal-benchmark

Comparative Analysis of Object Detection Frameworks on Aerial Thermal Imagery

Single-Stage, Two-Stage, and Transformer-Based Detection on the BIRDSAI Dataset

Course: COL780 -- Computer Vision, IIT Delhi
Author: Rishit Jakharia (2022CS11621)


Overview

This repository implements and benchmarks three major object detection paradigms on the BIRDSAI thermal infrared aerial surveillance dataset, which contains Long-Wave Infrared (LWIR) imagery captured from UAVs for wildlife and human monitoring.

The project evaluates:

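| Task   | Model                  | Paradigm                                                       |
|--------|------------------------|----------------------------------------------------------------|
| Task 1 | YOLOv8 (Nano / Small)  | Single-stage detection                                         |
| Task 2 | Faster R-CNN + FPN     | Two-stage detection with CE and Focal Loss ablation            |
| Task 3 | Deformable DETR        | Transformer-based detection with three fine-tuning strategies  |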

All models are evaluated in a Real-to-Real setting (trained on TrainReal, tested on TestReal) using a scale-aware, class-wise mAP metric with 11-point interpolation at an IoU threshold of 0.5.
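
As a rough illustration of the 11-point interpolation mentioned above, the sketch below computes AP for one class from a list of scored detections. It assumes each detection has already been marked as a true or false positive by matching against ground truth at IoU 0.5; the function names and the input layout are hypothetical and are not taken from `evaluator.py`.

```python
def precision_recall(detections, num_gt):
    """Build precision/recall points from (score, is_true_positive) pairs.

    Detections are ranked by descending confidence; num_gt is the number
    of ground-truth boxes for the class.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def eleven_point_ap(recalls, precisions):
    """Average the best precision at recall >= t for t = 0.0, 0.1, ..., 1.0."""
    ap = 0.0
    for i in range(11):
        t = i / 10
        best = max((p for r, p in zip(recalls, precisions) if r >= t), default=0.0)
        ap += best / 11
    return ap


# Toy example: three detections, two ground-truth boxes.
recalls, precisions = precision_recall([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
ap = eleven_point_ap(recalls, precisions)
```

A class-wise mAP would then average this AP over the classes, and a scale-aware variant would repeat the computation per scale category.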


Key Results

Overall mAP Comparison (IoU @ 0.5)

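| Model                                  | Overall | Animals | Humans |
|----------------------------------------|---------|---------|--------|
| YOLOv8 Nano                            | 0.338   | 0.585   | 0.091  |
| YOLOv8 Small                           | 0.306   | 0.521   | 0.091  |
| Faster R-CNN (CE)                      | 0.325   | 0.507   | 0.143  |
| Faster R-CNN (Focal)                   | 0.237   | 0.369   | 0.106  |
| Deformable DETR -- Exp 2 (Decoder FT)  | 0.358   | 0.625   | 0.091  |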

Deformable DETR with decoder-only fine-tuning achieved the highest overall mAP (0.358), while YOLOv8 offered the fastest inference throughput, making it the better fit for real-time edge deployment.
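
The Overall column appears to be the unweighted mean of the Animals and Humans values. This is an inference from the reported numbers rather than something the README states, and the short check below only confirms that the table is consistent with that reading to within rounding.

```python
# (overall, animals, humans) as reported in the results table.
results = {
    "YOLOv8 Nano": (0.338, 0.585, 0.091),
    "YOLOv8 Small": (0.306, 0.521, 0.091),
    "Faster R-CNN (CE)": (0.325, 0.507, 0.143),
    "Faster R-CNN (Focal)": (0.237, 0.369, 0.106),
    "Deformable DETR Exp 2": (0.358, 0.625, 0.091),
}

# Gap between the reported overall value and the mean of the two class values.
gaps = {
    name: abs(overall - (animals + humans) / 2)
    for name, (overall, animals, humans) in results.items()
}

for name, gap in gaps.items():
    print(f"{name}: |overall - class mean| = {gap:.4f}")
```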


Project Structure

code/
├── task1.sh                     # Entry point for YOLO training and evaluation
├── task2.sh                     # Entry point for Faster R-CNN
├── task3.sh                     # Entry point for Deformable DETR
│
├── scripts/
│   ├── task1_prep.py            # BIRDSAI to Ultralytics YOLO format converter
│   ├── task2_train.py           # Faster R-CNN training and evaluation driver
│   └── task3_train.py           # Deformable DETR training and evaluation driver
│
├── src/
│   ├── data/
│   │   ├── birdsai.py           # BIRDSAI PyTorch Dataset with scale-aware indexing
│   │   ├── mot_parser.py        # MOT annotation parser and scale prior computation
│   │   └── transforms.py        # Data augmentation transforms
│   │
│   ├── models/
│   │   ├── yolo.py              # YOLOv8 training and evaluation wrapper
│   │   ├── yolo_adapter.py      # Adapter bridging Ultralytics output to custom evaluator
│   │   ├── frcnn.py             # Faster R-CNN: full pipeline from scratch
│   │   ├── frcnn_backbone.py    # ResNet-18 + Feature Pyramid Network backbone
│   │   ├── frcnn_heads.py       # RPN head and Fast R-CNN classification head
│   │   └── detr.py              # Deformable DETR wrapper with fine-tune mode selection
│   │
│   ├── engine/
│   │   ├── trainer.py           # Generic training loop with mixed precision and early stopping
│   │   └── evaluator.py         # Scale-aware, class-wise mAP evaluator (11-point interpolation)
│   │
│   ├── losses/
│   │   └── focal_loss.py        # Focal Loss implementation for class imbalance
│   │
│   └── utils/
│       ├── logger.py            # Experiment logger: JSONL metrics and checkpoint management
│       ├── utils.py             # Miscellaneous helpers
│       └── visualization.py     # Detection visualization and comparison utilities
│
└── notebook/
    └── vizualization.ipynb      # Qualitative analysis and figure generation

Architecture Details

Task 1 -- YOLO

Task 1 uses Ultralytics YOLOv8 as a single-stage, anchor-free detector. The BIRDSAI MOT annotations are first converted to the Ultralytics label format by a parallel conversion pipeline (YOLOFormatBuilder). The Nano and Small variants are compared to study how model capacity acts as a regularizer on low-texture thermal imagery.
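
To make the conversion step concrete, here is a minimal sketch of turning one MOT-style row into an Ultralytics label line (`class cx cy w h`, all normalized to the image size). The column order (frame, object ID, x, y, width, height, class, ...) and the 640x480 image size are assumptions for illustration; `mot_row_to_yolo` is a hypothetical helper, not the repository's `YOLOFormatBuilder`.

```python
def mot_row_to_yolo(row, img_w, img_h):
    """Convert one MOT-style CSV row to (frame, YOLO label line).

    Assumed column order: frame, object_id, x, y, w, h, class_id, ...
    where (x, y) is the top-left corner of the box in pixels.
    """
    frame, _obj_id, x, y, w, h, cls = (float(v) for v in row[:7])
    cx = (x + w / 2) / img_w  # normalized box centre, x
    cy = (y + h / 2) / img_h  # normalized box centre, y
    line = f"{int(cls)} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"
    return int(frame), line


# Example row: frame 0, object 1, box at (100, 200) of size 40x20, class 0.
frame, label = mot_row_to_yolo(["0", "1", "100", "200", "40", "20", "0"], 640, 480)
```

Ultralytics expects one label file per image with one such line per object, so a full converter would group the rows by frame before writing.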

Task 2 -- Faster R-CNN + FPN

A complete, from-scratch implementation comprising:

  • Backbone: ResNet-18 pretrained on ImageNet, extended with a 4-level Feature Pyramid Network for multi-scale feature fusion.
  • Region Proposal Network: Generates object proposals using multi-scale anchors (base sizes 16--128, aspect ratios 0.5, 1.0, 2.0) with IoU-based positive/negative sampling.
  • Fast R-CNN Head: Two fully-connected layers (1024-d each) followed by per-class bounding box regression and classification.
  • Loss Ablation: Standard Cross-Entropy vs Focal Loss (alpha=0.25, gamma=2.0) to study class imbalance handling.
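
For reference, the loss ablation contrasts cross-entropy with the focal loss of Lin et al. The sketch below shows the binary form for a single prediction with the same alpha and gamma as above. It is a scalar illustration under that assumption and may differ from the multi-class version in `focal_loss.py`.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p is the predicted probability of the positive class and target is 0 or 1.
    The (1 - p_t) ** gamma factor down-weights well-classified examples.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: keeps most of its weight
```

With `gamma=0` and `alpha=1` on a positive example the expression reduces to plain cross-entropy, which is a convenient sanity check when comparing the two settings.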

Task 3 -- Deformable DETR

Uses the SenseTime pretrained Deformable DETR via HuggingFace Transformers. Three fine-tuning strategies are compared:

| Experiment       | Trainable Components                                      |
|------------------|-----------------------------------------------------------|
| Exp 1 (Full)     | Entire network -- backbone, encoder, decoder, and heads   |
| Exp 2 (Decoder)  | Decoder + classification/bbox heads only                  |
| Exp 3 (Encoder)  | Encoder + classification/bbox heads only                  |

Decoder-only fine-tuning (Exp 2) proved most effective: it keeps the backbone and encoder features learned on COCO frozen while adapting the decoder and its object queries to the thermal domain.
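
One way to express the three strategies is a rule that decides, per parameter name, whether the parameter stays trainable. The sketch below is a hypothetical helper, not the logic in `detr.py`; the mode names and the parameter-name layout (`model.encoder...`, `model.decoder...`, `class_embed...`, `bbox_embed...`) are assumptions about how the HuggingFace model names its parameters.

```python
HEAD_PARTS = {"class_embed", "bbox_embed"}  # assumed names of the prediction heads
MODES = {"full", "decoder", "encoder"}


def is_trainable(param_name, mode):
    """Return True if the named parameter should be updated in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown fine-tune mode: {mode!r}")
    if mode == "full":
        return True
    # Match whole dotted segments so that e.g. "conv_encoder" in the
    # backbone is not mistaken for the transformer encoder.
    parts = set(param_name.split("."))
    return bool(parts & HEAD_PARTS) or mode in parts


# With a real torch model, the rule would be applied roughly as:
#   for name, param in model.named_parameters():
#       param.requires_grad = is_trainable(name, mode)
names = [
    "model.backbone.conv_encoder.model.layer1.0.conv1.weight",
    "model.encoder.layers.0.fc1.weight",
    "model.decoder.layers.0.fc1.weight",
    "class_embed.0.weight",
    "bbox_embed.0.layers.0.weight",
]
decoder_only = [n for n in names if is_trainable(n, "decoder")]
```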


Dataset

The BIRDSAI dataset provides aerial LWIR thermal imagery for wildlife conservation and anti-poaching surveillance.

  • Classes: Animals (0), Humans (1)
  • Splits: TrainReal, TestReal
  • Scale categories (video-level, based on average bounding box area):
    • Small: < 200 px²
    • Medium: 200 -- 2000 px²
    • Large: > 2000 px²
  • Annotations: MOT-style CSV files with bounding boxes, class labels, and object IDs
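
The video-level scale categories above can be sketched as a simple threshold on the mean box area. The function names and the `(width, height)` input layout are hypothetical, and treating both boundary values (200 and 2000) as Medium follows the ranges as listed; the actual scale-prior computation in `mot_parser.py` may differ.

```python
def scale_category(avg_box_area):
    """Map an average bounding-box area (in pixels) to a scale label."""
    if avg_box_area < 200:
        return "small"
    if avg_box_area <= 2000:
        return "medium"
    return "large"


def video_scale_category(box_sizes):
    """Categorize a video from the (width, height) of all its annotated boxes."""
    areas = [w * h for w, h in box_sizes]
    return scale_category(sum(areas) / len(areas))


category = video_scale_category([(10, 10), (12, 12)])  # mean area 122
```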

References

  1. Bondi et al., "BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos," WACV 2020.
  2. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection," CVPR 2016.
  3. Ren et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE TPAMI, 2017.
  4. Zhu et al., "Deformable DETR: Deformable Transformers for End-to-End Object Detection," ICLR 2021.
