Neural Networks and Deep Learning

Module: ECS7026P — Neural Networks and Deep Learning, Queen Mary University of London
Group AZ: Hanad Ali · Muhammad Husaam Ateeq · Blazej Olszta


Overview

This repository contains two deep learning projects submitted as the coursework for ECS7026P.

Part 1 implements a custom convolutional neural network architecture (CIFARNet) for image classification on CIFAR-10, achieving 96.20% test accuracy.

Part 2 investigates long-tail image recognition on the real-world iNaturalist 2018 dataset (8,142 species, 461,939 images, imbalance ratio roughly 200:1), comparing five methods from the long-tail learning literature under a single controlled experimental protocol.


Repository Structure

├── CIFAR-10_image_classification.ipynb        # Part 1: CIFARNet on CIFAR-10
├── Beyond_CIFAR_10_image_classification.ipynb # Part 2: Long-tail recognition on iNaturalist 2018
└── README.md

Part 1 — CIFAR-10 Image Classification

Architecture: CIFARNet

CIFARNet implements a prescribed architecture composed of stacked intermediate blocks followed by an output block.

Intermediate block design:
Each block receives an input x (the image for the first block, a feature map for later blocks) and outputs a weighted combination of L = 4 independent convolutional branches:

x' = a₁C₁(x) + a₂C₂(x) + a₃C₃(x) + a₄C₄(x)

The weighting vector a is produced by a fully connected layer applied to the channel-wise global average pool of the input — the network learns to weight its own convolutional branches dynamically based on the input's channel statistics.

Key architectural decisions:

| Component | Detail |
|---|---|
| Intermediate blocks | 6 blocks (B1–B6) in sequence |
| Branches per block | 4 independent conv branches (kernel sizes: 3, 3, 5, 5) |
| Branch coefficients | Softmax-normalised (prevents magnitude explosion under mixed precision) |
| Inter-block downsampling | Strided conv layers (stride 2) between block groups; spatial resolution reduced 32×32 → 16×16 → 8×8 |
| Channel progression | 64 → 128 → 256 across block groups |
| Output block | Global average pool → MLP (256 hidden, ReLU, dropout p=0.3) → 10-way classifier |
| Total parameters | 11.88 million |
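The block equation and the softmax-normalised coefficients can be sketched in PyTorch as follows. This is a minimal illustration, not the notebook's code: the class name `IntermediateBlock`, the Conv → BatchNorm → ReLU layout inside each branch, and the "same" padding are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IntermediateBlock(nn.Module):
    """Sketch of x' = sum_i a_i * C_i(x), with a = softmax(FC(avg_pool(x)))."""

    def __init__(self, in_channels, out_channels, kernel_sizes=(3, 3, 5, 5)):
        super().__init__()
        # One independent branch per kernel size. padding=k // 2 keeps the
        # spatial size unchanged; the Conv -> BN -> ReLU layout is assumed.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        ])
        # Fully connected layer: per-channel means -> one logit per branch.
        self.fc = nn.Linear(in_channels, len(kernel_sizes))

    def forward(self, x):
        channel_means = x.mean(dim=(2, 3))                 # (N, C_in)
        a = F.softmax(self.fc(channel_means), dim=1)       # (N, L), rows sum to 1
        outs = torch.stack([branch(x) for branch in self.branches], dim=1)
        # outs has shape (N, L, C_out, H, W); broadcast a over the last three dims.
        return (a[:, :, None, None, None] * outs).sum(dim=1)


block = IntermediateBlock(in_channels=3, out_channels=64)
features = block(torch.randn(2, 3, 32, 32))   # shape (2, 64, 32, 32)
```

Because the coefficients are computed per sample, two images in the same batch can weight the 3×3 and 5×5 branches differently.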

Block group structure:

| Block group | Kernel sizes | Output channels | Feature map |
|---|---|---|---|
| B1, B2 | [3, 3, 5, 5] | 64 | 32 × 32 |
| B3, B4 | [3, 3, 5, 5] | 128 | 16 × 16 |
| B5, B6 | [3, 3, 5, 5] | 256 | 8 × 8 |
| Output | MLP (256 hidden, dropout 0.3) | 10 | 1 × 1 |

Training Configuration

| Component | Setting |
|---|---|
| Optimiser | SGD with Nesterov momentum (μ = 0.9) |
| Weight decay | 5×10⁻⁴ (conv and linear weights only; excludes BN params and biases) |
| Base learning rate | 0.1 |
| LR schedule | Linear warmup (5 epochs) → cosine annealing to 1×10⁻⁵ over 195 epochs |
| Batch size | 128 |
| Epochs | 200 |
| Loss | Cross-entropy with label smoothing (ε = 0.1) |
| MixUp | α = 0.2 |
| Augmentation | RandomCrop (32×32, pad 4, reflect) · RandomHorizontalFlip · RandAugment (n=2, m=9) · RandomErasing (p=0.25) |
| Precision | Automatic mixed precision (fp16 forward, fp32 master weights) |
| Initialisation | Kaiming normal (fan-out, ReLU) |
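The learning-rate schedule can be written as a small pure-Python function. This is a sketch under assumptions: the function name `lr_at_epoch` is made up for the example, the rate is assumed to change once per epoch (the notebook may step per batch), and the warmup is assumed to reach the base rate at the end of epoch 5.

```python
import math

BASE_LR = 0.1
MIN_LR = 1e-5
WARMUP_EPOCHS = 5
TOTAL_EPOCHS = 200


def lr_at_epoch(epoch):
    """Learning rate for a 0-indexed epoch (assumed per-epoch stepping)."""
    if epoch < WARMUP_EPOCHS:
        # Linear warmup: 0.02, 0.04, ..., 0.10 over the first five epochs.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    cosine_epochs = TOTAL_EPOCHS - WARMUP_EPOCHS            # 195
    progress = (epoch - WARMUP_EPOCHS) / cosine_epochs      # 0 .. 194/195
    # Cosine annealing from BASE_LR down towards MIN_LR.
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]
```

With this convention the rate peaks at 0.1 on epochs 4 and 5 and decays monotonically afterwards, ending just above the 1×10⁻⁵ floor.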

Results

| Metric | Value |
|---|---|
| Best test accuracy | 96.20% (epoch 199) |
| Marking bracket | ≥ 92% (top bracket) |

The training loss curve shows steady descent over ~78,200 batches, with high per-batch variance driven by MixUp's U-shaped mixing coefficients and RandomErasing. Both training and test accuracy curves rise together throughout, with a ~4 percentage point generalisation gap at epoch 199 — consistent with a well-regularised model trained from scratch.
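Two of the quantities in this paragraph can be checked with a few lines of Python. The sketch assumes the last partial batch of each epoch is kept (no `drop_last`) and uses the standard library's `random.betavariate` to draw MixUp coefficients from Beta(0.2, 0.2); the seed and sample size are arbitrary choices for the example.

```python
import math
import random

TRAIN_IMAGES = 50_000   # size of the CIFAR-10 training set
BATCH_SIZE = 128
EPOCHS = 200
MIXUP_ALPHA = 0.2

# 50,000 / 128 is not an integer, so the last batch of each epoch is partial.
batches_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)   # 391
total_batches = batches_per_epoch * EPOCHS                 # 78,200

# Beta(alpha, alpha) with alpha < 1 is U-shaped: most draws sit near 0 or 1,
# so most batches are nearly unmixed while a minority are heavily blended.
rng = random.Random(0)
lams = [rng.betavariate(MIXUP_ALPHA, MIXUP_ALPHA) for _ in range(10_000)]
extreme_fraction = sum(lam < 0.1 or lam > 0.9 for lam in lams) / len(lams)

print(total_batches, round(extreme_fraction, 2))
```

The occasional heavily blended batch is much harder than the typical one, which is one way to read the high per-batch variance in the loss curve.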


Part 2 — Beyond CIFAR-10: Long-Tail Image Recognition

Motivation and Dataset

We study long-tail image recognition — a recognised challenge in computer vision where most classes have very few training examples and a small number of head classes dominate the distribution.

We use iNaturalist 2018 (Van Horn et al., 2018), a real biodiversity dataset with naturally occurring class imbalance, rather than a synthetic benchmark. The task is species classification across 8,142 classes.

| Dataset property | Value |
|---|---|
| Total images | 461,939 |
| Number of classes | 8,142 |
| Min samples per class | 3 |
| Max samples per class | 602 |
| Imbalance ratio | 200.67:1 |
| Mean samples per class | 34 |
| Median samples per class | 15 |

Head classes (top 20%, 1,628 classes) average 115 training samples; tail classes (bottom 20%, 1,628 classes) average only 8.77.
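The derived figures above follow from simple arithmetic, sketched below. Note that the mean of 34 is reproduced only when the 277,163 training images are used as the numerator, which suggests the per-class rows describe the training split; that reading is an inference from the numbers, not something the table states.

```python
NUM_CLASSES = 8_142
TRAIN_IMAGES = 277_163     # training split size reported in this README
MAX_PER_CLASS = 602
MIN_PER_CLASS = 3

imbalance_ratio = MAX_PER_CLASS / MIN_PER_CLASS    # about 200.67
mean_per_class = TRAIN_IMAGES / NUM_CLASSES        # about 34.04
group_size = int(0.2 * NUM_CLASSES)                # 1,628 head and 1,628 tail classes

print(round(imbalance_ratio, 2), round(mean_per_class, 2), group_size)
```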

Model and Protocol

Backbone: ImageNet-pretrained ResNet-50 (IMAGENET1K_V2 weights) with the classifier replaced by an 8,142-way linear head (40.19M parameters, all updated end-to-end except during cRT Stage 2).
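The 40.19M figure can be reproduced by swapping the head sizes in the parameter count. The sketch relies on two facts about the standard torchvision ResNet-50: it has 25,557,032 parameters in total, and its final linear layer maps 2,048 features to 1,000 classes with a bias.

```python
RESNET50_TOTAL = 25_557_032   # torchvision ResNet-50 including its 1000-way head
FEATURE_DIM = 2_048           # input width of the final linear layer
NUM_SPECIES = 8_142

original_head = FEATURE_DIM * 1_000 + 1_000        # weights + biases = 2,049,000
backbone = RESNET50_TOTAL - original_head          # 23,508,032
new_head = FEATURE_DIM * NUM_SPECIES + NUM_SPECIES # 16,682,958
total = backbone + new_head                        # 40,190,990

print(f"{total / 1e6:.2f}M parameters")
```

The new classifier alone accounts for roughly 41% of the model's parameters, so it is a substantial part of what cRT Stage 2 retrains.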

Data split: 60/20/20 stratified train/val/test split (277,163 / 92,388 / 92,388 images). Stratification is essential — without it, many tail classes would be absent from one or more splits.
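A stratified split can be sketched with the standard library alone. This is an illustration rather than the notebook's implementation (which may use a library routine): the function name `stratified_split`, the rounding rule, and the requirement of at least three samples per class are assumptions chosen so that every class lands in all three splits.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.2, test_frac=0.2, seed=42):
    """Return (train, val, test) index lists with every class in each split."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < 3:
            raise ValueError(f"class {label!r} needs at least 3 samples")
        rng.shuffle(indices)
        # At least one sample per class goes to val and to test.
        n_val = max(1, round(val_frac * len(indices)))
        n_test = max(1, round(test_frac * len(indices)))
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Toy long-tailed label list: 10, 5 and 3 samples for classes 0, 1 and 2.
toy_labels = [0] * 10 + [1] * 5 + [2] * 3
train_idx, val_idx, test_idx = stratified_split(toy_labels)
```

A purely random 60/20/20 split of the same toy data could easily place all three samples of class 2 in the training set, leaving nothing to evaluate on.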

Evaluation metric: Balanced accuracy (primary), not overall accuracy — overall accuracy is dominated by head classes and masks poor tail performance.
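Balanced accuracy is the unweighted mean of per-class recall. The toy example below (hypothetical data, with helper names chosen for illustration) shows how a classifier that always predicts the head class scores well on overall accuracy and poorly on balanced accuracy.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; every class counts equally."""
    per_class_total = Counter(y_true)
    per_class_correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [per_class_correct[c] / n for c, n in per_class_total.items()]
    return sum(recalls) / len(recalls)


# Nine head-class samples, one tail-class sample, and a model that
# always predicts the head class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

print(overall_accuracy(y_true, y_pred))    # 0.9
print(balanced_accuracy(y_true, y_pred))   # 0.5
```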

Each method was repeated with seeds 42, 123, and 456; all reported numbers are means and standard deviations across seeds.

Methods Compared

Five methods spanning the principal solution categories in the long-tail literature:

| Method | Description |
|---|---|
| Baseline CE | Standard cross-entropy on the natural distribution, no rebalancing |
| Re-weighting | Loss weighted by inverse class frequency (1/n_y) |
| Resampling | WeightedRandomSampler oversampling rare classes by 1/n_y at the data level |
| Logit Adjustment | Baseline training; at inference logits adjusted by subtracting τ·log(π_y), τ=1.0 |
| Two-Stage cRT | Stage 1: full network on natural distribution (80 epochs); Stage 2: backbone frozen, classifier retrained with balanced sampling (20 epochs) |
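The two simplest corrections, inverse-frequency weights and post-hoc logit adjustment, can be illustrated in plain Python. This is a sketch with made-up logits and hypothetical helper names; the notebook presumably operates on PyTorch tensors instead. Here n_y is the training count of class y and π_y = n_y / Σ n is its prior.

```python
import math


def inverse_frequency_weights(class_counts):
    """Per-class weights 1/n_y, usable as loss weights or sampling weights."""
    return [1.0 / n for n in class_counts]


def logit_adjusted_prediction(logits, class_counts, tau=1.0):
    """Predict argmax_y of (logit_y - tau * log(pi_y))."""
    total = sum(class_counts)
    adjusted = [
        z - tau * math.log(n / total)
        for z, n in zip(logits, class_counts)
    ]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Two classes with the extreme counts reported for this dataset (602 vs 3).
counts = [602, 3]
logits = [2.0, 0.0]          # made-up logits that favour the head class

weights = inverse_frequency_weights(counts)
plain = logit_adjusted_prediction(logits, counts, tau=0.0)      # tau=0: plain argmax
adjusted = logit_adjusted_prediction(logits, counts, tau=1.0)

print(plain, adjusted)   # 0 1
```

Subtracting τ·log(π_y) adds a large bonus to rare classes (log of a small prior is very negative), which is why the adjusted prediction flips to the tail class without any retraining.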

Results

| Method | Overall | Balanced | Macro-F1 | Head | Medium | Tail |
|---|---|---|---|---|---|---|
| Baseline CE | 65.39 ± 0.19 | 54.33 ± 0.36 | 61.82 ± 0.24 | 64.30 | 53.86 | 45.74 |
| Re-weighting | 57.15 ± 0.16 | 51.03 ± 0.17 | 57.18 ± 0.16 | 58.44 | 50.70 | 44.60 |
| Resampling | 59.34 ± 0.19 | 53.01 ± 0.16 | 58.71 ± 0.18 | 60.17 | 52.54 | 47.26 |
| Logit Adjustment | 61.66 ± 0.09 | **58.72 ± 0.18** | 57.13 ± 0.03 | 60.51 | 58.97 | 56.19 |
| Two-Stage cRT | 61.67 ± 0.20 | 57.32 ± 0.08 | 57.62 ± 0.19 | 61.50 | 57.56 | 52.43 |

Values are test-set percentages (mean ± std across 3 seeds). Bold marks the best balanced accuracy.

Key Findings

  1. Overall accuracy is a misleading metric on long-tailed data. The baseline tops overall accuracy (65.39%) but achieves only 45.74% on tail classes — a 19.6 pp gap that is invisible if you only report the headline number.

  2. Inference-time correction outperforms training-time rebalancing. Logit adjustment achieves the best balanced accuracy (58.72%) and tail accuracy (56.19%) without any change to training. Re-weighting and resampling both underperform the baseline on balanced accuracy at this scale of imbalance (~200:1), consistent with Buda et al. (2018).

  3. Re-weighting and resampling behave similarly, as theory predicts. The two differ by 2.66 pp on tail accuracy (47.26% vs 44.60%) and by 1.98 pp on balanced accuracy (53.01% vs 51.03%), which is consistent with the view that loss weights and sampling frequency apply comparable gradient pressure (Buda et al., 2018).


How to Run

Both notebooks are designed to run on Google Colab. Set the runtime to GPU before starting: Runtime > Change runtime type > T4 GPU.

Part 1 — CIFAR-10 Classification

Run all cells in order. The CIFAR-10 dataset is downloaded automatically via torchvision.

Part 2 — Long-Tail Recognition (iNaturalist 2018)

The notebook is split into three sections:

| Section | Purpose | Run on Colab? |
|---|---|---|
| Section 0: Shared Setup | Imports, device detection, hyperparameters, model architecture, shared utilities | ✅ Run first |
| Section 1: Training | Full local training pipeline that produced the saved checkpoints and logs | ⏭ Skip on Colab |
| Section 2: Colab Evaluation | Downloads checkpoints and logs from HuggingFace, recreates all plots and tables, validates hypotheses | ✅ Run after Section 0 |

To reproduce all results on Colab (no dataset download required):

  1. Run all Section 0 cells
  2. Skip Section 1 entirely
  3. Run Section 2 cells top to bottom — checkpoints and training logs are downloaded automatically from HuggingFace (~9 GB)

Model weights: checkpoints.zip (~9 GB) hosted at HuggingFace:
https://huggingface.co/datasets/husaam7/inat2018-checkpoints
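The notebook performs this download itself, but the archive can also be fetched manually. The sketch below uses `hf_hub_download` from the `huggingface_hub` package; it assumes `checkpoints.zip` sits at the root of the dataset repository, and the function name and the `checkpoints` target directory are hypothetical.

```python
import zipfile


def download_checkpoints(local_dir="checkpoints"):
    """Fetch and unpack checkpoints.zip (~9 GB); returns the archive path."""
    # Imported lazily so this file loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    archive_path = hf_hub_download(
        repo_id="husaam7/inat2018-checkpoints",
        filename="checkpoints.zip",      # assumed to be at the repository root
        repo_type="dataset",             # the URL above is under /datasets/
        local_dir=local_dir,
    )
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(local_dir)
    return archive_path


# Uncomment to start the download (needs about 9 GB of disk and bandwidth):
# download_checkpoints()
```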


Tech Stack

  • PyTorch — model implementation and training
  • torchvision — CIFAR-10 DataLoaders, ResNet-50 pretrained weights, data augmentation
  • Hugging Face / iNaturalist 2018 — dataset for Part 2
  • Matplotlib — training curves and result plots
  • Google Colab — training environment (GPU)

References

  • Buda, M., Maki, A. and Mazurowski, M.A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106, pp. 249–259.
  • Cubuk, E.D., Zoph, B., Shlens, J. and Le, Q.V. (2020). RandAugment: Practical automated data augmentation with a reduced search space. NeurIPS, 33, pp. 18613–18624.
  • Cui, Y. et al. (2019). Class-balanced loss based on effective number of samples. CVPR, pp. 9268–9277.
  • Kang, B. et al. (2020). Decoupling representation and classifier for long-tailed recognition. ICLR.
  • Loshchilov, I. and Hutter, F. (2017). SGDR: Stochastic gradient descent with warm restarts. ICLR.
  • Menon, A.K. et al. (2021). Long-tail learning via logit adjustment. ICLR.
  • Szegedy, C. et al. (2016). Rethinking the Inception architecture for computer vision. CVPR, pp. 2818–2826.
  • Van Horn, G. et al. (2018). The iNaturalist species classification and detection dataset. CVPR, pp. 8769–8778.
  • Zhang, H. et al. (2018). MixUp: Beyond empirical risk minimisation. ICLR.
