Neural Networks and Deep Learning

Module: ECS7026P — Neural Networks and Deep Learning, Queen Mary University of London
Group AZ: Hanad Ali · Muhammad Husaam Ateeq · Blazej Olszta

Overview

This repository contains two deep learning projects submitted as the coursework for ECS7026P.

Part 1 implements a custom convolutional neural network architecture (CIFARNet) for image classification on CIFAR-10, achieving 96.20% test accuracy.

Part 2 investigates long-tail image recognition on the real-world iNaturalist 2018 dataset (8,142 species, 461,939 images, imbalance ratio of roughly 200:1), comparing five methods from the long-tail learning literature under a controlled experimental protocol.

Repository Structure

├── CIFAR-10_image_classification.ipynb        # Part 1: CIFARNet on CIFAR-10
├── Beyond_CIFAR_10_image_classification.ipynb # Part 2: Long-tail recognition on iNaturalist 2018
└── README.md

Part 1 — CIFAR-10 Image Classification

Architecture: CIFARNet

CIFARNet implements a prescribed architecture composed of stacked intermediate blocks followed by an output block.

Intermediate block design:
Each block receives an input tensor x (the image for B1, a feature map for later blocks) and outputs a weighted combination of L = 4 independent convolutional branches:

x' = a₁C₁(x) + a₂C₂(x) + a₃C₃(x) + a₄C₄(x)

The weighting vector a = (a₁, a₂, a₃, a₄) is produced by a fully connected layer applied to the channel-wise global average pool of the input, so the network learns to weight its own convolutional branches dynamically from the input's channel statistics.
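The coefficient computation described above can be sketched in plain Python. This is an illustrative sketch, not the notebook's implementation: the function names, the toy weight matrix `W`, and the constant stand-in branch outputs are all assumptions, and the softmax step follows the "Branch coefficients" row of the component table.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def global_average_pool(x):
    """x is a list of channels, each a flat list of pixel values."""
    return [sum(channel) / len(channel) for channel in x]

def branch_coefficients(x, weight, bias):
    """a = softmax(W m + b), where m is the channel-wise mean of x.

    weight is an L x C matrix and bias a length-L vector (hypothetical names).
    """
    m = global_average_pool(x)
    scores = [sum(w * v for w, v in zip(row, m)) + b
              for row, b in zip(weight, bias)]
    return softmax(scores)

def combine(branch_outputs, coeffs):
    """x' = sum_l a_l * C_l(x) for branch outputs of identical shape."""
    n_channels = len(branch_outputs[0])
    n_pixels = len(branch_outputs[0][0])
    return [[sum(a * out[c][p] for a, out in zip(coeffs, branch_outputs))
             for p in range(n_pixels)]
            for c in range(n_channels)]

# Toy input: 2 channels of 4 pixels each, L = 4 branches.
x = [[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 2.0, 0.0]]
W = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.1], [-0.2, 0.3]]
b = [0.0, 0.0, 0.0, 0.0]
a = branch_coefficients(x, W, b)

# Stand-ins for C_1(x)..C_4(x): constant maps instead of real convolutions.
branch_outputs = [[[float(l + 1)] * 4 for _ in range(2)] for l in range(4)]
x_new = combine(branch_outputs, a)
```

Because the coefficients sum to 1, the combined output is a convex combination of the branch outputs, which keeps its magnitude bounded by the largest branch response.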

Key architectural decisions:

Component	Detail
Intermediate blocks	6 blocks (B1–B6) in sequence
Branches per block	4 independent conv branches (kernel sizes: 3, 3, 5, 5)
Branch coefficients	Softmax-normalised (prevents magnitude explosion under mixed precision)
Inter-block downsampling	Strided conv layers (stride 2) between block groups; spatial resolution reduced 32×32 → 16×16 → 8×8
Channel progression	64 → 128 → 256 across block groups
Output block	Global average pool → MLP (256 hidden, ReLU, dropout p=0.3) → 10-way classifier
Total parameters	11.88 million

Block group structure:

Block group	Kernel sizes	Output channels	Feature map
B1, B2	[3, 3, 5, 5]	64	32 × 32
B3, B4	[3, 3, 5, 5]	128	16 × 16
B5, B6	[3, 3, 5, 5]	256	8 × 8
Output	MLP (256 hidden, dropout 0.3)	10	1 × 1

Training Configuration

Component	Setting
Optimiser	SGD with Nesterov momentum (μ = 0.9)
Weight decay	5×10⁻⁴ (conv and linear weights only; excludes BN params and biases)
Base learning rate	0.1
LR schedule	Linear warmup (5 epochs) → cosine annealing to 1×10⁻⁵ over 195 epochs
Batch size	128
Epochs	200
Loss	Cross-entropy with label smoothing (ε = 0.1)
MixUp	α = 0.2
Augmentation	RandomCrop (32×32, pad 4, reflect) · RandomHorizontalFlip · RandAugment (n=2, m=9) · RandomErasing (p=0.25)
Precision	Automatic mixed precision (fp16 forward, fp32 master weights)
Initialisation	Kaiming normal (fan-out, ReLU)
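
The learning-rate schedule in the table can be sketched as a function of the epoch index. This is a minimal sketch under stated assumptions: the rate is updated once per epoch, warmup ramps linearly from base_lr/5 up to base_lr, and the last partial batch of each epoch is kept. The constant and function names are illustrative, and 50,000 is the size of the standard CIFAR-10 training set.

```python
import math

BASE_LR = 0.1
MIN_LR = 1e-5
WARMUP_EPOCHS = 5
TOTAL_EPOCHS = 200

def lr_at_epoch(epoch):
    """Linear warmup for 5 epochs, then cosine annealing over the remaining 195."""
    if epoch < WARMUP_EPOCHS:
        # Assumed warmup shape: base_lr/5, 2*base_lr/5, ..., base_lr.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))

# Optimiser steps implied by batch size 128 on 50,000 training images.
steps_per_epoch = math.ceil(50_000 / 128)     # 391
total_steps = steps_per_epoch * TOTAL_EPOCHS  # 78,200

schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]
```

The step count of 78,200 matches the number of batches quoted for the training loss curve.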

Results

Metric	Value
Best test accuracy	96.20% (epoch 199)
Marking bracket	≥ 92% (top bracket)

The training loss curve shows steady descent over ~78,200 batches, with high per-batch variance driven by MixUp's U-shaped mixing coefficients and RandomErasing. Both training and test accuracy curves rise together throughout, with a ~4 percentage point generalisation gap at epoch 199 — consistent with a well-regularised model trained from scratch.
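
The "U-shaped" remark refers to the Beta(α, α) distribution from which MixUp draws its mixing coefficient; with α = 0.2 most draws land close to 0 or 1. The sketch below illustrates this with the standard library only. The function names, the seed, and the 0.1/0.9 thresholds are assumptions chosen for illustration, not values from the notebook.

```python
import random

ALPHA = 0.2  # MixUp alpha from the training configuration

def sample_lambda(rng):
    """Draw a mixing coefficient lambda ~ Beta(alpha, alpha)."""
    return rng.betavariate(ALPHA, ALPHA)

def mixup_pair(x_a, x_b, lam):
    """Convex combination of two flattened inputs: lam * x_a + (1 - lam) * x_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]

rng = random.Random(0)
lams = [sample_lambda(rng) for _ in range(10_000)]

# Fraction of draws that leave one of the two inputs almost untouched.
extreme_fraction = sum(l < 0.1 or l > 0.9 for l in lams) / len(lams)

mixed = mixup_pair([1.0, 0.0], [0.0, 1.0], 0.25)
```

Since most batches are mixed only lightly while a minority are blended heavily, the per-batch loss varies widely even when the underlying trend is smooth.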

Part 2 — Beyond CIFAR-10: Long-Tail Image Recognition

Motivation and Dataset

We study long-tail image recognition — a recognised challenge in computer vision where most classes have very few training examples and a small number of head classes dominate the distribution.

We use iNaturalist 2018 (Van Horn et al., 2018), a real biodiversity dataset with naturally occurring class imbalance, rather than a synthetic benchmark. The task is species classification across 8,142 classes.

Dataset property	Value
Total images	461,939
Number of classes	8,142
Min samples per class	3
Max samples per class	602
Imbalance ratio	200.67:1
Mean samples per class	34
Median samples per class	15

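Head classes (top 20%, 1,628 classes) average 115 training samples; tail classes (bottom 20%, 1,628 classes) average only 8.77. The per-class figures in the table are consistent with counts on the training split (277,163 images across 8,142 classes gives a mean of about 34), whereas the total of 461,939 covers all splits.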
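
The imbalance ratio and the head/tail averages can be computed from a list of per-class counts as follows. This is a sketch with hypothetical function names and toy counts; the real per-class counts are not reproduced here.

```python
def imbalance_ratio(counts):
    """Largest class size divided by smallest class size."""
    return max(counts) / min(counts)

def head_tail_means(counts, fraction=0.2):
    """Mean class size of the top and bottom `fraction` of classes by size."""
    ordered = sorted(counts, reverse=True)
    k = int(len(ordered) * fraction)
    head, tail = ordered[:k], ordered[-k:]
    return sum(head) / k, sum(tail) / k

# Figures quoted in the table: max 602, min 3, and 8,142 classes.
reported_ratio = round(602 / 3, 2)  # 200.67
group_size = int(8142 * 0.2)        # 1628 classes in each 20% group

# Toy example with ten classes.
toy_counts = [100, 50, 20, 10, 5, 4, 3, 2, 2, 1]
toy_ratio = imbalance_ratio(toy_counts)
toy_head, toy_tail = head_tail_means(toy_counts)
```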

Model and Protocol

Backbone: ImageNet-pretrained ResNet-50 (IMAGENET1K_V2 weights) with the classifier replaced by an 8,142-way linear head (40.19M parameters, all updated end-to-end except during cRT Stage 2).

Data split: 60/20/20 stratified train/val/test split (277,163 / 92,388 / 92,388 images). Stratification is essential — without it, many tail classes would be absent from one or more splits.
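
A stratified split can be sketched by partitioning each class separately, so that every class contributes to every split whenever it has enough samples. The function below is an illustrative standard-library sketch, not the notebook's code: the name `stratified_split`, the default seed, and the rounding rule (validation and test sizes rounded down, remainder assigned to train) are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists, split class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n = len(idxs)
        n_val = int(n * fractions[1])
        n_test = int(n * fractions[2])
        n_train = n - n_val - n_test  # remainder goes to the training split
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

# Toy labels: one head class, one medium class, one tail class.
toy_labels = [0] * 100 + [1] * 10 + [2] * 5
train_idx, val_idx, test_idx = stratified_split(toy_labels)
```

With this rounding rule a class with 5 images contributes 3 to train and 1 each to validation and test, whereas a purely random split could easily leave such a class out of a split entirely.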

Evaluation metric: Balanced accuracy (primary), not overall accuracy — overall accuracy is dominated by head classes and masks poor tail performance.
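
Balanced accuracy is the unweighted mean of per-class recall. The sketch below contrasts it with overall accuracy on a toy two-class problem; the function names and data are illustrative assumptions.

```python
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# A classifier that always predicts the head class (0).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

overall = overall_accuracy(y_true, y_pred)    # 0.8
balanced = balanced_accuracy(y_true, y_pred)  # 0.5
```

The always-predict-head classifier scores 80% overall but only 50% balanced, which is the masking effect described above.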

Each method was repeated with seeds 42, 123, and 456; all reported numbers are means and standard deviations across seeds.

Methods Compared

Five methods spanning the principal solution categories in the long-tail literature:

Method	Description
Baseline CE	Standard cross-entropy on the natural distribution — no rebalancing
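Re-weighting	Loss weighted by inverse class frequency (1/n_y, where n_y is the number of training images in class y)
Resampling	WeightedRandomSampler oversampling rare classes with per-sample weight 1/n_y at the data level
Logit Adjustment	Baseline training; at inference, logits are adjusted by subtracting τ·log(π_y), where π_y is the training prior of class y and τ = 1.0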
Two-Stage cRT	Stage 1: full network on natural distribution (80 epochs); Stage 2: backbone frozen, classifier retrained with balanced sampling (20 epochs)
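
The three frequency-based corrections in the table reduce to simple functions of the per-class training counts. The sketch below is illustrative only: the function names and toy numbers are assumptions, and normalising the loss weights to a mean of 1 is one common convention that the source does not specify.

```python
import math

def class_priors(counts):
    """pi_y = n_y / N for each class y."""
    total = sum(counts)
    return [n / total for n in counts]

def inverse_frequency_weights(counts):
    """Per-class loss weights proportional to 1/n_y, rescaled to mean 1 (assumed)."""
    raw = [1.0 / n for n in counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def sample_weights(labels, counts):
    """Per-sample weights 1/n_y, the kind of input a weighted sampler takes."""
    return [1.0 / counts[y] for y in labels]

def logit_adjust(logits, priors, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(pi_y) from each logit."""
    return [z - tau * math.log(p) for z, p in zip(logits, priors)]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# Toy three-class problem: head, medium, tail.
counts = [90, 9, 1]
priors = class_priors(counts)
weights = inverse_frequency_weights(counts)

logits = [2.0, 1.5, 0.5]  # raw scores favour the head class
adjusted = logit_adjust(logits, priors)
```

In this toy case the raw logits pick the head class, while the adjusted logits pick the tail class, because subtracting log(π_y) adds the largest bonus to the rarest class.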

Results

Method	Overall	Balanced	Macro-F1	Head	Medium	Tail
Baseline CE	65.39 ± 0.19	54.33 ± 0.36	61.82 ± 0.24	64.30	53.86	45.74
Re-weighting	57.15 ± 0.16	51.03 ± 0.17	57.18 ± 0.16	58.44	50.70	44.60
Resampling	59.34 ± 0.19	53.01 ± 0.16	58.71 ± 0.18	60.17	52.54	47.26
Logit Adjustment	61.66 ± 0.09	58.72 ± 0.18	57.13 ± 0.03	60.51	58.97	56.19
Two-Stage cRT	61.67 ± 0.20	57.32 ± 0.08	57.62 ± 0.19	61.50	57.56	52.43

Values are test-set percentages (mean ± std across 3 seeds). Best by balanced accuracy: Logit Adjustment (58.72 ± 0.18).

Key Findings

Overall accuracy is a misleading metric on long-tailed data. The baseline tops overall accuracy (65.39%) but achieves only 45.74% on tail classes, a 19.65 pp gap that is invisible if only the headline number is reported.
Inference-time correction outperforms training-time rebalancing. Logit adjustment achieves the best balanced accuracy (58.72%) and tail accuracy (56.19%) without any change to training. Re-weighting and resampling both underperform the baseline on balanced accuracy at this scale of imbalance (~200:1), consistent with Buda et al. (2018).
Re-weighting and resampling are near-equivalent, as theory predicts. The two differ by 2.66 pp on tail accuracy (47.26% vs 44.60%) and 1.98 pp on balanced accuracy (53.01% vs 51.03%), consistent with the near-equivalence of applying gradient pressure via loss weights versus sampling frequency (Buda et al., 2018).
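
The figures quoted in these findings can be recomputed from the results table. The snippet below copies the relevant means from the table into a dictionary; the variable names are illustrative.

```python
# Test-set means (percent) copied from the results table.
results = {
    "Baseline CE":      {"overall": 65.39, "balanced": 54.33, "tail": 45.74},
    "Re-weighting":     {"overall": 57.15, "balanced": 51.03, "tail": 44.60},
    "Resampling":       {"overall": 59.34, "balanced": 53.01, "tail": 47.26},
    "Logit Adjustment": {"overall": 61.66, "balanced": 58.72, "tail": 56.19},
    "Two-Stage cRT":    {"overall": 61.67, "balanced": 57.32, "tail": 52.43},
}

best_overall = max(results, key=lambda m: results[m]["overall"])
best_balanced = max(results, key=lambda m: results[m]["balanced"])

# Gap between the baseline's headline number and its tail accuracy.
baseline = results["Baseline CE"]
baseline_gap = round(baseline["overall"] - baseline["tail"], 2)

# Tail-accuracy difference between resampling and re-weighting.
rebalance_gap = round(results["Resampling"]["tail"] - results["Re-weighting"]["tail"], 2)

# Methods that fall below the baseline on balanced accuracy.
below_baseline = sorted(m for m, r in results.items()
                        if r["balanced"] < baseline["balanced"])
```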

How to Run

Both notebooks are designed to run on Google Colab. Set the runtime to GPU before starting: Runtime > Change runtime type > T4 GPU.

Part 1 — CIFAR-10 Classification

Run all cells in order. The CIFAR-10 dataset is downloaded automatically via torchvision.

Part 2 — Long-Tail Recognition (iNaturalist 2018)

The notebook is split into three sections:

Section	Purpose	Run on Colab?
Section 0 — Shared Setup	Imports, device detection, hyperparameters, model architecture, shared utilities	✅ Run first
Section 1 — Training	Full local training pipeline that produced the saved checkpoints and logs	⏭ Skip on Colab
Section 2 — Colab Evaluation	Downloads checkpoints and logs from HuggingFace, recreates all plots and tables, validates hypotheses	✅ Run after Section 0

To reproduce all results on Colab (no dataset download required):

1. Run all Section 0 cells.
2. Skip Section 1 entirely.
3. Run Section 2 cells top to bottom. Checkpoints and training logs are downloaded automatically from HuggingFace (~9 GB).

Model weights: checkpoints.zip (~9 GB) hosted at HuggingFace:
https://huggingface.co/datasets/husaam7/inat2018-checkpoints

Tech Stack

PyTorch: model implementation and training
torchvision: CIFAR-10 dataset, ResNet-50 pretrained weights, data augmentation
Hugging Face / iNaturalist 2018: dataset for Part 2
Matplotlib: training curves and result plots
Google Colab: training environment (GPU)

References

Buda, M., Maki, A. and Mazurowski, M.A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106, pp. 249–259.
Cubuk, E.D., Zoph, B., Shlens, J. and Le, Q.V. (2020). RandAugment: Practical automated data augmentation with a reduced search space. NeurIPS, 33, pp. 18613–18624.
Cui, Y. et al. (2019). Class-balanced loss based on effective number of samples. CVPR, pp. 9268–9277.
Kang, B. et al. (2020). Decoupling representation and classifier for long-tailed recognition. ICLR.
Loshchilov, I. and Hutter, F. (2017). SGDR: Stochastic gradient descent with warm restarts. ICLR.
Menon, A.K. et al. (2021). Long-tail learning via logit adjustment. ICLR.
Szegedy, C. et al. (2016). Rethinking the Inception architecture for computer vision. CVPR, pp. 2818–2826.
Van Horn, G. et al. (2018). The iNaturalist species classification and detection dataset. CVPR, pp. 8769–8778.
Zhang, H. et al. (2018). mixup: Beyond empirical risk minimization. ICLR.
