nkwus/neuralnet

Hotdog / Not-Hotdog Classifier

A beginner-friendly image classifier that learns to tell hotdogs apart from everything else. Built with PyTorch and based on the classic Silicon Valley "Not Hotdog" app concept.


Table of Contents

  1. What This Project Does (Big Picture)
  2. How It Works (The Key Concepts)
  3. Prerequisites
  4. Getting Started
  5. Project Structure
  6. Usage
  7. Web Interface
  8. Understanding the Code
  9. CLI Reference
  10. Frequently Asked Questions
  11. Troubleshooting

What This Project Does (Big Picture)

This project trains a neural network to look at a photo and answer one question: "Is this a hotdog, or not?"

Instead of building a neural network from scratch (which would need millions of images and days of compute), we use a technique called transfer learning: we take a model that already knows how to recognise thousands of objects (ResNet-18, trained on ImageNet) and teach it one new trick — spotting hotdogs.


How It Works (The Key Concepts)

If you're new to machine learning, here's a plain-English summary of the main ideas used in this codebase:

Transfer Learning

Training a deep neural network from scratch requires huge datasets and lots of GPU time. Transfer learning sidesteps this by starting with a model that was already trained on a large dataset (ImageNet — 1.2 million images, 1 000 classes). We keep all of that learned knowledge (the "frozen" layers), chop off the final classification layer, and replace it with our own tiny layer that outputs just two classes: hotdog and not hotdog. Only that final layer gets trained.

ResNet-18

ResNet-18 is a convolutional neural network with 18 weight layers (17 convolutional layers plus one fully-connected layer). It was designed by Microsoft Research and is one of the most popular starter architectures for image classification. The "Res" stands for residual: the network uses shortcut connections that add a block's input to its output, which lets gradients flow more easily during training and makes deeper networks practical.
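To make the shortcut idea concrete, here is a simplified sketch of a residual block. The class name `ResidualBlock` is hypothetical and the block is deliberately reduced: the real ResNet-18 blocks in torchvision also handle strides and channel changes, which this sketch leaves out.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x)."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The shortcut: add the unchanged input back before the final activation.
        return self.relu(out + x)


block = ResidualBlock(channels=8)
x = torch.randn(2, 8, 16, 16)  # (batch, channels, height, width)
y = block(x)                   # same shape as x, so blocks can be stacked
```

Because the addition passes gradients straight through to `x`, early layers still receive a useful training signal even when many blocks are stacked.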

Food-101 Dataset

Food-101 is a publicly available dataset of 101 food categories with 101 000 images total (1 000 per category). We download it automatically via torchvision.datasets.Food101, then filter it down to just two categories (hotdog vs. a balanced random sample of everything else).

Data Augmentation

During training, each image is randomly cropped and flipped (RandomResizedCrop, RandomHorizontalFlip). This forces the model to learn that a hotdog is a hotdog regardless of position, zoom, or orientation — and it effectively multiplies the size of the training set.

Fine-Tuning

We only train the final fully-connected layer (model.fc). All earlier layers are "frozen" — their weights don't change. This keeps training fast (minutes, not hours) and avoids overfitting on our relatively small dataset.

Cross-Entropy Loss

This is the standard loss function for classification tasks. It measures how far the model's predicted probabilities are from the true labels. Lower is better.
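As a rough illustration of what the loss computes for a single image, the following sketch uses only the standard library. The helper names `softmax` and `cross_entropy` are made up for this example; the project itself presumably relies on PyTorch's built-in loss.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - highest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log of the probability assigned to the true class."""
    return -math.log(softmax(logits)[target])


# The model strongly favours class 1 (hotdog).
confident_right = cross_entropy([0.0, 3.0], target=1)  # true label is hotdog
confident_wrong = cross_entropy([0.0, 3.0], target=0)  # true label is not hotdog
```

A confident correct prediction yields a loss close to zero, while a confident wrong one is penalised heavily.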

Adam Optimizer with Cosine Annealing

Adam (Adaptive Moment Estimation) is an optimizer that adjusts the learning rate for each parameter individually. It's a good default choice and usually converges faster than plain SGD. We pair it with a cosine annealing learning rate scheduler that gradually reduces the learning rate over the course of training, which helps the model converge more smoothly — especially on longer runs.
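The shape of a cosine annealing schedule can be sketched with the closed-form expression below. The function name and the `min_lr` parameter are assumptions for illustration; the starting rate of 0.001 and the 5 epochs are this project's defaults.

```python
import math


def cosine_annealing_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Learning rate after `epoch` scheduler steps on a half-cosine curve."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


# Epochs 0..4 are the rates used during training; index 5 is the end point.
schedule = [cosine_annealing_lr(0.001, epoch, total_epochs=5) for epoch in range(6)]
```

The rate starts at the configured value, falls slowly at first, faster in the middle, and flattens out again as it approaches the minimum.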


Prerequisites

| Requirement | Why |
|---|---|
| Python 3.12+ | Specified in pyproject.toml. Earlier versions may work but are untested. |
| uv | A fast Python package manager used to manage the virtual environment and dependencies. |
| ~5 GB free disk space | The Food-101 dataset downloads to ./data/ on first run. |
| A GPU (optional) | Training will use a CUDA GPU if available, otherwise it falls back to CPU. CPU training works but is slower. |

Installing uv

If you don't have uv yet:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# or via pip (if you already have Python)
pip install uv

Verify it's installed:

uv --version

Getting Started

Follow these steps exactly and you'll be up and running:

1. Clone the repository

git clone <repo-url>
cd neuralnet

2. Install dependencies

uv sync

This reads pyproject.toml, creates a virtual environment in .venv/, and installs:

  • torch — the PyTorch deep learning framework
  • torchvision — image datasets, model architectures, and transforms
  • Pillow — image loading library (used under the hood by torchvision)
  • Flask — lightweight web framework for the drag-and-drop web UI

You do not need to activate the virtual environment manually. The uv run command handles that for you.

3. Train the model

uv run neuralnet

(You can also run uv run python -m hotdog — both work identically.)

What happens when you run this:

  1. The Food-101 dataset downloads to ./data/ (~5 GB, only on the first run).
  2. The dataset is filtered to hotdog vs. not-hotdog images (balanced 50/50 split).
  3. A pretrained ResNet-18 is loaded, and its last layer is replaced with a 2-class head.
  4. The model trains for 5 epochs, printing loss and accuracy after each one.
  5. The trained weights are saved to hotdog_model.pth in the project root.

You should see output like:

Using device: cuda
Loading Food-101 dataset (will download on first run)…
Training samples: 1500  |  Test samples: 500
Epoch 1/5  Train loss=0.4532 acc=0.812  Test  loss=0.2145 acc=0.924
Epoch 2/5  Train loss=0.2301 acc=0.921  Test  loss=0.1687 acc=0.942
...
Model saved to /home/you/code/neuralnet/hotdog_model.pth

(Exact numbers will vary depending on your hardware and random seeds.)

4. Classify an image

Once training is done, point the model at any image:

uv run neuralnet --predict path/to/your/image.jpg

Output:

HOTDOG 🌭  (confidence: 97.3%)

or

NOT hotdog  (confidence: 89.1%)

Project Structure

neuralnet/
├── hotdog/                   # Python package — all application code
│   ├── __init__.py           # Package metadata (version string)
│   ├── __main__.py           # Enables `python -m hotdog`
│   ├── cli.py                # Argument parsing & dispatch
│   ├── data.py               # BinarySubset class & Food-101 filtering
│   ├── engine.py             # Training loop, evaluation, orchestration
│   ├── model.py              # ResNet-18 model construction
│   ├── predict.py            # Single-image prediction
│   ├── transforms.py         # Image transforms & normalisation constants
│   ├── web.py                # Flask web server & /predict API endpoint
│   └── templates/
│       └── index.html        # Drag-and-drop web UI (HTML/CSS/JS)
├── .gitignore                # Keeps data/, .venv/, .pth, dist/, build artifacts out of git
├── .venv/                    # Virtual environment (created by `uv sync`, do NOT commit)
├── data/                     # Food-101 dataset (downloaded at runtime, do NOT commit)
├── hotdog_model.pth          # Saved model weights (created after training)
├── pyproject.toml            # Project metadata, dependencies & tool config
├── uv.lock                   # Lock file pinning exact dependency versions
├── neuralnet.code-workspace  # VS Code workspace file
├── CODE_REVIEW.md            # Code review notes
└── README.md                 # You are here

Each module has a single responsibility:

| Module | Responsibility |
|---|---|
| transforms.py | ImageNet normalisation constants, training & eval transforms |
| model.py | Building and configuring the ResNet-18 binary classifier |
| data.py | BinarySubset wrapper and Food-101 → hotdog/not-hotdog filtering |
| engine.py | train_one_epoch(), evaluate(), and the top-level train() orchestrator |
| predict.py | Loading a saved model and classifying a single image |
| cli.py | argparse-based CLI, validation, and dispatch to train(), predict(), or web |
| web.py | Flask app factory, /predict API endpoint, model loading |
| templates/index.html | Drag-and-drop frontend with result overlay |

Usage

All commands assume you are in the neuralnet/ project root.

Train with default settings (5 epochs)

uv run neuralnet

Train with custom settings

uv run neuralnet --epochs 10 --batch-size 64 --lr 0.0005

Use a different data directory

uv run neuralnet --data-dir /tmp/food101

Set a reproducibility seed

uv run neuralnet --seed 123

Save the model to a specific path

uv run neuralnet --model-path ./models/v2.pth

Predict on a single image

uv run neuralnet --predict my_photo.jpg

Important: You must train the model first (so hotdog_model.pth exists) before running --predict. The script will print a clear error message if the model file is missing.

Launch the web interface

uv run neuralnet --web

See the Web Interface section below for details.


Web Interface

The project includes a browser-based drag-and-drop UI for classifying images interactively.

Starting the server

uv run neuralnet --web

Then open http://localhost:5000 in your browser.

To use a different port:

uv run neuralnet --web --port 8080

How it works

  1. Drag and drop an image onto the drop zone (or click to browse).
  2. Press the Evaluate button.
  3. The result overlays on the image:
    • Green "Hotdog" — the model thinks it's a hotdog.
    • Red "Not Hotdog" — the model thinks it's something else.
    • A confidence percentage is shown below the label.
  4. Drop a new image to reset and classify again.

The web server loads the model once at startup and keeps it in memory, so predictions are fast.

Note: You must train the model first (uv run neuralnet) so that hotdog_model.pth exists before launching the web UI.


Understanding the Code

The code lives in the hotdog/ package. Each module is described below — open the files side-by-side as you read.

transforms.py — Image Constants & Transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD  = [0.229, 0.224, 0.225]

These are the per-channel mean and standard deviation of the ImageNet dataset. They're used to normalise input images so they match what the pretrained ResNet expects. Defined once as constants to avoid duplication.

TRAIN_TRANSFORM and EVAL_TRANSFORM convert raw images into the exact format the neural network expects:

  • RandomResizedCrop(224) — During training, randomly crop and resize to 224×224 pixels. This is data augmentation.
  • RandomHorizontalFlip() — 50% chance of flipping the image left-to-right. More augmentation.
  • Resize(256) + CenterCrop(224) — During evaluation, deterministically resize and crop. No randomness, so results are reproducible.
  • ToTensor() — Convert the PIL image to a PyTorch tensor (pixel values go from 0–255 integers to 0.0–1.0 floats).
  • Normalize(IMAGENET_MEAN, IMAGENET_STD) — Subtract the ImageNet mean and divide by the ImageNet standard deviation for each colour channel (R, G, B). This is required because the pretrained ResNet was trained with this normalisation.
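The arithmetic behind the last two steps can be checked by hand on a single pixel. This is a plain-Python sketch, not the torchvision implementation, and `normalize_pixel` is a hypothetical helper.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_0_255):
    """Apply the ToTensor scaling and the Normalize step to one RGB pixel."""
    scaled = [value / 255.0 for value in rgb_0_255]  # 0-255 integers to 0.0-1.0 floats
    return [(v - m) / s for v, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


white = normalize_pixel([255, 255, 255])  # every channel ends up above zero
black = normalize_pixel([0, 0, 0])        # every channel ends up below zero
```

After normalisation the values are centred around zero, which is the range the pretrained weights were fitted to.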

model.py — build_model(pretrained=True)

weights = models.ResNet18_Weights.DEFAULT if pretrained else None
model = models.resnet18(weights=weights)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)

Step by step:

  1. Load ResNet-18 — with pretrained ImageNet weights during training, or without weights during prediction (since we'll load our own saved weights anyway, downloading pretrained weights would be wasteful).
  2. Freeze every parameter (requires_grad = False) — we don't want to change the feature extraction layers.
  3. Replace the final fully-connected layer (model.fc) with a new one that outputs 2 classes instead of 1 000.

The new model.fc layer is not frozen (its requires_grad defaults to True), so it will be the only part of the network that learns.

data.py — BinarySubset & hotdog_binary_subset()

BinarySubset is a Subset subclass that relabels items to binary (1 = hotdog, 0 = not hotdog). Defined at module level so it can be pickled — important for multi-process data loading.

hotdog_binary_subset() takes the full Food-101 dataset (101 classes) and returns a balanced BinarySubset:

  • Label 1 = hotdog
  • Label 0 = not hotdog (random sample of other foods, same count as hotdogs)

It uses dataset.targets (the public API) to read labels efficiently without loading any images. Balancing matters: hotdogs are only 1 of 101 classes, so without it a model could reach about 99% accuracy (100/101) simply by always predicting "not hotdog".

The seed parameter controls the random selection of non-hotdog samples, ensuring reproducible splits.
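The balancing logic can be sketched in plain Python on a list of labels. The function `balanced_binary_indices` and the class id 7 are hypothetical stand-ins; the real hotdog_binary_subset() may be organised differently.

```python
import random


def balanced_binary_indices(labels, positive_class, seed=42):
    """Keep every positive item plus an equally sized random sample of the rest."""
    positives = [i for i, label in enumerate(labels) if label == positive_class]
    negatives = [i for i, label in enumerate(labels) if label != positive_class]
    rng = random.Random(seed)  # local generator, so the global state is untouched
    sampled_negatives = rng.sample(negatives, k=len(positives))
    indices = positives + sampled_negatives
    binary_labels = [1] * len(positives) + [0] * len(sampled_negatives)
    return indices, binary_labels


# Toy label list: 101 classes with 10 items each (class 7 plays the hotdog role).
labels = [class_id for class_id in range(101) for _ in range(10)]
indices, binary_labels = balanced_binary_indices(labels, positive_class=7)
```

Calling the function again with the same seed returns the same indices, which is what makes the split reproducible.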

engine.py — Training & Evaluation

train_one_epoch() — the standard PyTorch training loop:

  1. Set model to training mode (model.train()).
  2. For each batch: forward pass → compute loss → backward pass → optimizer step.
  3. Track running loss and accuracy.

evaluate() — same as training, but:

  • Wrapped in @torch.no_grad() — no need to compute gradients.
  • Model is in eval mode (model.eval()) — disables dropout and uses running stats for batch normalisation.
  • No optimizer step — we're only measuring performance.
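The two functions described above might look roughly like the sketch below. The function names come from this README, but the signatures, the synthetic data, and the small linear model are assumptions chosen so the example runs in a second on CPU.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()               # clear gradients from the last batch
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, targets)  # compute loss
        loss.backward()                     # backward pass
        optimizer.step()                    # update trainable weights
        total_loss += loss.item() * inputs.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += inputs.size(0)
    return total_loss / seen, correct / seen


@torch.no_grad()
def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        total_loss += criterion(outputs, targets).item() * inputs.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += inputs.size(0)
    return total_loss / seen, correct / seen


# Synthetic stand-in for the image data: 64 samples, 10 features, 2 classes.
torch.manual_seed(0)
features = torch.randn(64, 10)
targets = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, targets), batch_size=16)

device = torch.device("cpu")
model = nn.Linear(10, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

train_loss, train_acc = train_one_epoch(model, loader, criterion, optimizer, device)
test_loss, test_acc = evaluate(model, loader, criterion, device)
```

The only structural differences between the two functions are the mode switch, the gradient bookkeeping, and the optimizer step.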

train() — orchestrates the full training run and returns a dict of final metrics (train_loss, train_acc, test_loss, test_acc):

  1. Set random seeds for reproducibility. The test subset uses a decorrelated seed (seed + 1) to avoid accidental overlap with the training subset.
  2. Pick device (GPU if available, else CPU).
  3. Download / load Food-101.
  4. Create balanced binary subsets for train and test.
  5. Wrap them in DataLoaders (handles batching, shuffling, parallel loading). Worker count is auto-detected based on your CPU cores, capped at 4.
  6. Build the model (with pretrained weights), loss function, optimizer, and cosine annealing scheduler.
  7. Loop over epochs, stepping the scheduler each time.
  8. Create parent directories for the model path if needed, then save trained weights.
  9. Return the final-epoch metrics as a dictionary for programmatic use.
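The worker auto-detection mentioned in step 5 could be as small as the following sketch. `pick_num_workers` is a hypothetical helper; the project's exact rule may differ.

```python
import os


def pick_num_workers(cap=4):
    """Use one DataLoader worker per CPU core, but never more than `cap`."""
    cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return min(cap, cpu_count)


num_workers = pick_num_workers()
```

The resulting value would then be passed as the `num_workers` argument of each DataLoader.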

predict.py — Single-Image Prediction

Validates that both the image file and model file exist (error messages go to stderr so they don't pollute piped output). Gracefully handles corrupt or unreadable images with a try/except around Image.open(). Then loads the saved model (with pretrained=False to skip downloading ImageNet weights), preprocesses the image, runs it through the network, and prints whether it's a hotdog along with the confidence score.
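The final step, turning the network's two raw outputs into a label and a confidence, can be sketched without PyTorch. This assumes the confidence is the softmax probability of the predicted class and that index 1 means hotdog (as in data.py); `describe_prediction` is a hypothetical helper.

```python
import math


def describe_prediction(logits):
    """Convert two raw scores into a label plus a confidence percentage."""
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    predicted = probs.index(max(probs))  # 1 = hotdog, 0 = not hotdog
    label = "HOTDOG 🌭" if predicted == 1 else "NOT hotdog"
    return f"{label}  (confidence: {probs[predicted]:.1%})"


message = describe_prediction([-1.0, 2.6])
print(message)
```

With these example scores the hotdog probability is about 0.973, so the message reports a confidence of 97.3%.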

cli.py — Command-Line Interface

Parses command-line arguments with argparse, validates inputs (epochs >= 1, batch size >= 1, lr > 0), and dispatches to train(), predict(), or the web server. Heavy imports (torch, model code) are deferred to the branch that needs them, keeping --help fast.
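One common way to enforce such limits is with custom argparse type functions, sketched below for three of the flags. The helper names are hypothetical and the real cli.py may validate after parsing instead.

```python
import argparse


def positive_int(text):
    """argparse type: an integer that must be >= 1."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {value}")
    return value


def positive_float(text):
    """argparse type: a float that must be > 0."""
    value = float(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"must be > 0, got {value}")
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog="neuralnet")
    parser.add_argument("--epochs", type=positive_int, default=5)
    parser.add_argument("--batch-size", type=positive_int, default=32)
    parser.add_argument("--lr", type=positive_float, default=0.001)
    return parser


args = build_parser().parse_args(["--epochs", "10", "--lr", "0.0005"])
```

With this approach an invalid value such as `--epochs 0` makes argparse print a usage error and exit before any heavy imports run.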


CLI Reference

| Flag | Type | Default | Description |
|---|---|---|---|
| --epochs | int | 5 | Number of training epochs (must be >= 1) |
| --batch-size | int | 32 | Images per training batch (must be >= 1) |
| --lr | float | 0.001 | Learning rate for the Adam optimizer (must be > 0) |
| --data-dir | path | <project>/data | Where to download/store the Food-101 dataset |
| --seed | int | 42 | Random seed for reproducibility |
| --model-path | path | <project>/hotdog_model.pth | Path to save/load model weights |
| --predict | str | none | Path to an image file. If set, skips training and classifies this image |
| --web | flag | off | Launch the drag-and-drop web UI instead of training |
| --port | int | 5000 | Port for the web server (only used with --web) |

Frequently Asked Questions

Do I need a GPU?

No. The model will train on CPU — it just takes longer (perhaps 5–15 minutes instead of 1–2 minutes on a GPU). This is feasible because we're only training the final layer.

How accurate is the model?

With default settings (5 epochs), expect ~90–95% test accuracy. You can push this higher by:

  • Training for more epochs (--epochs 15).
  • Unfreezing more layers (edit build_model() to only freeze earlier blocks).
  • Using a larger backbone (e.g., resnet50 instead of resnet18).

Can I use my own images instead of Food-101?

Yes! After training, use --predict to classify any JPEG/PNG image. For training on a custom dataset, you'd replace the datasets.Food101 calls with datasets.ImageFolder pointing at a directory structure like:

my_data/
├── train/
│   ├── hotdog/
│   │   ├── img001.jpg
│   │   └── ...
│   └── not_hotdog/
│       ├── img001.jpg
│       └── ...
└── test/
    ├── hotdog/
    └── not_hotdog/

What is hotdog_model.pth?

It's a file containing the trained weights (parameters) of the neural network. It's saved by torch.save() after training and loaded by torch.load() during prediction. If you delete it, you'll need to retrain.

Why ResNet-18 and not something bigger?

ResNet-18 is small, fast, and good enough for a binary classification task. It's ideal for learning because training completes quickly and you can experiment with different settings without waiting a long time.

What does the --seed flag do?

It sets the random seed for PyTorch (CPU and GPU), the dataset balancing shuffle, and other random operations. Using the same seed produces the same train/test split and weight initialisation each time, making experiments reproducible.


Troubleshooting

Error: Model file '...' not found. Run training first

You need to train the model before predicting. Run uv run neuralnet first.

Error: File not found: ...

The image path you passed to --predict doesn't exist. Double-check the path.

RuntimeError: CUDA out of memory

Your GPU doesn't have enough memory for the batch size. Try reducing it:

uv run neuralnet --batch-size 16

The dataset download is stuck or fails

Food-101 is ~5 GB. If the download stalls:

  1. Delete the partial ./data/ directory.
  2. Check your internet connection.
  3. Try again: uv run neuralnet

If it keeps failing, you can download manually from https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/ and extract into ./data/food-101/.

ModuleNotFoundError: No module named 'torch'

You probably ran python -m hotdog instead of uv run neuralnet (or uv run python -m hotdog). The dependencies are installed inside the virtual environment that uv manages. Always use uv run.

Training accuracy is stuck at ~50%

The model is randomly guessing. This can happen if:

  • The learning rate is too high or too low — try --lr 0.0001 or --lr 0.01.
  • Something went wrong with data loading — check that ./data/ contains the Food-101 files.

Happy hotdog hunting!
