A beginner-friendly image classifier that learns to tell hotdogs apart from everything else. Built with PyTorch and based on the classic Silicon Valley "Not Hotdog" app concept.
- What This Project Does (Big Picture)
- How It Works (The Key Concepts)
- Prerequisites
- Getting Started
- Project Structure
- Usage
- Web Interface
- Understanding the Code
- CLI Reference
- Frequently Asked Questions
- Troubleshooting
This project trains a neural network to look at a photo and answer one question: "Is this a hotdog, or not?"
Instead of building a neural network from scratch (which would need millions of images and days of compute), we use a technique called transfer learning: we take a model that already knows how to recognise thousands of objects (ResNet-18, trained on ImageNet) and teach it one new trick — spotting hotdogs.
If you're new to machine learning, here's a plain-English summary of the main ideas used in this codebase:
Training a deep neural network from scratch requires huge datasets and lots of GPU time. Transfer learning sidesteps this by starting with a model that was already trained on a large dataset (ImageNet — 1.2 million images, 1 000 classes). We keep all of that learned knowledge (the "frozen" layers), chop off the final classification layer, and replace it with our own tiny layer that outputs just two classes: hotdog and not hotdog. Only that final layer gets trained.
ResNet-18 is a convolutional neural network with 18 layers. It was designed by Microsoft Research and is one of the most popular starter architectures for image classification. The "Res" stands for residual — the network uses shortcut connections that let gradients flow more easily during training, making deeper networks practical.
Food-101 is a publicly available dataset of 101 food categories with 101 000 images total (1 000 per category). We download it automatically via torchvision.datasets.Food101, then filter it down to just two categories (hotdog vs. a balanced random sample of everything else).
During training, each image is randomly cropped and flipped (RandomResizedCrop, RandomHorizontalFlip). This forces the model to learn that a hotdog is a hotdog regardless of position, zoom, or orientation — and it effectively multiplies the size of the training set.
We only train the final fully-connected layer (model.fc). All earlier layers are "frozen": their weights don't change. This keeps training fast (minutes, not hours) and reduces the risk of overfitting on our relatively small dataset.
Cross-entropy loss is the standard loss function for classification tasks. It measures how far the model's predicted probabilities are from the true labels. Lower is better.
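To make "lower is better" concrete, here is a minimal pure-Python sketch of cross-entropy for a two-class output. It assumes the loss in question is cross-entropy over softmax probabilities (the usual choice for a two-output classifier); the helper names `softmax` and `cross_entropy` are illustrative and are not taken from the project code.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_class])


# Class 0 = not hotdog, class 1 = hotdog. The true label is "hotdog" in all three cases.
confident_and_right = cross_entropy([-2.0, 2.0], true_class=1)
unsure = cross_entropy([0.0, 0.0], true_class=1)
confident_and_wrong = cross_entropy([2.0, -2.0], true_class=1)

print(f"confident and right: {confident_and_right:.4f}")
print(f"unsure:              {unsure:.4f}")
print(f"confident and wrong: {confident_and_wrong:.4f}")
```

The three losses come out at roughly 0.02, 0.69 and 4.02: a confident wrong answer is penalised far more heavily than an unsure one.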
Adam (Adaptive Moment Estimation) is an optimizer that adjusts the learning rate for each parameter individually. It's a good default choice and usually converges faster than plain SGD. We pair it with a cosine annealing learning rate scheduler that gradually reduces the learning rate over the course of training, which helps the model converge more smoothly — especially on longer runs.
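As a rough illustration of what cosine annealing does to the learning rate, the sketch below evaluates the closed-form cosine schedule in plain Python. It assumes the schedule spans exactly the number of training epochs and decays towards a minimum of 0; the project's actual scheduler settings are not shown in this README, and `cosine_annealed_lr` is a hypothetical helper.

```python
import math


def cosine_annealed_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Learning rate at a given epoch (0-based) under a cosine schedule."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + cosine)


# Default settings from the CLI: learning rate 0.001, 5 epochs.
schedule = [cosine_annealed_lr(0.001, epoch, total_epochs=5) for epoch in range(5)]

for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

The rate starts at the full 0.001, falls slowly at first, then faster through the middle of the run, and flattens out near zero at the end.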
| Requirement | Why |
|---|---|
| Python 3.12+ | Specified in pyproject.toml. Earlier versions may work but are untested. |
| uv | A fast Python package manager used to manage the virtual environment and dependencies. |
| ~5 GB free disk space | The Food-101 dataset downloads to ./data/ on first run. |
| A GPU (optional) | Training will use a CUDA GPU if available, otherwise it falls back to CPU. CPU training works but is slower. |
If you don't have uv yet:
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# or via pip (if you already have Python)
pip install uv
Verify it's installed:
uv --version
Follow these steps exactly and you'll be up and running:
git clone <repo-url>
cd neuralnet
uv sync
This reads pyproject.toml, creates a virtual environment in .venv/, and installs:
- torch — the PyTorch deep learning framework
- torchvision — image datasets, model architectures, and transforms
- Pillow — image loading library (used under the hood by torchvision)
- Flask — lightweight web framework for the drag-and-drop web UI
You do not need to activate the virtual environment manually. The uv run command handles that for you.
uv run neuralnet
(You can also run `uv run python -m hotdog`; both work identically.)
What happens when you run this:
- The Food-101 dataset downloads to `./data/` (~5 GB, only on the first run).
- The dataset is filtered to hotdog vs. not-hotdog images (balanced 50/50 split).
- A pretrained ResNet-18 is loaded, and its last layer is replaced with a 2-class head.
- The model trains for 5 epochs, printing loss and accuracy after each one.
- The trained weights are saved to `hotdog_model.pth` in the project root.
You should see output like:
Using device: cuda
Loading Food-101 dataset (will download on first run)…
Training samples: 1500 | Test samples: 500
Epoch 1/5 Train loss=0.4532 acc=0.812 Test loss=0.2145 acc=0.924
Epoch 2/5 Train loss=0.2301 acc=0.921 Test loss=0.1687 acc=0.942
...
Model saved to /home/you/code/neuralnet/hotdog_model.pth
(Exact numbers will vary depending on your hardware and random seeds.)
Once training is done, point the model at any image:
uv run neuralnet --predict path/to/your/image.jpg
Output:
HOTDOG 🌭 (confidence: 97.3%)
or
NOT hotdog (confidence: 89.1%)
neuralnet/
├── hotdog/ # Python package — all application code
│ ├── __init__.py # Package metadata (version string)
│ ├── __main__.py # Enables `python -m hotdog`
│ ├── cli.py # Argument parsing & dispatch
│ ├── data.py # BinarySubset class & Food-101 filtering
│ ├── engine.py # Training loop, evaluation, orchestration
│ ├── model.py # ResNet-18 model construction
│ ├── predict.py # Single-image prediction
│ ├── transforms.py # Image transforms & normalisation constants
│ ├── web.py # Flask web server & /predict API endpoint
│ └── templates/
│   └── index.html # Drag-and-drop web UI (HTML/CSS/JS)
├── .gitignore # Keeps data/, .venv/, .pth, dist/, build artifacts out of git
├── .venv/ # Virtual environment (created by `uv sync`, do NOT commit)
├── data/ # Food-101 dataset (downloaded at runtime, do NOT commit)
├── hotdog_model.pth # Saved model weights (created after training)
├── pyproject.toml # Project metadata, dependencies & tool config
├── uv.lock # Lock file pinning exact dependency versions
├── neuralnet.code-workspace # VS Code workspace file
├── CODE_REVIEW.md # Code review notes
└── README.md # You are here
Each module has a single responsibility:
| Module | Responsibility |
|---|---|
| `transforms.py` | ImageNet normalisation constants, training & eval transforms |
| `model.py` | Building and configuring the ResNet-18 binary classifier |
| `data.py` | BinarySubset wrapper and Food-101 → hotdog/not-hotdog filtering |
| `engine.py` | train_one_epoch(), evaluate(), and the top-level train() orchestrator |
| `predict.py` | Loading a saved model and classifying a single image |
| `cli.py` | argparse-based CLI, validation, and dispatch to train(), predict(), or web |
| `web.py` | Flask app factory, /predict API endpoint, model loading |
| `templates/index.html` | Drag-and-drop frontend with result overlay |
All commands assume you are in the neuralnet/ project root.
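# Train with default settings
uv run neuralnet
# Train with custom hyperparameters
uv run neuralnet --epochs 10 --batch-size 64 --lr 0.0005
# Store the dataset in a different directory
uv run neuralnet --data-dir /tmp/food101
# Use a different random seed
uv run neuralnet --seed 123
# Save the model weights to a custom path
uv run neuralnet --model-path ./models/v2.pth
# Classify a single image
uv run neuralnet --predict my_photo.jpg
Important: You must train the model first (so `hotdog_model.pth` exists) before running `--predict`. The script will print a clear error message if the model file is missing.
# Launch the web interface
uv run neuralnet --web
See the Web Interface section below for details.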
The project includes a browser-based drag-and-drop UI for classifying images interactively.
uv run neuralnet --web
Then open http://localhost:5000 in your browser.
To use a different port:
uv run neuralnet --web --port 8080
- Drag and drop an image onto the drop zone (or click to browse).
- Press the Evaluate button.
- The result overlays on the image:
  - Green "Hotdog": the model thinks it's a hotdog.
  - Red "Not Hotdog": the model thinks it's something else.
  - A confidence percentage is shown below the label.
- Drop a new image to reset and classify again.
The web server loads the model once at startup and keeps it in memory, so predictions are fast.
Note: You must train the model first (`uv run neuralnet`) so that `hotdog_model.pth` exists before launching the web UI.
The code lives in the hotdog/ package. Each module is described below — open the files side-by-side as you read.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]
These are the per-channel mean and standard deviation of the ImageNet dataset. They're used to normalise input images so they match what the pretrained ResNet expects. Defined once as constants to avoid duplication.
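For a feel of what these constants do to actual pixel values, here is a small worked example in plain Python. It assumes 8-bit RGB input that is first scaled to the 0.0–1.0 range (as torchvision's ToTensor does) and then standardised per channel; `normalise_pixel` is an illustrative helper, not a function from the project.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalise_pixel(rgb):
    """Scale one 8-bit RGB pixel to 0.0-1.0, then standardise each channel."""
    scaled = [value / 255 for value in rgb]
    return [(s - mean) / std for s, mean, std in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


black = normalise_pixel((0, 0, 0))
white = normalise_pixel((255, 255, 255))

print("black:", [round(channel, 3) for channel in black])
print("white:", [round(channel, 3) for channel in white])
```

After normalisation a black pixel sits a little over two standard deviations below zero in every channel and a white pixel a little over two above, so typical images end up roughly centred on zero.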
TRAIN_TRANSFORM and EVAL_TRANSFORM convert raw images into the exact format the neural network expects:
- `RandomResizedCrop(224)`: During training, randomly crop and resize to 224×224 pixels. This is data augmentation.
- `RandomHorizontalFlip()`: 50% chance of flipping the image left-to-right. More augmentation.
- `Resize(256)` + `CenterCrop(224)`: During evaluation, deterministically resize and crop. No randomness, so results are reproducible.
- `ToTensor()`: Convert the PIL image to a PyTorch tensor (pixel values go from 0–255 integers to 0.0–1.0 floats).
- `Normalize(IMAGENET_MEAN, IMAGENET_STD)`: Subtract the ImageNet mean and divide by the ImageNet standard deviation for each colour channel (R, G, B). This is required because the pretrained ResNet was trained with this normalisation.
weights = models.ResNet18_Weights.DEFAULT if pretrained else None
model = models.resnet18(weights=weights)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)
Step by step:
- Load ResNet-18 — with pretrained ImageNet weights during training, or without weights during prediction (since we'll load our own saved weights anyway, downloading pretrained weights would be wasteful).
- Freeze every parameter (`requires_grad = False`), because we don't want to change the feature extraction layers.
- Replace the final fully-connected layer (`model.fc`) with a new one that outputs 2 classes instead of 1 000.
The new model.fc layer is not frozen (its requires_grad defaults to True), so it will be the only part of the network that learns.
BinarySubset is a Subset subclass that relabels items to binary (1 = hotdog, 0 = not hotdog). Defined at module level so it can be pickled — important for multi-process data loading.
hotdog_binary_subset() takes the full Food-101 dataset (101 classes) and returns a balanced BinarySubset:
- Label 1 = hotdog
- Label 0 = not hotdog (random sample of other foods, same count as hotdogs)
It uses dataset.targets (the public API) to read labels efficiently without loading any images. Balancing is important — without it, the model could get 99% accuracy by always predicting "not hotdog" since hotdogs are only 1 out of 101 classes.
The seed parameter controls the random selection of non-hotdog samples, ensuring reproducible splits.
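The balancing idea can be sketched without any images at all, working only on a list of labels. This is a simplified stand-in for hotdog_binary_subset(): the function name `balanced_binary_indices` and the toy label list are hypothetical, and the real implementation may differ in detail.

```python
import random


def balanced_binary_indices(labels, positive_class, seed):
    """Return (indices, binary_labels) with equally many positives and negatives."""
    positives = [i for i, label in enumerate(labels) if label == positive_class]
    negatives = [i for i, label in enumerate(labels) if label != positive_class]

    # A dedicated, seeded generator keeps the selection reproducible.
    rng = random.Random(seed)
    sampled_negatives = rng.sample(negatives, k=len(positives))

    indices = positives + sampled_negatives
    binary_labels = [1] * len(positives) + [0] * len(sampled_negatives)
    return indices, binary_labels


# Toy dataset: 5 classes with 10 items each; class 3 plays the role of "hotdog".
toy_labels = [class_id for class_id in range(5) for _ in range(10)]
indices, binary_labels = balanced_binary_indices(toy_labels, positive_class=3, seed=42)

print(f"positives: {sum(binary_labels)}, negatives: {binary_labels.count(0)}")
print(f"accuracy of always guessing 'not hotdog' on unbalanced Food-101: {100 / 101:.1%}")
```

Calling the function again with the same seed returns the same indices, while a different seed picks a different set of negatives.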
train_one_epoch() — the standard PyTorch training loop:
- Set model to training mode (`model.train()`).
- For each batch: forward pass → compute loss → backward pass → optimizer step.
- Track running loss and accuracy.
evaluate() — same as training, but:
- Wrapped in `@torch.no_grad()`: no need to compute gradients.
- Model is in eval mode (`model.eval()`), which disables dropout and uses running stats for batch normalisation.
- No optimizer step: we're only measuring performance.
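The two loops described above can be sketched as follows. The function names match the ones in this README, but the signatures, the metric bookkeeping, and the tiny synthetic dataset are assumptions made so the example runs in seconds without Food-101.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, targets)  # compute loss
        loss.backward()                     # backward pass
        optimizer.step()                    # update the trainable weights
        total_loss += loss.item() * inputs.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += inputs.size(0)
    return total_loss / seen, correct / seen


@torch.no_grad()
def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        total_loss += criterion(outputs, targets).item() * inputs.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += inputs.size(0)
    return total_loss / seen, correct / seen


# Synthetic stand-in for image features: the label depends only on the first column.
torch.manual_seed(0)
features = torch.randn(64, 8)
labels = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=16)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

first_loss, _ = evaluate(model, loader, criterion, device)
for _ in range(20):
    train_one_epoch(model, loader, criterion, optimizer, device)
final_loss, final_acc = evaluate(model, loader, criterion, device)

print(f"loss before: {first_loss:.3f}  after: {final_loss:.3f}  accuracy: {final_acc:.2f}")
```

Multiplying each batch loss by the batch size before averaging keeps the reported loss correct even when the last batch is smaller than the others.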
train() — orchestrates the full training run and returns a dict of final metrics (train_loss, train_acc, test_loss, test_acc):
- Set random seeds for reproducibility. The test subset uses a decorrelated seed (`seed + 1`) to avoid accidental overlap with the training subset.
- Pick device (GPU if available, else CPU).
- Download / load Food-101.
- Create balanced binary subsets for train and test.
- Wrap them in `DataLoader`s (handles batching, shuffling, parallel loading). Worker count is auto-detected based on your CPU cores, capped at 4.
- Build the model (with pretrained weights), loss function, optimizer, and cosine annealing scheduler.
- Loop over epochs, stepping the scheduler each time.
- Create parent directories for the model path if needed, then save trained weights.
- Return the final-epoch metrics as a dictionary for programmatic use.
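The worker-count rule from the steps above could be expressed as a small helper like the one below. The exact heuristic used in engine.py is not shown in this README, so treat `pick_num_workers` and its fallback behaviour as assumptions.

```python
import os


def pick_num_workers(cpu_count=None, cap=4):
    """Use one data-loading worker per CPU core, but never more than `cap`."""
    if cpu_count is None:
        cpu_count = os.cpu_count()  # may return None if the count is unknown
    # Falling back to 0 means "load data in the main process".
    return max(0, min(cap, cpu_count or 0))


print("workers on this machine:", pick_num_workers())
print("workers on a 16-core machine:", pick_num_workers(cpu_count=16))
print("workers on a 2-core machine:", pick_num_workers(cpu_count=2))
```

The returned value would typically be passed as the num_workers argument of a DataLoader.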
Validates that both the image file and model file exist (error messages go to stderr so they don't pollute piped output). Gracefully handles corrupt or unreadable images with a try/except around Image.open(). Then loads the saved model (with pretrained=False to skip downloading ImageNet weights), preprocesses the image, runs it through the network, and prints whether it's a hotdog along with the confidence score.
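The last step, turning the network's two raw outputs into a label and a confidence score, might look like this. The class order (index 1 = hotdog) follows the labelling described in this README; the helper `describe_prediction` and the simplified output format (without the emoji) are illustrative.

```python
import torch

# Index 0 = not hotdog, index 1 = hotdog.
CLASS_NAMES = ["NOT hotdog", "HOTDOG"]


def describe_prediction(logits):
    """Format a (1, 2) tensor of raw model outputs as a label with confidence."""
    probabilities = torch.softmax(logits, dim=1)[0]
    confidence, class_index = probabilities.max(dim=0)
    return f"{CLASS_NAMES[class_index.item()]} (confidence: {confidence.item():.1%})"


# Hand-written logits stand in for the output of model(image_tensor).
print(describe_prediction(torch.tensor([[-1.5, 2.1]])))
print(describe_prediction(torch.tensor([[3.0, 0.0]])))
```

The confidence is simply the softmax probability of the winning class, so for a two-class model it is always at least 50%.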
Parses command-line arguments with argparse, validates inputs (epochs >= 1, batch size >= 1, lr > 0), and dispatches to train(), predict(), or the web server. Heavy imports (torch, model code) are deferred to the branch that needs them, keeping --help fast.
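A cut-down sketch of this kind of parsing and validation is shown below. It covers only a few of the flags, and the validator functions, the `choose_action` helper, and the precedence of --web over --predict are assumptions rather than a description of cli.py.

```python
import argparse


def positive_int(text):
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {value}")
    return value


def positive_float(text):
    value = float(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"must be > 0, got {value}")
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog="neuralnet")
    parser.add_argument("--epochs", type=positive_int, default=5)
    parser.add_argument("--batch-size", type=positive_int, default=32)
    parser.add_argument("--lr", type=positive_float, default=0.001)
    parser.add_argument("--predict", default=None, metavar="IMAGE")
    parser.add_argument("--web", action="store_true")
    return parser


def choose_action(args):
    """Decide which branch to run; heavy imports would happen inside that branch."""
    if args.web:
        return "web"
    if args.predict is not None:
        return "predict"
    return "train"


args = build_parser().parse_args(["--epochs", "10", "--lr", "0.0005"])
print(choose_action(args), args.epochs, args.batch_size, args.lr)
```

Doing the validation in the argparse type callbacks means an invalid value such as --epochs 0 is rejected with a usage message before any training code is imported.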
| Flag | Type | Default | Description |
|---|---|---|---|
| `--epochs` | int | `5` | Number of training epochs (must be >= 1) |
| `--batch-size` | int | `32` | Images per training batch (must be >= 1) |
| `--lr` | float | `0.001` | Learning rate for the Adam optimizer (must be > 0) |
| `--data-dir` | path | `<project>/data` | Where to download/store the Food-101 dataset |
| `--seed` | int | `42` | Random seed for reproducibility |
| `--model-path` | path | `<project>/hotdog_model.pth` | Path to save/load model weights |
| `--predict` | str | none | Path to an image file; if set, skips training and classifies this image |
| `--web` | flag | off | Launch the drag-and-drop web UI instead of training |
| `--port` | int | `5000` | Port for the web server (only used with `--web`) |
You don't need a GPU. The model will train on CPU; it just takes longer (perhaps 5–15 minutes instead of 1–2 minutes on a GPU). This is feasible because we're only training the final layer.
With default settings (5 epochs), expect ~90–95% test accuracy. You can push this higher by:
- Training for more epochs (`--epochs 15`).
- Unfreezing more layers (edit `build_model()` to only freeze earlier blocks).
- Using a larger backbone (e.g., `resnet50` instead of `resnet18`).
Yes, you can use your own images. After training, use --predict to classify any JPEG/PNG image. For training on a custom dataset, you'd replace the datasets.Food101 calls with datasets.ImageFolder pointing at a directory structure like:
my_data/
├── train/
│ ├── hotdog/
│ │ ├── img001.jpg
│ │ └── ...
│ └── not_hotdog/
│   ├── img001.jpg
│   └── ...
└── test/
  ├── hotdog/
  └── not_hotdog/
hotdog_model.pth is a file containing the trained weights (parameters) of the neural network. It's saved by torch.save() after training and loaded by torch.load() during prediction. If you delete it, you'll need to retrain.
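The save and load round trip can be tried on a tiny model. The sketch assumes the file holds a state dict (the mapping of parameter names to tensors), which is the common PyTorch pattern; whether the project saves exactly this is not stated here, and the temporary path is purely illustrative.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for the trained network

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "models", "demo.pth")

    # Create parent directories first, then write the weights.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save(model.state_dict(), path)

    # Loading requires a model with the same architecture to receive the weights.
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location="cpu")
    restored.load_state_dict(state)

weights_match = torch.equal(model.weight, restored.weight)
print("weights restored correctly:", weights_match)
```

Passing map_location="cpu" lets a file saved on a GPU machine be loaded on a machine without one.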
ResNet-18 is small, fast, and good enough for a binary classification task. It's ideal for learning because training completes quickly and you can experiment with different settings without waiting a long time.
The --seed flag sets the random seed for PyTorch (CPU and GPU), the dataset balancing shuffle, and other random operations. Using the same seed produces the same train/test split and weight initialisation each time, making experiments reproducible.
You need to train the model before predicting. Run uv run neuralnet first.
The image path you passed to --predict doesn't exist. Double-check the path.
Your GPU doesn't have enough memory for the batch size. Try reducing it:
uv run neuralnet --batch-size 16
Food-101 is ~5 GB. If the download stalls:
- Delete the partial `./data/` directory.
- Check your internet connection.
- Try again: `uv run neuralnet`
If it keeps failing, you can download manually from https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/ and extract into ./data/food-101/.
You probably ran python -m hotdog instead of uv run neuralnet (or uv run python -m hotdog). The dependencies are installed inside the virtual environment that uv manages. Always use uv run.
The model is randomly guessing. This can happen if:
- The learning rate is too high or too low: try `--lr 0.0001` or `--lr 0.01`.
- Something went wrong with data loading: check that `./data/` contains the Food-101 files.
Happy hotdog hunting!