This project explores various image segmentation techniques on the Oxford-IIIT Pet Dataset. The goal is to implement and compare different segmentation approaches, ranging from traditional methods to deep learning models, while investigating how pre-training and feature extraction strategies impact segmentation performance.
- 📂 config: Configuration parameters
- 📂 data: Dataset loaders and augmentation utilities
- 📂 scripts: Scripts for preprocessing and running experiments
- 📂 utils: Utility functions for dataset processing
- 📂 Dataset: Original dataset
- 📂 Processed_Dataset: Standardized dataset (generated via scripts)
- 📂 Augmented_Dataset: Dataset with augmentations (generated via scripts)
This project requires several dependencies that can be installed using the provided environment file. Follow these steps to set up the environment:
conda env create -f environment.yml
conda activate cv-seg

For consistent results, all data preprocessing uses a fixed random seed (RANDOM_SEED=42) to ensure:
- Train/validation splits are consistent
- Data augmentations are reproducible
- Models are initialized with the same weights

All generated datasets are created within the repository directory structure.
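A minimal sketch of how a single seed can be applied across libraries is shown below. The helper name `set_seed` is hypothetical and is not taken from the repository; NumPy and PyTorch are seeded only if they are installed.

```python
import random

RANDOM_SEED = 42  # value stated in this README


def set_seed(seed: int = RANDOM_SEED) -> None:
    """Seed each random number generator that is available (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # governs default weight initialization
    except ImportError:
        pass


# Re-seeding reproduces the same sequence of random draws.
set_seed()
first = [random.random() for _ in range(3)]
set_seed()
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Calling such a helper once at the start of each script, before any split, augmentation, or model construction, is what makes the three guarantees above hold together.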
Prepare the Oxford-IIIT Pet Dataset for image segmentation by:
- Standardizing dimensions.
- Implementing data augmentation techniques to improve model training performance.
The original dataset contains images of cats and dogs with corresponding segmentation masks. The masks include:
- Class 0: Background pixels.
- Class 1: Cat pixels.
- Class 2: Dog pixels.
- Class 3: Boundary/outline pixels.
We standardized the dataset using the following steps:
- Resize Images: All images and masks are resized to 256×256 pixels.
- Train/Val/Test Split: The dataset is split into:
  - Train (80%)
  - Validation (20%)
  - Test set
- Mask Processing: Nearest-neighbor interpolation is used to preserve label information during resizing.
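To illustrate why nearest-neighbor interpolation matters for masks, here is a small NumPy sketch. It is not the code in preprocessing.py; `resize_nearest` is a hypothetical helper written only for this example.

```python
import numpy as np


def resize_nearest(mask: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2D label mask by copying the nearest source pixel (hypothetical helper)."""
    in_h, in_w = mask.shape
    # For every output row/column, pick the source index it falls on.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return mask[rows[:, None], cols[None, :]]


# Toy mask: background (0), a dog region (2), and a boundary row (3).
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 2
mask[0, :] = 3

resized = resize_nearest(mask, 256, 256)
print(resized.shape)       # (256, 256)
print(np.unique(resized))  # [0 2 3]
```

Because every output pixel is a copy of an input pixel, no new label values appear. An averaging interpolation could blend background (0) and dog (2) into 1, which this dataset reads as cat.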
The implementation is available in preprocessing.py, which provides:
- standardize_dataset(): Handles image/mask resizing and splitting.
- get_train_val_split(): Creates training/validation splits using scikit-learn.
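The idea behind a seeded 80/20 split can be sketched with the standard library alone. This is an illustration, not the project's get_train_val_split(): the project uses scikit-learn, so its actual shuffling order will differ, and the name `split_train_val` and the file names below are hypothetical.

```python
import random

RANDOM_SEED = 42  # value stated in this README


def split_train_val(items, val_fraction=0.2, seed=RANDOM_SEED):
    """Return (train, val) lists from a seeded shuffle (hypothetical helper)."""
    shuffled = sorted(items)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


names = [f"pet_{i:03d}.jpg" for i in range(10)]
train, val = split_train_val(names)
print(len(train), len(val))  # 8 2
```

Using a private `random.Random(seed)` instance keeps the split independent of any other random calls made earlier in the program.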
To improve model generalization, we implemented augmentation techniques using the Albumentations library:
- Geometric Transformations (applied to both images and masks):
  - Random flips (horizontal and vertical).
  - Random 90° rotations.
  - Elastic transformations.
  - Grid distortions.
  - Random cropping.
- Pixel-Level Transformations (applied to images only):
  - Brightness/contrast adjustments.
  - Gaussian noise addition.
For each training image, 3 augmented variants are generated, effectively quadrupling the training set size.
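The distinction between geometric and pixel-level transformations can be illustrated with plain NumPy. This sketch deliberately does not use Albumentations and covers only flips, 90° rotations, brightness scaling, and Gaussian noise; `augment_pair` and the parameter ranges are assumptions made for the example.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply one random augmentation to an image/mask pair (hypothetical helper)."""
    # Geometric: the same flip and rotation are applied to image and mask.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))  # number of 90-degree turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)

    # Pixel-level: brightness gain and Gaussian noise touch the image only.
    gain = rng.uniform(0.8, 1.2)
    noise = rng.normal(0.0, 5.0, size=image.shape)
    image = np.clip(image.astype(np.float32) * gain + noise, 0, 255).astype(np.uint8)
    return image, np.ascontiguousarray(mask)


rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
mask = rng.integers(0, 4, size=(256, 256), dtype=np.uint8)

# Three variants per original image, so each image yields four samples in total.
variants = [augment_pair(image, mask, rng) for _ in range(3)]
print(len(variants) + 1)  # 4
```

The mask is only ever moved, never recoloured, so its label values stay aligned with the transformed image.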
The augmentation functionality is provided in augmentation.py:
- get_training_augmentation(): Returns the augmentation pipeline for training data.
- get_validation_augmentation(): Provides minimal processing for validation data.
To preprocess the dataset, run the following script:
python scripts/prepare_dataset.py

This pipeline:
- Creates a standardized dataset with consistent dimensions.
- Generates augmented versions of all training images.
- Displays dataset statistics.
For model training, we provide the PetSegmentationDataset class, which integrates with PyTorch for data loading.