Kcuga/CV_Segmentation

Computer Vision Image Segmentation Project

Overview

This project explores various image segmentation techniques on the Oxford-IIIT Pet Dataset. The goal is to implement and compare different segmentation approaches, ranging from traditional methods to deep learning models, while investigating how pre-training and feature extraction strategies impact segmentation performance.


Project Structure

  • 📂 config: Configuration parameters
  • 📂 data: Dataset loaders and augmentation utilities
  • 📂 scripts: Scripts for preprocessing and running experiments
  • 📂 utils: Utility functions for dataset processing
  • 📂 Dataset: Original dataset

Generated Directories (not in version control)

  • 📂 Processed_Dataset: Standardized dataset (generated via scripts)
  • 📂 Augmented_Dataset: Dataset with augmentations (generated via scripts)

Environment Setup

Dependencies are listed in the provided environment file (environment.yml). Create and activate the conda environment with:

conda env create -f environment.yml
conda activate cv-seg

Reproducibility

For consistent results, all data preprocessing uses a fixed random seed (RANDOM_SEED=42) so that:

  • Train/validation splits are consistent.
  • Data augmentations are reproducible.
  • Models are initialized with the same weights.

All generated datasets are created within the repository directory structure.
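
The seeding described above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper name seed_everything is hypothetical, and the NumPy and PyTorch calls are made only if those libraries happen to be installed.

```python
import os
import random

RANDOM_SEED = 42  # value stated in this README


def seed_everything(seed: int = RANDOM_SEED) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    # Exported for child processes; it does not alter hashing in the running interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything()
first = random.random()
seed_everything()
second = random.random()
print(first == second)  # the same seed yields the same draw
```

Calling the helper once at the start of each script, before any split, augmentation, or model construction, is what keeps those steps repeatable across runs.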

Task 1: Dataset Preprocessing and Augmentation

Goal

Prepare the Oxford-IIIT Pet Dataset for image segmentation by:

  • Standardizing image and mask dimensions.
  • Implementing data augmentation techniques to improve model training performance.

Dataset Overview

The original dataset contains images of cats and dogs with corresponding segmentation masks. The masks include:

  • Class 0: Background pixels.
  • Class 1: Cat pixels.
  • Class 2: Dog pixels.
  • Class 3: Boundary/outline pixels.
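
As a quick way to inspect a mask under this labeling, the sketch below counts pixels per class. It assumes a mask is available as a 2D grid of integer labels; the function name class_pixel_counts and the toy mask are illustrative only.

```python
from collections import Counter

# Label IDs as listed in this README.
CLASS_NAMES = {0: "background", 1: "cat", 2: "dog", 3: "boundary"}


def class_pixel_counts(mask):
    """Count pixels per class in a 2D mask given as a list of rows."""
    counts = Counter(label for row in mask for label in row)
    return {name: counts.get(label, 0) for label, name in CLASS_NAMES.items()}


toy_mask = [
    [0, 0, 3, 3],
    [0, 3, 1, 1],
    [0, 3, 1, 1],
]
print(class_pixel_counts(toy_mask))
# {'background': 4, 'cat': 4, 'dog': 0, 'boundary': 4}
```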

Implementation

1. Dataset Standardization

We standardized the dataset using the following steps:

  1. Resize Images: All images and masks are resized to 256×256 pixels.
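  2. Train/Val/Test Split: The dataset is split into:
    • Train (80% of the training/validation data)
    • Validation (20% of the training/validation data)
    • Test (held out separately)
  3. Mask Processing: Masks are resized with nearest-neighbor interpolation so that every resized pixel keeps a valid class label.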
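
To see why nearest-neighbor interpolation matters for masks, consider the dependency-free sketch below. Each output pixel copies exactly one source pixel, so no new label values can appear. An averaging method such as bilinear interpolation could instead blend labels 1 and 3 into 2, turning a cat boundary into "dog". The function resize_nearest is a hypothetical illustration; the repository's preprocessing.py is not shown here and may rely on an image library instead.

```python
def resize_nearest(mask, out_h, out_w):
    """Resize a 2D label mask (list of rows) by nearest-neighbor sampling."""
    in_h, in_w = len(mask), len(mask[0])
    resized = []
    for y in range(out_h):
        src_y = (y * in_h) // out_h  # source row this output row copies from
        row = [mask[src_y][(x * in_w) // out_w] for x in range(out_w)]
        resized.append(row)
    return resized


small = [[0, 1],
         [3, 2]]
for row in resize_nearest(small, 4, 4):
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [3, 3, 2, 2]
# [3, 3, 2, 2]
```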

The implementation is available in preprocessing.py, which provides:

  • standardize_dataset(): Handles image/mask resizing and splitting.
  • get_train_val_split(): Creates training/validation splits using scikit-learn.
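
The 80/20 split can be illustrated with a seeded shuffle. This is a plain-Python stand-in rather than the scikit-learn call used by get_train_val_split(); the function name split_train_val and the file names are assumptions made for the example.

```python
import random


def split_train_val(items, val_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then cut off a validation share."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG; global state is untouched
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


names = [f"pet_{i:03d}" for i in range(100)]  # hypothetical sample IDs
train, val = split_train_val(names)
print(len(train), len(val))  # 80 20
```

Because the shuffle uses its own seeded generator, repeated runs produce the same partition regardless of what other code has done to the global random state.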

2. Data Augmentation

To improve model generalization, we implemented augmentation techniques using the Albumentations library:

  • Geometric Transformations (applied to both images and masks):

    • Random flips (horizontal and vertical).
    • Random 90° rotations.
    • Elastic transformations.
    • Grid distortions.
    • Random cropping.
  • Pixel-Level Transformations (applied to images only):

    • Brightness/contrast adjustments.
    • Gaussian noise addition.

For each training image, 3 augmented variants are generated. Together with the original, this quadruples the training set size.
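
The split between geometric and pixel-level transformations can be sketched without any library. The example below is not the Albumentations pipeline from augmentation.py; it uses small grayscale grids and hypothetical helper names only to show that a geometric operation must be applied identically to image and mask, while a brightness change touches the image alone.

```python
import random


def hflip(grid):
    return [row[::-1] for row in grid]


def vflip(grid):
    return grid[::-1]


def rot90(grid):
    """Rotate a grid by 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*grid)][::-1]


GEOMETRIC = [hflip, vflip, rot90]


def adjust_brightness(image, delta):
    """Shift pixel intensities, clamped to the 0..255 range."""
    return [[max(0, min(255, px + delta)) for px in row] for row in image]


def augment_pair(image, mask, rng):
    op = rng.choice(GEOMETRIC)
    image, mask = op(image), op(mask)  # same geometric op on both
    image = adjust_brightness(image, rng.randint(-30, 30))  # image only
    return image, mask


def expand_training_set(pairs, variants_per_image=3, seed=42):
    """Keep each original pair and add augmented variants of it."""
    rng = random.Random(seed)
    out = []
    for image, mask in pairs:
        out.append((image, mask))
        for _ in range(variants_per_image):
            out.append(augment_pair(image, mask, rng))
    return out


pairs = [([[10, 20], [30, 40]], [[0, 1], [3, 1]])] * 2
print(len(expand_training_set(pairs)))  # 8, i.e. 4x the 2 originals
```

Albumentations handles the same pairing when a transform is called with both image= and mask= arguments, so spatial operations stay aligned across the two.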

The augmentation functionality is provided in augmentation.py:

  • get_training_augmentation(): Returns the augmentation pipeline for training data.
  • get_validation_augmentation(): Provides minimal processing for validation data.

Usage

To preprocess the dataset, run the following script:

python scripts/prepare_dataset.py

This pipeline:

  1. Creates a standardized dataset with consistent dimensions.
  2. Generates augmented versions of all training images.
  3. Displays dataset statistics.

Data Loading

For model training, we provide the PetSegmentationDataset class, a PyTorch-compatible dataset for loading the processed images and masks.
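
A PyTorch map-style dataset only needs to implement __len__ and __getitem__. The sketch below shows that shape with a hypothetical PairDataset class holding in-memory pairs. It is not the repository's PetSegmentationDataset, which would presumably subclass torch.utils.data.Dataset, read files from disk, and return tensors.

```python
class PairDataset:
    """Hypothetical map-style dataset over (image, mask) pairs."""

    def __init__(self, pairs, transform=None):
        self.pairs = list(pairs)
        # Optional callable taking (image, mask) and returning (image, mask).
        self.transform = transform

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        image, mask = self.pairs[index]
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask


def flip_both(image, mask):
    """Example transform: mirror image and mask together."""
    return [row[::-1] for row in image], [row[::-1] for row in mask]


dataset = PairDataset([([[1, 2]], [[0, 3]])], transform=flip_both)
print(len(dataset), dataset[0])  # 1 ([[2, 1]], [[3, 0]])
```

Applying the transform inside __getitem__ means augmentation happens lazily per sample, which is one common design; the README's pipeline instead writes augmented images to disk ahead of time.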

