This project provides a pipeline for preprocessing the CHB-MIT Scalp EEG database from PhysioNet. It handles EDF file loading, extraction of ictal and pre-ictal segments, epoching, and feature extraction using covariance-based methods.
- Segment Extraction:
  - Ictal: Automatically identifies and extracts seizure segments based on the patient-specific summary files.
  - Pre-ictal: Extracts segments preceding seizures with a configurable offset and duration multiplier.
- Epoching: Splits signal segments into non-overlapping fixed-length epochs (default: 5 seconds).
- Feature Extraction: Implements a `CovarianceExtractor` that computes channel-wise covariance matrices and vectorizes them, preserving the Frobenius norm.
- Output: Saves processed features and labels (`1` for ictal, `0` for pre-ictal) as compressed `.npz` files for each patient in the `out/data/` directory.
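The epoching and feature-extraction steps above can be sketched as follows. This is a minimal illustration, not the project's `CovarianceExtractor`: the function names are hypothetical, and scaling the off-diagonal entries by sqrt(2) is one common way to make the vectorized upper triangle keep the Frobenius norm of the full symmetric matrix. The source does not say which scheme the project uses.

```python
import numpy as np


def split_into_epochs(segment: np.ndarray, sfreq: float, epoch_seconds: float = 5.0) -> np.ndarray:
    """Split a (n_channels, n_samples) segment into non-overlapping epochs.

    Trailing samples that do not fill a whole epoch are dropped (an assumption).
    Returns an array of shape (n_epochs, n_channels, samples_per_epoch).
    """
    samples_per_epoch = int(round(sfreq * epoch_seconds))
    n_channels, n_samples = segment.shape
    n_epochs = n_samples // samples_per_epoch
    trimmed = segment[:, : n_epochs * samples_per_epoch]
    # Cut each channel into contiguous chunks, then move the epoch axis first.
    return trimmed.reshape(n_channels, n_epochs, samples_per_epoch).transpose(1, 0, 2)


def covariance_features(epoch: np.ndarray) -> np.ndarray:
    """Vectorize the channel covariance of one (n_channels, n_samples) epoch.

    Only the upper triangle is kept. Off-diagonal entries appear twice in the
    full matrix, so they are scaled by sqrt(2) to keep the Euclidean norm of
    the vector equal to the Frobenius norm of the matrix.
    """
    cov = np.atleast_2d(np.cov(epoch))  # rows are treated as channels
    rows, cols = np.triu_indices(cov.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return cov[rows, cols] * weights
```

For a 12-second, 4-channel segment sampled at 256 Hz (the CHB-MIT sampling rate), `split_into_epochs` yields two 5-second epochs, and each epoch maps to a feature vector of length 4 * 5 / 2 = 10.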
- Python >= 3.12
- Dependencies: `mne`, `numpy`
You can install the dependencies using pip or a package manager like uv:

    pip install mne numpy

Or, if you are using uv:

    uv sync

The project expects the CHB-MIT Scalp EEG database to be organized in its original structure as downloaded from PhysioNet. Point `PATH_ROOT_DATASET` in `constants.py` to this root folder, or pass the path as a command-line argument.
chb-mit-scalp-eeg-database/
├── RECORDS-WITH-SEIZURES
├── chb01/
│ ├── chb01-summary.txt
│ ├── chb01_01.edf
│ ├── chb01_02.edf
│ └── ...
├── chb02/
│ ├── chb02-summary.txt
│ ├── chb02_01.edf
│ └── ...
└── ...
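As a rough sketch of how this layout can be navigated, the snippet below groups the entries of `RECORDS-WITH-SEIZURES` by patient folder. It assumes the file lists one relative path per line (for example `chb01/chb01_03.edf`); the function name is hypothetical and not part of this project.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_records_by_patient(records_text: str) -> dict[str, list[str]]:
    """Map each patient folder to the EDF file names listed for it.

    records_text is the content of RECORDS-WITH-SEIZURES, assumed to hold
    one relative path such as "chb01/chb01_03.edf" per line.
    """
    grouped = defaultdict(list)
    for line in records_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = PurePosixPath(line)
        grouped[record.parent.name].append(record.name)
    return dict(grouped)
```

The text could be obtained with `Path(root, "RECORDS-WITH-SEIZURES").read_text()`, where `root` is the dataset root folder.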
To run the preprocessing pipeline, execute the `main.py` script. The script uses command-line arguments to configure the pipeline:

    python main.py [options]

- `--path`, `-p`: Root directory of the EEG dataset (default: defined in `constants.py`).
- `--offset_seconds`, `-o`: Time gap (in seconds) between the end of the pre-ictal segment and the seizure onset (default: 300).
- `--multiplier`, `-m`: Factor used to scale the pre-ictal segment duration relative to the seizure length (default: 3).
- `--epoch_duration`, `-e`: Duration of each signal epoch in seconds for feature extraction (default: 5).

For example:

    python main.py -p "path/to/dataset" -o 60 -m 5 -e 2

The script will:
- Read the `RECORDS-WITH-SEIZURES` file from the dataset root.
- Process each record by identifying seizure times from summary files.
- Extract ictal and pre-ictal segments.
- Split segments into fixed-length epochs.
- Extract covariance-based features.
- Save the results as compressed `.npz` files in `out/data/<patient_id>.npz`.
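To make the pre-ictal extraction step concrete, here is a small sketch of how the offset and multiplier options could translate into a time window. The function name is hypothetical, and returning `None` when the window would start before the recording is an assumption; the source does not describe how that case is handled.

```python
def preictal_window(seizure_start, seizure_end, offset_seconds=300, multiplier=3):
    """Return (start, end) of the pre-ictal window in seconds, or None.

    The window ends offset_seconds before seizure onset and lasts
    multiplier times the seizure length. Returning None when the window
    would begin before time 0 is an assumption made for this sketch.
    """
    duration = multiplier * (seizure_end - seizure_start)
    window_end = seizure_start - offset_seconds
    window_start = window_end - duration
    if window_start < 0:
        return None
    return window_start, window_end
```

With the defaults, a seizure from 2996 s to 3036 s lasts 40 s, so the pre-ictal window is 120 s long and ends at 2696 s, giving the interval 2576 s to 2696 s. With `-o 60 -m 5`, the same seizure gives 2736 s to 2936 s.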
- `main.py`: Main entry point for the preprocessing pipeline.
- `constants.py`: Configuration for dataset paths and channel selections.
- `edf.py`: Module for reading EDF files and parsing seizure metadata.
- `signals.py`: Utility functions for segment extraction and epoching.
- `train_test_split.py`: Utility to load and split patient data into train/test sets.
- `feature_extractor/`:
  - `base.py`: Abstract base class for feature extractors.
  - `covariance.py`: Implementation of covariance-based feature extraction.
- `out/`: Directory where logs and processed data are stored.
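The processed archives can be read back with `numpy.load`. The sketch below is illustrative only: the array names `X` and `y` are assumptions, since the keys used inside the `.npz` files are not documented here, and the helper is not the project's `train_test_split.py`. Inspect `archive.files` on a real output to find the actual names.

```python
import io

import numpy as np


def load_patient_archive(source, feature_key="X", label_key="y"):
    """Return (features, labels) from one patient archive.

    source may be a path such as out/data/<patient_id>.npz or a file object.
    feature_key and label_key are hypothetical; check archive.files for the
    names actually stored by the pipeline.
    """
    with np.load(source) as archive:
        return archive[feature_key], archive[label_key]


# Round trip on synthetic data held in memory, so no dataset is required.
buffer = io.BytesIO()
features = np.arange(12, dtype=float).reshape(4, 3)
labels = np.array([1, 1, 0, 0])  # 1 = ictal, 0 = pre-ictal
np.savez_compressed(buffer, X=features, y=labels)
buffer.seek(0)

loaded_features, loaded_labels = load_patient_archive(buffer)
n_preictal, n_ictal = np.bincount(loaded_labels, minlength=2)
```

Counting labels this way is a quick check of class balance before splitting the data into train and test sets.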