alekspiejka/physio-ema

Pipeline for Matching Physiological and Ecological Momentary Assessment Data

Automated analysis pipeline for synchronizing physiological data from the Empatica E4 wearable with Ecological Momentary Assessment (EMA) survey responses.

Overview

This project processes physiological data from the Empatica E4 wristband and synchronizes it with self-reported survey responses collected by experience sampling (ESM, used here interchangeably with EMA). The pipeline extracts the physiological signals recorded just before each survey was completed and computes heart rate variability (HRV), activity, and electrodermal activity (EDA) metrics from them.

Key Features:

  • Automated matching of E4 recordings with ESM survey timestamps
  • Extraction of the 10-minute signal segment preceding each survey response
  • Signal quality assessment and PPG reconstruction
  • HRV feature extraction from cleaned 30-second windows
  • Computation of further physiological metrics (ENMO, activity classification, EDA features)
  • Integration of data from multiple measurement stages (MS1, MS2) and merging with ESM data

Project Structure

code/
├── 1_ema_e4_matching.r           # Stage 1: Match ESM timestamps with E4 recordings
├── 2_segments_extraction.py      # Stage 2: Extract signal segments of a chosen length
├── 3_best_ppg_30s.py             # Stage 3: Signal processing and clean segment selection
├── 4_matching_physio_30s.py      # Stage 4: Extract matching ACC/EDA segments
├── 5_physio_metrics.py           # Stage 5: Calculate comprehensive metrics
└── 6_merging.py                  # Stage 6: Merge data across stages and add ESM info

Pipeline Stages

1. EMA-E4 Matching (1_ema_e4_matching.r)

Purpose: Temporally align ESM survey responses with E4 device recordings.

Input:

  • ESM survey data with response timestamps (CET/CEST)
  • E4 participant IDs and recording file information

Output:

  • ids_unix_files_msX.csv (X being the measurement stage, e.g. MS1 or MS2): Mapping of survey responses to matching E4 files with Unix timestamps

Process:

  • Convert survey response times to Unix timestamps
  • Match ESM prompts to corresponding E4 recording sessions
  • Create linking file for downstream processing
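The Stage 1 script itself is written in R; the following Python sketch only illustrates the two ideas listed above (timezone-aware conversion and interval matching). The zone name `Europe/Berlin`, the timestamp format, and the `(start, end, file)` tuple layout are assumptions made for the example, not details taken from the script.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumption: survey times are naive local times in a CET/CEST zone.
# "Europe/Berlin" is a placeholder; use the zone your survey platform used.
LOCAL_TZ = ZoneInfo("Europe/Berlin")


def to_unix(local_time: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Interpret a naive local timestamp as CET/CEST and return Unix seconds."""
    naive = datetime.strptime(local_time, fmt)
    return int(naive.replace(tzinfo=LOCAL_TZ).timestamp())


def find_recording(response_unix, recordings):
    """Return the file whose [start, end) interval contains the response.

    `recordings` is a list of (start_unix, end_unix, file_name) tuples
    (a hypothetical layout). Returns None when no session covers the response.
    """
    for start, end, name in recordings:
        if start <= response_unix < end:
            return name
    return None
```

Attaching the zone before calling `timestamp()` lets the standard library apply the correct UTC offset on either side of a daylight-saving switch, so winter (CET) and summer (CEST) responses are handled by the same call.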

2. Signal Segment Extraction (2_segments_extraction.py)

Purpose: Extract 10-minute physiological signal segments preceding each survey response.

Input:

  • ids_unix_files_msX.csv (EMA-E4 matching data)
  • E4 raw data files (BVP, ACC, EDA, HR, TEMP modalities)

Output:

  • Individual segment CSV files: {participant}_{timestamp}_{MODALITY}.csv
  • Organized by modality (BVP, ACC, EDA, etc.)

Details:

  • Extracts 600 seconds (10 minutes) of signal before ESM response
  • Handles all E4 modalities with appropriate sampling rates
  • Configurable to process different modalities independently
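A minimal sketch of the slicing step, assuming the signal is already loaded as an array and the session start time and sampling rate are known (in raw E4 exports these are typically the first two rows of each modality file). The function name and the choice to return `None` for incomplete segments are illustrative, not taken from the script.

```python
import numpy as np


def extract_preceding_segment(signal, session_start_unix, fs, response_unix,
                              duration_s=600):
    """Return the `duration_s` seconds of `signal` that end at the response.

    Returns None if the recording does not fully cover that interval.
    """
    # Sample index corresponding to the moment of the survey response.
    end_idx = int(round((response_unix - session_start_unix) * fs))
    start_idx = end_idx - int(duration_s * fs)
    if start_idx < 0 or end_idx > len(signal):
        return None
    return signal[start_idx:end_idx]
```

For a 64 Hz BVP signal this yields 38,400 samples per segment; the same function works for the other modalities by passing their sampling rate.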

3. PPG Processing & 30-Second Window Selection (3_best_ppg_30s.py)

Purpose: Process BVP (blood volume pulse) signals and extract clean 30-second segments.

Input:

  • 10-minute BVP segments from Stage 2

Output:

  • BVP_clean_30s/: Cleaned 30-second segments
  • hrv_metrics_30s.csv: Complete HRV feature set

Processing Pipeline:

  1. Signal Quality Assessment: Uses e2epyppg SQA module
  2. PPG Reconstruction: Reconstructs clean PPG signal
  3. Clean Segment Extraction: Identifies best 30-second windows
  4. Peak Detection: Detects heartbeats (sampling rate: 64 Hz)
  5. HRV Extraction: Computes time-domain and frequency-domain HRV features
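To make the last two steps concrete, here is a plain NumPy sketch of how time-domain HRV features follow from detected peak positions at 64 Hz. It is not the e2epyppg implementation, which the pipeline relies on for the actual feature set; the function name and returned keys are chosen for illustration only.

```python
import numpy as np


def time_domain_hrv(peak_indices, fs=64.0):
    """Compute basic time-domain HRV features from peak sample indices."""
    peaks = np.asarray(peak_indices, dtype=float)
    # Inter-beat intervals in milliseconds.
    ibi_ms = np.diff(peaks) / fs * 1000.0
    if ibi_ms.size < 2:
        raise ValueError("need at least three peaks to compute HRV features")
    return {
        "mean_ibi_ms": float(ibi_ms.mean()),
        "mean_hr_bpm": float(60000.0 / ibi_ms.mean()),
        "sdnn_ms": float(ibi_ms.std(ddof=1)),
        "rmssd_ms": float(np.sqrt(np.mean(np.diff(ibi_ms) ** 2))),
    }
```

Note that a 30-second window contains only a few dozen beats, so frequency-domain features in particular should be interpreted with that short duration in mind.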

Metadata Captured:

  • Participant ID and response timestamp
  • Clean segment start index
  • Complete HRV metric set
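The README does not spell out how the "best" window is chosen. As one possible reading, the sketch below takes a per-sample quality mask (True = clean) and returns the start index of the first fully clean run of 1920 samples (30 s at 64 Hz). Both the mask input and the first-match rule are assumptions for illustration.

```python
import numpy as np


def first_clean_window_start(clean_mask, window_len=1920):
    """Return the start index of the first all-clean window, or None."""
    mask = np.asarray(clean_mask, dtype=bool)
    run = 0  # length of the current streak of clean samples
    for i, ok in enumerate(mask):
        run = run + 1 if ok else 0
        if run == window_len:
            return i - window_len + 1
    return None
```

Storing this index alongside the HRV features is what allows the other modalities to be cut at the matching position later on.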

4. Associated Modality Matching (4_matching_physio_30s.py)

Purpose: Extract ACC and EDA segments aligned with selected BVP 30-second windows.

Input:

  • hrv_metrics_30s.csv (clean segment indices from Stage 3)
  • 10-minute ACC and EDA segments

Output:

  • ACC_30s_clean/: 30-second ACC segments
  • EDA_30s_clean/: 30-second EDA segments

Technical Details:

  • Uses sample-based positioning (not time-based) for alignment
  • Sampling rates:
    • BVP: 64 Hz (1920 samples in 30s)
    • ACC: 32 Hz (960 samples in 30s)
    • EDA: 4 Hz (120 samples in 30s)
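Sample-based positioning can be sketched as rescaling the BVP start index by the ratio of sampling rates. The helper name `aligned_slice` is hypothetical; the rates and the 30-second window length are the ones listed above.

```python
SAMPLING_RATES_HZ = {"BVP": 64, "ACC": 32, "EDA": 4}
WINDOW_S = 30


def aligned_slice(bvp_start_idx, modality):
    """Map a BVP start index to the matching 30 s slice in another modality."""
    fs = SAMPLING_RATES_HZ[modality]
    # Rescale the index from the BVP sample grid to the target sample grid.
    start = bvp_start_idx * fs // SAMPLING_RATES_HZ["BVP"]
    return slice(start, start + WINDOW_S * fs)
```

For example, a clean window starting at BVP sample 6400 (100 s into the segment) maps to ACC samples 3200 to 4159 and EDA samples 400 to 519. Integer division rounds down when the BVP index is not a multiple of the rate ratio, which shifts the window by less than one sample of the slower signal.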

5. Physiological Metrics Computation (5_physio_metrics.py)

Purpose: Calculate comprehensive features from all modalities.

Input:

  • 10-minute ACC/EDA segments
  • 30-second ACC/EDA/BVP segments
  • HRV metrics from Stage 3

Output:

  • comprehensive_metrics_pyeda.csv: All computed features
  • final_dataset_30s_only_physio.csv: Consolidated physiological dataset

Computed Metrics:

Heart Rate:

  • Mean heart rate (derived from IBI)

Activity from ACC:

  • ENMO (Euclidean Norm Minus One)
  • Activity classification: sedentary, light, moderate, vigorous
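As a rough illustration of the ACC metrics, ENMO is the vector magnitude of the three axes minus 1 g, with negative values clipped to zero. The sketch assumes raw E4 counts at 64 counts per g (verify this against your export and the unit setting in the script), and the cut-points are placeholders only; the thresholds actually used are set in the script's configuration.

```python
import numpy as np


def enmo_mg(acc_xyz, lsb_per_g=64.0):
    """Mean ENMO in milli-g for an (n_samples, 3) array of raw ACC counts."""
    acc_g = np.asarray(acc_xyz, dtype=float) / lsb_per_g
    norm = np.linalg.norm(acc_g, axis=1)
    # Subtract gravity (1 g) and clip negative values to zero.
    return float(np.mean(np.maximum(norm - 1.0, 0.0)) * 1000.0)


def classify_activity(enmo, cutpoints=(30.0, 100.0, 400.0)):
    """Map ENMO (mg) to an activity label.

    The default cut-points are hypothetical placeholders, not validated values.
    """
    labels = ("sedentary", "light", "moderate", "vigorous")
    return labels[int(np.searchsorted(cutpoints, enmo, side="right"))]
```

A wrist lying still reads about 1 g in total, so its ENMO is close to zero and it falls in the lowest class regardless of device orientation.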

EDA Features:

  • Statistical features via pyEDA library
  • Mean EDA level
  • Tonic and phasic components

6. Multi-Stage Integration & ESM Merging (6_merging.py)

Purpose: Consolidate physiological data across multiple measurement stages and integrate ESM context.

Input:

  • final_dataset_30s_only_physio.csv from MS1 and MS2
  • ESM survey data with timestamps and responses

Output:

  • daily_hrv_gan_clean_emo_band.csv: Merged physiological data
  • physio_esm_all_analysis.csv: Physiological data with ESM context

Integration Features:

  • Concatenates MS1 and MS2 physiological data
  • Converts ESM timestamps to Unix format (CET/CEST timezone handling)
  • Merge-joins on [participant, response_timestamp]
  • Enables correlational analysis between responses and physiology
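The integration steps above map naturally onto pandas. The sketch below assumes the key columns are named `participant` and `response_timestamp` as listed; the added `stage` column, the inner join, and the uniqueness check on the ESM side are illustrative choices rather than documented behavior of the script.

```python
import pandas as pd

KEYS = ["participant", "response_timestamp"]


def merge_physio_with_esm(ms1, ms2, esm):
    """Stack MS1 and MS2 physiological rows, then attach ESM responses."""
    physio = pd.concat(
        [ms1.assign(stage="MS1"), ms2.assign(stage="MS2")],
        ignore_index=True,
    )
    # validate= raises if the ESM table has duplicate keys, which would
    # silently multiply physiological rows in the merged output.
    return physio.merge(esm, on=KEYS, how="inner", validate="many_to_one")
```

Both tables must hold `response_timestamp` in the same unit and dtype (for example integer Unix seconds), otherwise the join keys will not match even for the same response.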

Data Flow Diagram

ESM Survey Data + E4 Files
         │
         ↓
[Stage 1] Match timestamps
         │
         ↓
[Stage 2] Extract 10-min segments (all modalities)
         │
         ├─→ BVP Segments
         │        │
         │        ↓
         │   [Stage 3] Signal processing & HRV
         │        │
         │        ├─→ Clean 30-sec BVP
         │        └─→ HRV metrics + indices
         │                 │
         │                 ↓
         │   [Stage 4] Match ACC/EDA to clean indices
         │        │
         │        ├─→ 30-sec ACC
         │        └─→ 30-sec EDA
         │
         └─→ ACC/EDA 10-min Segments
                  │
                  ↓
              [Stage 5] Compute all metrics
                  │
                  ↓
          Comprehensive metrics
                  │
                  ↓
              [Stage 6] Merge MS1+MS2 + ESM
                  │
                  ↓
          Final dataset with ESM context

Requirements

  • Python: ≥3.10
  • Key Libraries:
    • pandas, numpy: Data processing
    • e2epyppg: PPG signal processing and HRV extraction (GitHub: Holoself/E2EPPG)
    • pyEDA: EDA feature extraction using PyEDA pipeline (GitHub: HealthSciTech/pyEDA)
    • torch: Deep learning models in e2epyppg
    • tqdm: Progress bars
  • R: For Stage 1 matching script
    • Base R functions for data manipulation

Installation of Key Packages

The two main signal processing packages should be installed from their GitHub repositories:

# Install e2epyppg for PPG signal processing and HRV
pip install git+https://github.com/Holoself/E2EPPG.git

# Install pyEDA for EDA feature extraction
pip install git+https://github.com/HealthSciTech/pyEDA.git

# Or install all dependencies including the ones above
pip install -r requirements.txt

Configuration

Each script includes configurable parameters at the top:

  • File paths: Adjust input/output directories for your data structure
  • Signal parameters: Sampling rates, segment durations (10-min, 30-sec)
  • Processing thresholds: Activity classification, signal quality thresholds
  • Unit conversions: ACC units (digital LSB, g, mg), timezone handling

See individual script headers for detailed configuration options.

Usage

Run the pipeline in sequence:

# Stage 1: Matching (R)
Rscript code/1_ema_e4_matching.r

# Stages 2-6: Processing (Python)
python code/2_segments_extraction.py
python code/3_best_ppg_30s.py
python code/4_matching_physio_30s.py
python code/5_physio_metrics.py
python code/6_merging.py

Important: Update file paths in each script to match your data directory structure before running.
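If you prefer a single entry point, a small driver along these lines could run the stages in order and stop at the first failure. It is not part of the repository; the `run_pipeline` name and the injectable `runner` argument are hypothetical conveniences, and the commands are the ones listed above.

```python
import subprocess

# Commands in pipeline order, as listed in the Usage section.
STAGES = [
    ["Rscript", "code/1_ema_e4_matching.r"],
    ["python", "code/2_segments_extraction.py"],
    ["python", "code/3_best_ppg_30s.py"],
    ["python", "code/4_matching_physio_30s.py"],
    ["python", "code/5_physio_metrics.py"],
    ["python", "code/6_merging.py"],
]


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Run each stage in sequence; check=True aborts on a non-zero exit."""
    for cmd in stages:
        print("Running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling `run_pipeline()` from the repository root executes all six stages; because later stages read files written by earlier ones, aborting on the first error avoids processing stale outputs.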

Output Files

| File | Description |
|------|-------------|
| ids_unix_files_msX.csv | EMA-E4 timestamp matching |
| BVP_clean_30s/ | Cleaned PPG segments |
| hrv_metrics_30s.csv | HRV features for all responses |
| ACC_30s_clean/ | Accelerometer 30-sec segments |
| EDA_30s_clean/ | Electrodermal activity 30-sec segments |
| comprehensive_metrics_pyeda.csv | All physiological features |
| final_dataset_30s_only_physio.csv | Consolidated per-response metrics |
| physio_esm_all_analysis.csv | Final dataset with ESM responses |

Author

Aleksandra Piejka

License

See LICENSE file for details.
