AKSHEXXXX/Polari-Sense

██████╗  ██████╗ ██╗      █████╗ ██████╗ ██╗███████╗███████╗███╗   ██╗███████╗███████╗
██╔══██╗██╔═══██╗██║     ██╔══██╗██╔══██╗██║██╔════╝██╔════╝████╗  ██║██╔════╝██╔════╝
██████╔╝██║   ██║██║     ███████║██████╔╝██║███████╗█████╗  ██╔██╗ ██║███████╗█████╗  
██╔═══╝ ██║   ██║██║     ██╔══██║██╔══██╗██║╚════██║██╔══╝  ██║╚██╗██║╚════██║██╔══╝  
██║     ╚██████╔╝███████╗██║  ██║██║  ██║██║███████║███████╗██║ ╚████║███████║███████╗
╚═╝      ╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝╚══════╝╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝

Sensing what the eye cannot see — before corrosion takes hold.

PolariSense is a multi-phase machine learning pipeline that reads electrochemical polarization curves and predicts pitting corrosion onset in metallic alloys — combining heuristic signal analysis, Random Forest classification, and LSTM sequence modeling into one unified corrosion intelligence system.


Quickstart · Architecture · Results · Roadmap



🔬 The Problem

Metal doesn't fail all at once. It fails at a point — an onset — where the passive film breaks down, current spikes, and localized pitting corrosion begins. In aerospace alloys, biomedical implants, and structural materials, missing that onset by even a few millivolts can mean catastrophic failure downstream.

Traditional methods for detecting pitting onset rely on expert interpretation of polarization curves, which is slow, subjective, and hard to scale. PolariSense automates this step, extracting onset signals from raw electrochemical data using both physics-informed heuristics and data-driven models trained on real Mg alloy experimental datasets.


✨ What Makes PolariSense Different

| Capability | Detail |
|---|---|
| 🔍 Heuristic Onset Detection | dI/dV thresholding using robust median + MAD statistics, refined by d²I/dV² inflection points; no black box, fully interpretable |
| 🌲 Random Forest Baseline | Hand-crafted electrochemical features (Ecorr, Icorr, anodic/cathodic slopes, breakdown potential) fed into a scikit-learn RF classifier |
| 🧠 LSTM Sequence Classifier | Bidirectional LSTM on sliding windows of polarization data; learns temporal patterns in the current response, not just static features |
| Attention Mechanism (Phase 4, planned) | Temporal attention layer intended to pinpoint which part of the curve drives the onset prediction; interpretable by design |
| 🔗 3-Phase Fusion Architecture | Corrosion signals fused with nanoindentation serration features via a dual-branch neural network for integrated fracture risk estimation |
| 📊 Trustworthy Evaluation | Grouped CV by alloy/solution/experiment, alloy holdout testing, and weak-label review queues, not just accuracy theater |
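
To make the heuristic row concrete, here is a minimal sketch of median + MAD thresholding on dI/dV. It is not the code in `onset_detection.py`: the function name `detect_onset`, the multiplier `k`, and the `refine_window` rule (pick the largest d²I/dV² just before the first threshold crossing) are assumptions chosen for illustration.

```python
import numpy as np

def detect_onset(potential_V, current_A, k=5.0, refine_window=5):
    """Return the index of a candidate pitting onset, or None.

    Hypothetical helper: flags the first point where dI/dV exceeds
    median + k * (scaled MAD), then moves to the point of largest
    d2I/dV2 within the last `refine_window` samples before that crossing.
    """
    v = np.asarray(potential_V, dtype=float)
    i = np.asarray(current_A, dtype=float)
    didv = np.gradient(i, v)        # first derivative dI/dV
    d2idv2 = np.gradient(didv, v)   # second derivative d2I/dV2

    med = np.median(didv)
    mad = np.median(np.abs(didv - med))
    sigma = 1.4826 * mad            # MAD rescaled to a std-like estimate
    if sigma == 0.0:
        return None                 # flat derivative: nothing to threshold

    crossings = np.flatnonzero(didv > med + k * sigma)
    if crossings.size == 0:
        return None

    first = int(crossings[0])
    lo = max(0, first - refine_window)
    return lo + int(np.argmax(d2idv2[lo:first + 1]))
```

The median and MAD are used instead of mean and standard deviation because the steep pitting branch would otherwise inflate the threshold it is supposed to trip.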

🏗 System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        PolariSense Pipeline                     │
├─────────────────┬───────────────────────┬───────────────────────┤
│   PHASE 1       │      PHASE 2           │      PHASE 3          │
│ Corrosion Core  │  Serration Analysis    │   Fracture Fusion     │
│                 │                        │                       │
│ Polarization    │  Nanoindentation       │  Feature Alignment    │
│ Curve Ingestion │  Burst Detection       │                       │
│       ↓         │  (dh/dP threshold)     │  ┌──────────────┐    │
│ Preprocessing   │       ↓                │  │ Corrosion    │    │
│ (SG smooth,     │  22 Scalar Features    │  │ Embeddings   │──┐ │
│  derivatives,   │  (burst count, amp,    │  └──────────────┘  │ │
│  windowing)     │   duration, energy)    │                     ↓ │
│       ↓         │       ↓                │  ┌──────────────┐    │
│  Heuristic +    │  RF Instability        │  │ Serration    │──→ │
│  RF + LSTM      │  Classifier            │  │ Embeddings   │  Joint│
│       ↓         │                        │  └──────────────┘  Head│
│ Onset Potential │                        │        ↓              │
│ Prediction (V)  │                        │  Fracture Risk Score  │
└─────────────────┴───────────────────────┴───────────────────────┘

📁 Project Structure

polarisense/
├── src/
│   ├── common/
│   │   ├── data_loader.py          # CSV / JSON / NPZ multi-format ingestion
│   │   ├── preprocessing.py        # Normalisation, smoothing, derivatives, windowing
│   │   ├── evaluation.py           # MAE, RMSE, R², F1, confusion matrix
│   │   └── plotting.py             # Curve overlays, scatter plots, training history
│   ├── phase_1_corrosion/
│   │   ├── dataset_template.py     # Canonical schema (V, I, labels, metadata)
│   │   ├── onset_detection.py      # Heuristic dI/dV threshold detector
│   │   ├── synthetic_data.py       # Configurable synthetic polarization generator
│   │   ├── baseline_model.py       # Random Forest regression / classification
│   │   └── lstm_model.py           # Bidirectional LSTM with optional attention
│   ├── phase_2_serration/
│   │   ├── burst_detector.py       # dh/dP burst-finding pipeline
│   │   └── serration_features.py   # 22 mechanical instability features
│   └── phase_3_fusion/
│       ├── feature_alignment.py    # Corrosion ↔ serration feature alignment
│       ├── gb_fusion.py            # Gradient Boosting fusion model
│       └── nn_fusion.py            # Dual-branch MLP with joint fracture risk head
│
├── data_raw/                        # Original experimental files (CSV / JSON / NPZ)
├── data_processed/                  # Cleaned features, windowed sequences
│   └── phase_1_corrosion/
│       └── weak_label_review_queue.csv   # Prioritised re-labelling queue
├── models/                          # Saved artefacts (.pkl, .pt, .json metrics)
├── figures/                         # Auto-generated plots
├── results/                         # Prediction outputs & pipeline summary
├── notebooks/
│   └── 01_experiment.ipynb          # Full interactive walkthrough
├── run_pipeline.py                  # Single CLI entry point (all 3 phases)
├── requirements.txt
└── README.md

⚡ Quick Start

1. Install

git clone https://github.com/AKSHEXXXX/pitting-onset-prediction.git
cd pitting-onset-prediction
pip install -r requirements.txt

2. Run on synthetic data (zero setup)

python run_pipeline.py

This will:

  1. Generate 100 synthetic Mg-alloy polarization curves with configurable cathodic / passive / pitting regions
  2. Preprocess — Savitzky-Golay smoothing, min-max normalisation, dI/dV and d²I/dV² derivatives
  3. Run heuristic onset detection
  4. Train Random Forest baseline (corrosion severity: Low / Medium / High)
  5. Train LSTM binary classifier on sliding windows
  6. Evaluate all models → save metrics to results/
  7. Generate plots → save to figures/

3. Run with your own data

python run_pipeline.py data_raw/your_experiment_folder/

Place CSV files with columns potential_V and current_A in a subfolder of data_raw/. The pipeline auto-detects format (CSV / JSON / NPZ).
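
If you want to check a file before handing it to the pipeline, a small reader like the one below verifies the two required columns. It is a hypothetical helper (`load_curve_csv` is not part of the repository) and only assumes a comma-separated file with a header row.

```python
import io
import numpy as np

REQUIRED_COLUMNS = ("potential_V", "current_A")

def load_curve_csv(source):
    """Read potential_V and current_A from a CSV path or file-like object."""
    data = np.genfromtxt(source, delimiter=",", names=True, dtype=float)
    names = data.dtype.names or ()
    missing = [c for c in REQUIRED_COLUMNS if c not in names]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")

    v = np.atleast_1d(data["potential_V"])
    i = np.atleast_1d(data["current_A"])
    keep = np.isfinite(v) & np.isfinite(i)   # drop rows with blank or bad cells
    return v[keep], i[keep]

# Usage with an in-memory file; a path string works the same way.
demo = io.StringIO("potential_V,current_A\n-1.6,1e-6\n-1.5,2e-6\n")
potential, current = load_curve_csv(demo)
```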

4. Interactive notebook

jupyter notebook notebooks/01_experiment.ipynb

📊 Model Performance

All metrics below come from the Phase 1 RF classifier evaluated on real Mg alloy datasets (301 rows, 189 severity-labelled).

Random Forest — Corrosion Severity Classifier

| Evaluation Protocol | Accuracy | Macro F1 | Balanced Acc |
|---|---|---|---|
| Holdout test split (full features) | 94.7% | 0.95 | 0.95 |
| 5-fold stratified CV | 94.2% | | |
| Grouped CV (alloy / solution / file) | 94.2% | 0.942 | 93.98% |
| Alloy holdout, MG60 | 90.2% | 0.90 | |
| Alloy holdout, MG70 | 92.7% | 0.93 | |
| Alloy holdout, MG80 | 95.1% | 0.95 | |
| Reduced features (no Icorr / Imax) | 78.9% | 0.79 | |
| Experimental labels only (LOOCV, n=8) | 50.0% | 0.41 | 0.44 |
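
The grouped protocol keeps every curve from one alloy (or solution, or file) on the same side of each split, so the score is not inflated by near-duplicate curves leaking across folds. A minimal sketch with scikit-learn's `GroupKFold` is below; the function name `grouped_cv_scores` and the Random Forest settings are assumptions, not the repository's evaluation code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GroupKFold

def grouped_cv_scores(X, y, groups, n_splits=3, seed=0):
    """Balanced accuracy per fold, with no group shared between train and test."""
    X = np.asarray(X)
    y = np.asarray(y)
    groups = np.asarray(groups)

    scores = []
    splitter = GroupKFold(n_splits=n_splits)
    for train_idx, test_idx in splitter.split(X, y, groups):
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[test_idx])
        scores.append(balanced_accuracy_score(y[test_idx], predictions))
    return np.array(scores)
```

With exactly one fold per alloy (for example groups MG60, MG70, MG80 and `n_splits=3`), this reduces to the alloy holdout rows in the table.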

Probabilistic Diagnostics

| Metric | Value |
|---|---|
| Multiclass Brier Score | ≈ 0.085 |
| Mean max class probability | ≈ 0.88 |
| High-confidence coverage (prob ≥ 0.8) | ≈ 79% of holdout |
| Accuracy in high-confidence region | 100% |
| Prediction flip rate under 2% feature noise | ≈ 4.3% |
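
For readers who want to reproduce these diagnostics on their own predictions, here is one way to compute the first four rows from a matrix of class probabilities. The Brier convention used here (sum over classes, mean over samples) is an assumption; some implementations also divide by the number of classes, and this README does not say which one produced the value above. Both function names are hypothetical.

```python
import numpy as np

def multiclass_brier(probs, y_true):
    """Mean over samples of the squared distance between probs and one-hot labels."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    onehot = np.eye(probs.shape[1])[y_true]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def high_confidence_summary(probs, y_true, threshold=0.8):
    """Return (mean max probability, coverage, accuracy within covered samples)."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    max_prob = probs.max(axis=1)
    covered = max_prob >= threshold
    coverage = float(covered.mean())
    if covered.any():
        accuracy = float(np.mean(probs[covered].argmax(axis=1) == y_true[covered]))
    else:
        accuracy = float("nan")   # no sample reaches the threshold
    return float(max_prob.mean()), coverage, accuracy
```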

⚠️ Honest note: The ~95% accuracy is primarily against weak (Icorr-inferred) labels. With only 8 true experimental corrosion-rate labels, real-world validation accuracy sits at ~50%. More experimental labels are the primary next step — not more model complexity.


🔬 Dataset Schema

Each sample in the pipeline is a structured dictionary:

| Field | Type | Description |
|---|---|---|
| sample_id | str | Unique identifier |
| potential_V | np.ndarray | Applied potential sweep (V vs. reference) |
| current_A | np.ndarray | Measured current response (A or A/cm²) |
| pitting_onset_potential_V | float or None | Ground-truth onset potential; None if no pitting |
| pitting_onset_index | int or None | Array index of onset point |
| material | str | Alloy identifier (e.g. MG60, MG70, SS304) |
| electrolyte | str | Electrolyte / concentration description |
| scan_rate_mV_s | float | Potentiodynamic scan rate in mV/s |
| metadata | dict | Experiment conditions, source file, lab notes |
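
A quick way to catch malformed samples early is a validator over this schema. The sketch below is illustrative only: `validate_sample` is a hypothetical helper, and the consistency checks (1-D arrays of equal length, onset index inside the array) are assumptions that go beyond what the table states.

```python
import numpy as np

REQUIRED_FIELDS = {
    "sample_id": str,
    "potential_V": np.ndarray,
    "current_A": np.ndarray,
    "material": str,
    "electrolyte": str,
    "scan_rate_mV_s": (int, float),
    "metadata": dict,
}

def validate_sample(sample):
    """Raise KeyError, TypeError, or ValueError if a sample breaks the schema."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in sample:
            raise KeyError(f"missing field: {name}")
        if not isinstance(sample[name], expected):
            raise TypeError(f"{name} has type {type(sample[name]).__name__}")

    v, i = sample["potential_V"], sample["current_A"]
    if v.ndim != 1 or v.shape != i.shape:
        raise ValueError("potential_V and current_A must be 1-D and equal length")

    # The two onset fields are optional; only the index range is checked here.
    idx = sample.get("pitting_onset_index")
    if idx is not None and not 0 <= idx < v.size:
        raise ValueError("pitting_onset_index is outside the potential array")
    return True
```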

🧪 Feature Engineering

Every polarization curve is distilled into a single feature vector for the RF models:

Ecorr_calc          ← Corrosion potential (open-circuit crossover)
Icorr_calc          ← Corrosion current density (Tafel extrapolation)
max_current         ← Peak current in scan range
anodic_slope        ← Tafel slope — anodic branch
cathodic_slope      ← Tafel slope — cathodic branch
breakdown_potential ← Potential at rapid current rise (pitting trigger)
R1, R2              ← Impedance resistances (from EIS if available)
C1, Q1, n           ← CPE parameters from equivalent circuit fitting
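
As a rough illustration of how the first few features can be derived, the sketch below locates Ecorr at the current sign change and fits the anodic Tafel line to get the slope and an extrapolated Icorr. The function name `tafel_features` and the fit window (0.10 to 0.20 V above Ecorr) are assumptions; real Mg alloy curves often need a hand-tuned window, and the repository's own extraction may differ.

```python
import numpy as np

def tafel_features(potential_V, current_A, fit_lo=0.10, fit_hi=0.20):
    """Estimate Ecorr, anodic Tafel slope (V/decade), Icorr, and max |I|."""
    v = np.asarray(potential_V, dtype=float)
    i = np.asarray(current_A, dtype=float)

    # Ecorr: linear interpolation at the first sign change of the net current.
    sign_change = np.flatnonzero(np.diff(np.sign(i)) != 0)
    if sign_change.size == 0:
        raise ValueError("current never changes sign; Ecorr is outside the scan")
    k = int(sign_change[0])
    ecorr = v[k] - i[k] * (v[k + 1] - v[k]) / (i[k + 1] - i[k])

    # Anodic Tafel fit: E = slope * log10(I) + intercept inside the window.
    mask = (v >= ecorr + fit_lo) & (v <= ecorr + fit_hi) & (i > 0)
    if mask.sum() < 2:
        raise ValueError("not enough anodic points in the fit window")
    slope, intercept = np.polyfit(np.log10(i[mask]), v[mask], 1)

    # Icorr: extrapolate the fitted line back to E = Ecorr.
    icorr = 10.0 ** ((ecorr - intercept) / slope)
    return {
        "Ecorr_calc": float(ecorr),
        "Icorr_calc": float(icorr),
        "anodic_slope": float(slope),
        "max_current": float(np.max(np.abs(i))),
    }
```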

For the LSTM, raw potential-current sequences are windowed with configurable overlap, normalized, and fed as time series.
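
The windowing step might look like the sketch below, which min-max normalises each channel over the whole curve and then slices overlapping windows. The function name `sliding_windows`, the defaults, and the choice to normalise before slicing are assumptions for illustration.

```python
import numpy as np

def sliding_windows(potential_V, current_A, window=32, overlap=0.5):
    """Return an array of shape (n_windows, window, 2) with values in [0, 1]."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")

    seq = np.stack([np.asarray(potential_V, dtype=float),
                    np.asarray(current_A, dtype=float)], axis=1)

    # Per-channel min-max normalisation; constant channels map to zeros.
    lo = seq.min(axis=0)
    span = seq.max(axis=0) - lo
    seq = (seq - lo) / np.where(span > 0, span, 1.0)

    step = max(1, int(round(window * (1.0 - overlap))))
    starts = range(0, len(seq) - window + 1, step)
    windows = [seq[s:s + window] for s in starts]
    if not windows:
        return np.empty((0, window, 2))   # curve shorter than one window
    return np.stack(windows)
```

The output layout (batch, time, features) matches what a PyTorch LSTM expects when it is constructed with `batch_first=True`.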


🗺 Roadmap

✅ Phase 1 — Corrosion Core (Complete)

  • Multi-format data ingestion (CSV, JSON, NPZ)
  • Savitzky-Golay preprocessing + derivative features
  • Heuristic onset detection (dI/dV + d²I/dV² threshold)
  • Random Forest classifier with full evaluation suite
  • Bidirectional LSTM classifier + regression mode
  • Grouped / alloy-holdout CV for leakage-resistant evaluation
  • Weak-label review queue for expert re-labelling

✅ Phase 2 — Serration Analysis (Complete)

  • Nanoindentation burst detection via dh/dP thresholding
  • 22 mechanical instability features extracted
  • RF instability classifier + regressor validated on synthetic data

✅ Phase 3 — Fusion System (Complete)

  • Corrosion ↔ serration feature alignment
  • Gradient Boosting feature-level fusion
  • Dual-branch MLP with joint fracture risk head
  • Synthetic risk labeling (pitting proximity × serration activity)

🔲 Phase 4 — Advanced Extensions (Planned)

  • Attention-LSTM — temporal attention for interpretable onset localization
  • 1D-CNN / Transformer — benchmark against LSTM
  • Shear band microstructural features — grain size, crystallographic orientation as auxiliary inputs
  • Multi-material generalization — single model across alloy families
  • Hyperparameter search — Bayesian optimization (Optuna)
  • k-fold CV on real data — after label expansion

🔲 Phase 5 — Deployment (Planned)

  • ONNX model export
  • FastAPI inference endpoint
  • Batch prediction script for new experimental data
  • Auto-generated PDF reports (curves + predictions + confidence)

🧰 Dependencies

numpy, pandas          → Data handling
scipy                  → Savitzky-Golay smoothing, signal processing
scikit-learn           → Random Forest, metrics, cross-validation
torch                  → LSTM model (PyTorch)
matplotlib             → Visualization

Install them with:

pip install -r requirements.txt

🤝 Contributing

This project is in active development. If you work with electrochemical data, nanoindentation, or materials ML and want to contribute real experimental datasets, open an issue or reach out directly.

Areas where contributions would have the most impact:

  • Real polarization curve datasets (any alloy family)
  • Expert-annotated ground-truth pitting onset labels
  • EIS impedance data for additional feature engineering

📄 License

MIT License — see LICENSE for details.



PolariSense · Built with electrochemistry, PyTorch, and a deep respect for how metals fail.

Star the repo if you find it useful ⭐
