██████╗ ██████╗ ██╗ █████╗ ██████╗ ██╗███████╗███████╗███╗ ██╗███████╗███████╗
██╔══██╗██╔═══██╗██║ ██╔══██╗██╔══██╗██║██╔════╝██╔════╝████╗ ██║██╔════╝██╔════╝
██████╔╝██║ ██║██║ ███████║██████╔╝██║███████╗█████╗ ██╔██╗ ██║███████╗█████╗
██╔═══╝ ██║ ██║██║ ██╔══██║██╔══██╗██║╚════██║██╔══╝ ██║╚██╗██║╚════██║██╔══╝
██║ ╚██████╔╝███████╗██║ ██║██║ ██║██║███████║███████╗██║ ╚████║███████║███████╗
╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝╚══════╝╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝
PolariSense is a multi-phase machine learning pipeline that reads electrochemical polarization curves and predicts pitting corrosion onset in metallic alloys — combining heuristic signal analysis, Random Forest classification, and LSTM sequence modeling into one unified corrosion intelligence system.
Quickstart · Architecture · Results · Roadmap
Metal doesn't fail all at once. It fails at a point — an onset — where the passive film breaks down, current spikes, and localized pitting corrosion begins. In aerospace alloys, biomedical implants, and structural materials, missing that onset by even a few millivolts can mean catastrophic failure downstream.
Traditional methods for detecting pitting onset rely on expert interpretation of polarization curves — slow, subjective, and unscalable. PolariSense automates this entirely, extracting onset signals from raw electrochemical data using both physics-informed heuristics and data-driven models trained on real Mg alloy experimental datasets.
| Capability | Detail |
|---|---|
| 🔍 Heuristic Onset Detection | dI/dV thresholding using robust median + MAD statistics, refined by d²I/dV² inflection points — no black box, full interpretability |
| 🌲 Random Forest Baseline | Hand-crafted electrochemical features (Ecorr, Icorr, anodic/cathodic slopes, breakdown potential) fed into a production-grade RF classifier |
| 🧠 LSTM Sequence Classifier | Bidirectional LSTM on sliding windows of polarization data — learns temporal patterns in the current response, not just static features |
| ⚡ Attention Mechanism (Phase 4) | Temporal attention layer pinpoints which part of the curve drives the onset prediction — interpretable by design |
| 🔗 3-Phase Fusion Architecture | Corrosion signals fused with nanoindentation serration features via dual-branch neural network for integrated fracture risk estimation |
| 📊 Trustworthy Evaluation | Grouped CV by alloy/solution/experiment, alloy holdout testing, weak-label review queues — not just accuracy theater |
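The heuristic detector in the first row can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `onset_detection.py`: the function name `detect_onset`, the threshold multiplier `k`, the `min_run` persistence check, and the `refine_halfwidth` search window are all assumptions made for the example.

```python
import numpy as np

def detect_onset(potential_v, current_a, k=5.0, min_run=3, refine_halfwidth=3):
    """Return the index of the estimated pitting onset, or None if none is found.

    Hypothetical sketch: threshold dI/dV with median + k * scaled MAD,
    then refine to the local maximum of d2I/dV2 around the first crossing.
    """
    v = np.asarray(potential_v, dtype=float)
    i = np.asarray(current_a, dtype=float)

    didv = np.gradient(i, v)            # first derivative dI/dV
    d2idv2 = np.gradient(didv, v)       # second derivative d2I/dV2

    med = np.median(didv)
    mad = np.median(np.abs(didv - med))  # median absolute deviation
    sigma = 1.4826 * mad                 # MAD rescaled to a Gaussian std estimate
    if sigma == 0:
        return None                      # flat curve: no robust threshold available

    above = didv > med + k * sigma

    # Require min_run consecutive points above threshold to reject isolated spikes.
    first, run = None, 0
    for idx, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run >= min_run:
            first = idx - min_run + 1
            break
    if first is None:
        return None

    # Refine: strongest curvature in a small window around the first crossing.
    lo = max(0, first - refine_halfwidth)
    hi = min(v.size, first + refine_halfwidth + 1)
    return lo + int(np.argmax(d2idv2[lo:hi]))
```

Because the threshold is built from the median and MAD rather than the mean and standard deviation, the steep pitting branch itself does not inflate the noise estimate.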
┌─────────────────────────────────────────────────────────────────┐
│ PolariSense Pipeline │
├─────────────────┬───────────────────────┬───────────────────────┤
│ PHASE 1 │ PHASE 2 │ PHASE 3 │
│ Corrosion Core │ Serration Analysis │ Fracture Fusion │
│ │ │ │
│ Polarization │ Nanoindentation │ Feature Alignment │
│ Curve Ingestion │ Burst Detection │ │
│ ↓ │ (dh/dP threshold) │ ┌──────────────┐ │
│ Preprocessing │ ↓ │ │ Corrosion │ │
│ (SG smooth, │ 22 Scalar Features │ │ Embeddings │──┐ │
│ derivatives, │ (burst count, amp, │ └──────────────┘ │ │
│ windowing) │ duration, energy) │ ↓ │
│ ↓ │ ↓ │ ┌──────────────┐ │
│ Heuristic + │ RF Instability │ │ Serration │──→ │
│ RF + LSTM │ Classifier │ │ Embeddings │ Joint│
│ ↓ │ │ └──────────────┘ Head│
│ Onset Potential │ │ ↓ │
│ Prediction (V) │ │ Fracture Risk Score │
└─────────────────┴───────────────────────┴───────────────────────┘
polarisense/
├── src/
│ ├── common/
│ │ ├── data_loader.py # CSV / JSON / NPZ multi-format ingestion
│ │ ├── preprocessing.py # Normalisation, smoothing, derivatives, windowing
│ │ ├── evaluation.py # MAE, RMSE, R², F1, confusion matrix
│ │ └── plotting.py # Curve overlays, scatter plots, training history
│ ├── phase_1_corrosion/
│ │ ├── dataset_template.py # Canonical schema (V, I, labels, metadata)
│ │ ├── onset_detection.py # Heuristic dI/dV threshold detector
│ │ ├── synthetic_data.py # Configurable synthetic polarization generator
│ │ ├── baseline_model.py # Random Forest regression / classification
│ │ └── lstm_model.py # Bidirectional LSTM with optional attention
│ ├── phase_2_serration/
│ │ ├── burst_detector.py # dh/dP burst-finding pipeline
│ │ └── serration_features.py # 22 mechanical instability features
│ └── phase_3_fusion/
│ ├── feature_alignment.py # Corrosion ↔ serration feature alignment
│ ├── gb_fusion.py # Gradient Boosting fusion model
│ └── nn_fusion.py # Dual-branch MLP with joint fracture risk head
│
├── data_raw/ # Original experimental files (CSV / JSON / NPZ)
├── data_processed/ # Cleaned features, windowed sequences
│ └── phase_1_corrosion/
│ └── weak_label_review_queue.csv # Prioritised re-labelling queue
├── models/ # Saved artefacts (.pkl, .pt, .json metrics)
├── figures/ # Auto-generated plots
├── results/ # Prediction outputs & pipeline summary
├── notebooks/
│ └── 01_experiment.ipynb # Full interactive walkthrough
├── run_pipeline.py # Single CLI entry point (all 3 phases)
├── requirements.txt
└── README.md
git clone https://github.com/AKSHEXXXX/pitting-onset-prediction.git
cd pitting-onset-prediction
pip install -r requirements.txt
python run_pipeline.py
This will:
- Generate 100 synthetic Mg-alloy polarization curves with configurable cathodic / passive / pitting regions
- Preprocess — Savitzky-Golay smoothing, min-max normalisation, dI/dV and d²I/dV² derivatives
- Run heuristic onset detection
- Train Random Forest baseline (corrosion severity: Low / Medium / High)
- Train LSTM binary classifier on sliding windows
- Evaluate all models → save metrics to results/
- Generate plots → save to figures/
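As a rough picture of the preprocessing step listed above, the sketch below chains Savitzky-Golay smoothing, min-max normalisation, and the two derivatives. The function name `preprocess_curve`, the default `window_length` and `polyorder`, and the output keys are assumptions for illustration; the actual settings live in `preprocessing.py`.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_curve(potential_v, current_a, window_length=11, polyorder=3):
    """Smooth, normalise, and differentiate one polarization curve (illustrative)."""
    v = np.asarray(potential_v, dtype=float)
    i = np.asarray(current_a, dtype=float)

    # Savitzky-Golay smoothing; window_length must be odd and <= len(i).
    i_smooth = savgol_filter(i, window_length=window_length, polyorder=polyorder)

    # Min-max normalisation to [0, 1]; guard against a constant signal.
    span = i_smooth.max() - i_smooth.min()
    i_norm = (i_smooth - i_smooth.min()) / span if span > 0 else np.zeros_like(i_smooth)

    # First and second derivatives with respect to potential.
    didv = np.gradient(i_norm, v)
    d2idv2 = np.gradient(didv, v)

    return {"potential_V": v, "current_norm": i_norm, "dI_dV": didv, "d2I_dV2": d2idv2}
```

Smoothing before differentiating matters here: numerical derivatives amplify measurement noise, and the second derivative does so twice.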
python run_pipeline.py data_raw/your_experiment_folder/
Place CSV files with columns potential_V and current_A in a subfolder of data_raw/. The pipeline auto-detects format (CSV / JSON / NPZ).
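To check a file against the expected CSV layout before running the pipeline, a small standalone reader like the one below may help. It is a sketch using only the standard library and NumPy; `load_polarization_csv` is a hypothetical helper, not the loader in `data_loader.py`, and using the file stem as `sample_id` is an assumption.

```python
import csv
from pathlib import Path

import numpy as np

REQUIRED_COLUMNS = {"potential_V", "current_A"}

def load_polarization_csv(path):
    """Read one polarization curve from a CSV with potential_V and current_A columns."""
    path = Path(path)
    with path.open(newline="") as fh:
        reader = csv.DictReader(fh)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path.name}: missing column(s) {sorted(missing)}")
        rows = [(float(r["potential_V"]), float(r["current_A"])) for r in reader]

    data = np.array(rows, dtype=float).reshape(-1, 2)
    return {
        "sample_id": path.stem,          # assumption: file stem doubles as the sample id
        "potential_V": data[:, 0],
        "current_A": data[:, 1],
    }
```

Failing early on a missing column gives a clearer error than a `KeyError` raised deep inside preprocessing.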
jupyter notebook notebooks/01_experiment.ipynb
All metrics from Phase 1 RF Classifier on real Mg alloy datasets (301 rows, 189 severity-labelled).
| Evaluation Protocol | Accuracy | Macro F1 | Balanced Acc |
|---|---|---|---|
| Holdout test split (full features) | 94.7% | 0.95 | 0.95 |
| 5-fold stratified CV | 94.2% | — | — |
| Grouped CV (alloy / solution / file) | 94.2% | 0.942 | 0.94 |
| Alloy holdout — MG60 | 90.2% | 0.90 | — |
| Alloy holdout — MG70 | 92.7% | 0.93 | — |
| Alloy holdout — MG80 | 95.1% | 0.95 | — |
| Reduced features (no Icorr / Imax) | 78.9% | 0.79 | — |
| Experimental labels only (LOOCV, n=8) | 50.0% | 0.41 | 0.44 |
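The grouped and alloy-holdout rows rely on keeping every curve from one group on the same side of the split. A minimal sketch with scikit-learn's `GroupKFold` is shown below; the function name `grouped_cv_scores`, the forest size, and the idea of passing alloy names as `groups` are assumptions for illustration, not the project's evaluation code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import GroupKFold

def grouped_cv_scores(X, y, groups, n_splits=3, seed=0):
    """Cross-validate so that no group (e.g. alloy) appears in both train and test."""
    X = np.asarray(X)
    y = np.asarray(y)
    groups = np.asarray(groups)

    scores = []
    # GroupKFold guarantees that train and test folds share no group label.
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        scores.append({
            "held_out": sorted(str(g) for g in set(groups[test_idx])),
            "macro_f1": f1_score(y[test_idx], pred, average="macro"),
            "balanced_acc": balanced_accuracy_score(y[test_idx], pred),
        })
    return scores
```

With three alloys and `n_splits=3`, each fold holds out exactly one alloy, which mirrors the alloy-holdout protocol in the table.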
| Metric | Value |
|---|---|
| Multiclass Brier Score | ≈ 0.085 |
| Mean max class probability | ≈ 0.88 |
| High-confidence coverage (prob ≥ 0.8) | ≈ 79% of holdout |
| Accuracy in high-confidence region | 100% |
| Prediction flip rate under 2% feature noise | ≈ 4.3% |
⚠️ Honest note: The ~95% accuracy is measured primarily against weak (Icorr-inferred) labels. On the 8 samples with true experimental corrosion-rate labels, leave-one-out accuracy is only 50% (macro F1 0.41). Expanding the experimental label set is the primary next step, not adding model complexity.
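For readers who want to reproduce the confidence metrics on their own predictions, the sketch below computes a multiclass Brier score, the mean maximum class probability, and coverage and accuracy in the high-confidence region. The helper name `confidence_report` is hypothetical, and the Brier score here is the sum of squared errors over classes averaged over samples; some references divide by the number of classes, so the project's exact convention is an assumption.

```python
import numpy as np

def confidence_report(proba, y_true, threshold=0.8):
    """Summarise calibration-style metrics from class probabilities (illustrative)."""
    proba = np.asarray(proba, dtype=float)    # shape (n_samples, n_classes)
    y_true = np.asarray(y_true, dtype=int)    # integer class labels

    # Multiclass Brier score: mean over samples of sum_k (p_k - onehot_k)^2.
    onehot = np.eye(proba.shape[1])[y_true]
    brier = float(np.mean(np.sum((proba - onehot) ** 2, axis=1)))

    max_p = proba.max(axis=1)
    pred = proba.argmax(axis=1)
    confident = max_p >= threshold

    # Accuracy restricted to predictions whose top probability clears the threshold.
    acc_conf = float(np.mean(pred[confident] == y_true[confident])) if confident.any() else float("nan")

    return {
        "brier": brier,
        "mean_max_prob": float(max_p.mean()),
        "coverage": float(confident.mean()),
        "accuracy_high_conf": acc_conf,
    }
```

These numbers inherit the quality of the labels they are scored against, so a low Brier score on weak labels does not imply good calibration on experimental ones.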
Each sample in the pipeline is a structured dictionary:
| Field | Type | Description |
|---|---|---|
| `sample_id` | `str` | Unique identifier |
| `potential_V` | `np.ndarray` | Applied potential sweep (V vs. reference) |
| `current_A` | `np.ndarray` | Measured current response (A or A/cm²) |
| `pitting_onset_potential_V` | `float \| None` | Ground-truth onset potential; `None` if no pitting |
| `pitting_onset_index` | `int \| None` | Array index of onset point |
| `material` | `str` | Alloy identifier (e.g. MG60, MG70, SS304) |
| `electrolyte` | `str` | Electrolyte / concentration description |
| `scan_rate_mV_s` | `float` | Potentiodynamic scan rate in mV/s |
| `metadata` | `dict` | Experiment conditions, source file, lab notes |
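A sample following this schema could be assembled as below. This is a sketch only: `make_sample` is a hypothetical helper, not a function from `dataset_template.py`, and both the validation rules and the behaviour of filling the onset potential from the onset index are assumptions.

```python
import numpy as np

def make_sample(sample_id, potential_v, current_a, material, electrolyte,
                scan_rate_mV_s, pitting_onset_index=None,
                pitting_onset_potential_V=None, metadata=None):
    """Build one sample dictionary with basic consistency checks (illustrative)."""
    v = np.asarray(potential_v, dtype=float)
    i = np.asarray(current_a, dtype=float)
    if v.shape != i.shape or v.ndim != 1:
        raise ValueError("potential_V and current_A must be 1-D arrays of equal length")

    if pitting_onset_index is not None:
        if not 0 <= pitting_onset_index < v.size:
            raise ValueError("pitting_onset_index is outside the sweep")
        # Assumption: when only the index is given, read the potential off the sweep.
        if pitting_onset_potential_V is None:
            pitting_onset_potential_V = float(v[pitting_onset_index])

    return {
        "sample_id": str(sample_id),
        "potential_V": v,
        "current_A": i,
        "pitting_onset_potential_V": pitting_onset_potential_V,
        "pitting_onset_index": pitting_onset_index,
        "material": material,
        "electrolyte": electrolyte,
        "scan_rate_mV_s": float(scan_rate_mV_s),
        "metadata": dict(metadata or {}),
    }
```

Leaving both onset fields as `None` marks a curve with no pitting, matching the table above.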
Every polarization curve is distilled into a single feature vector for the RF models:
Ecorr_calc ← Corrosion potential (open-circuit crossover)
Icorr_calc ← Corrosion current density (Tafel extrapolation)
max_current ← Peak current in scan range
anodic_slope ← Tafel slope — anodic branch
cathodic_slope ← Tafel slope — cathodic branch
breakdown_potential ← Potential at rapid current rise (pitting trigger)
R1, R2 ← Impedance resistances (from EIS if available)
C1, Q1, n ← CPE parameters from equivalent circuit fitting
For the LSTM, raw potential-current sequences are windowed with configurable overlap, normalized, and fed as time series.
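The windowing step might look like the sketch below, which z-scores each channel over the whole curve and then slices overlapping windows. The function name `make_windows`, the default `window` and `overlap`, and the choice of per-curve z-scoring are assumptions; the project may normalise differently.

```python
import numpy as np

def make_windows(potential_v, current_a, window=32, overlap=0.5):
    """Slice a (potential, current) sequence into overlapping windows.

    Returns an array of shape (n_windows, window, 2), ready for a sequence model.
    """
    seq = np.column_stack([
        np.asarray(potential_v, dtype=float),
        np.asarray(current_a, dtype=float),
    ])                                            # shape (T, 2)

    # Per-curve z-score normalisation of each channel; guard against zero variance.
    std = seq.std(axis=0)
    std[std == 0] = 1.0
    seq = (seq - seq.mean(axis=0)) / std

    # Overlap of 0.5 with window 32 gives a step of 16 samples.
    step = max(1, int(round(window * (1.0 - overlap))))
    starts = range(0, len(seq) - window + 1, step)
    windows = [seq[s:s + window] for s in starts]
    if not windows:
        return np.empty((0, window, 2))           # curve shorter than one window
    return np.stack(windows)
```

A 100-point curve with the defaults yields 5 windows starting at indices 0, 16, 32, 48, and 64; any tail shorter than a full window is dropped.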
- Multi-format data ingestion (CSV, JSON, NPZ)
- Savitzky-Golay preprocessing + derivative features
- Heuristic onset detection (dI/dV + d²I/dV² threshold)
- Random Forest classifier with full evaluation suite
- Bidirectional LSTM classifier + regression mode
- Grouped / alloy-holdout CV for leakage-resistant evaluation
- Weak-label review queue for expert re-labelling
- Nanoindentation burst detection via dh/dP thresholding
- 22 mechanical instability features extracted
- RF instability classifier + regressor validated on synthetic data
- Corrosion ↔ serration feature alignment
- Gradient Boosting feature-level fusion
- Dual-branch MLP with joint fracture risk head
- Synthetic risk labeling (pitting proximity × serration activity)
- Attention-LSTM — temporal attention for interpretable onset localization
- 1D-CNN / Transformer — benchmark against LSTM
- Shear band microstructural features — grain size, crystallographic orientation as auxiliary inputs
- Multi-material generalization — single model across alloy families
- Hyperparameter search — Bayesian optimization (Optuna)
- k-fold CV on real data — after label expansion
- ONNX model export
- FastAPI inference endpoint
- Batch prediction script for new experimental data
- Auto-generated PDF reports (curves + predictions + confidence)
numpy, pandas → Data handling
scipy → Savitzky-Golay smoothing, signal processing
scikit-learn → Random Forest, metrics, cross-validation
torch → LSTM model (PyTorch)
matplotlib → Visualization
pip install -r requirements.txt
This project is in active development. If you work with electrochemical data, nanoindentation, or materials ML and want to contribute real experimental datasets, open an issue or reach out directly.
Areas where contributions would have the most impact:
- Real polarization curve datasets (any alloy family)
- Expert-annotated ground-truth pitting onset labels
- EIS impedance data for additional feature engineering
MIT License — see LICENSE for details.
PolariSense · Built with electrochemistry, PyTorch, and a deep respect for how metals fail.
Star the repo if you find it useful ⭐