A full-stack machine learning system that predicts silica concentrate quality failures in an iron ore flotation plant — hours before they happen.
👉 View Live Streamlit Dashboard
In iron ore flotation processing, silica is an unwanted impurity. When silica concentration exceeds 4% in the final concentrate, product quality fails industry standards, customers impose penalty fees or reject shipments, and revenue is lost while reprocessing costs increase.
By the time quality degradation is detected through lab analysis, it is already too late to intervene. This project solves that problem.
A binary classification model trained on 737,453 sensor readings predicts quality failures before they occur, giving operators time to adjust process parameters and prevent off-spec production.
Operators receive a three-tier alert system:
| Alert | Meaning |
|---|---|
| 🟢 GREEN | Normal operation — no action needed |
| 🟡 AMBER | Early warning — monitor closely |
| 🔴 RED | Intervention required + specific recommended actions |
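The tiers in the table can be read as bands over the model's predicted failure probability. The snippet below is a minimal sketch of that mapping. The function name `alert_tier` and the cut-offs 0.40 and 0.70 are assumptions for illustration, not the project's tuned values.

```python
# Hypothetical cut-offs chosen for illustration only.
AMBER_THRESHOLD = 0.40
RED_THRESHOLD = 0.70


def alert_tier(failure_probability: float,
               amber: float = AMBER_THRESHOLD,
               red: float = RED_THRESHOLD) -> str:
    """Map a predicted failure probability in [0, 1] to GREEN, AMBER or RED."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    if failure_probability >= red:
        return "RED"      # intervention required
    if failure_probability >= amber:
        return "AMBER"    # early warning, monitor closely
    return "GREEN"        # normal operation


if __name__ == "__main__":
    for p in (0.12, 0.55, 0.91):
        print(f"{p:.2f} -> {alert_tier(p)}")
```

Keeping the cut-offs as parameters lets them be revisited without touching the mapping logic.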
| Category | Tools |
|---|---|
| Data & EDA | Python, Pandas, NumPy, Matplotlib, Seaborn |
| Machine Learning | XGBoost, TensorFlow/Keras, Scikit-learn |
| Class Imbalance | Imbalanced-learn (SMOTE) |
| Deployment | Streamlit Cloud, Plotly, Joblib |
```bash
# Clone the repository
git clone https://github.com/your-username/mining-quality-intelligence.git
cd mining-quality-intelligence

# Install dependencies
pip install -r requirements.txt

# Run the Streamlit dashboard locally
streamlit run app.py
```

Dataset: Quality Prediction in a Mining Process, available on Kaggle.
The dataset comprises 737,453 rows × 24 sensor columns sourced from a real iron ore flotation plant, with zero missing values confirmed across all features.
Twelve new derived features were engineered, followed by univariate, correlation, and time series analysis. Quality thresholds were defined as: Premium < 2%, Good < 3%, Acceptable < 4%, Poor ≥ 4%. Analysis revealed weekly operational cycles and shift-change patterns in the data.
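The quality bands above translate directly into a small labelling function. This is a sketch only: the names `quality_grade` and `is_failure` are hypothetical, and treating the "Poor" band (silica ≥ 4%) as the binary failure target is an assumption based on the thresholds stated here.

```python
def quality_grade(silica_pct: float) -> str:
    """Label a silica concentrate reading (in percent) with its quality band."""
    if silica_pct < 0:
        raise ValueError("silica_pct cannot be negative")
    if silica_pct < 2.0:
        return "Premium"
    if silica_pct < 3.0:
        return "Good"
    if silica_pct < 4.0:
        return "Acceptable"
    return "Poor"


def is_failure(silica_pct: float) -> bool:
    """Assumed binary target: a reading in the 'Poor' band counts as a failure."""
    return quality_grade(silica_pct) == "Poor"


if __name__ == "__main__":
    for reading in (1.4, 2.0, 3.9, 4.0):
        print(reading, quality_grade(reading), is_failure(reading))
```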
Preprocessing used an 80/20 stratified train/test split, StandardScaler for feature scaling, and SMOTE to address class imbalance. Two models were trained and compared:
- Model 1 (XGBoost): n_estimators=300, max_depth=6, learning_rate=0.05
- Model 2 (Neural Network): architecture 128→64→32→1 with BatchNorm and Dropout layers
The primary evaluation metric was F1-Score, with threshold tuning applied to maximise net financial benefit for the business.
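One way to picture threshold tuning for net financial benefit is to score each candidate threshold by the money it would have saved or lost on held-out data. The sketch below does that in plain Python. The function names and the monetary values (`tp_value`, `fp_cost`, `fn_cost`) are placeholders, not figures from the project.

```python
def net_benefit(y_true, y_prob, threshold,
                tp_value=10_000.0, fp_cost=1_000.0, fn_cost=10_000.0):
    """Net benefit of alerting at `threshold`, using illustrative costs."""
    tp = fp = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and actual:
            tp += 1          # failure caught in time
        elif predicted and not actual:
            fp += 1          # false alarm
        elif actual:
            fn += 1          # missed failure
    return tp * tp_value - fp * fp_cost - fn * fn_cost


def best_threshold(y_true, y_prob, candidates=None, **costs):
    """Return the candidate threshold with the highest net benefit."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05 .. 0.95
    return max(candidates, key=lambda t: net_benefit(y_true, y_prob, t, **costs))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    probabilities = [0.10, 0.40, 0.35, 0.80]
    print(best_threshold(labels, probabilities))
```

Because a missed failure is priced higher than a false alarm in this example, the selected threshold tends to sit below 0.5. Different cost assumptions would move it.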
A real-time MiningQualityScorer pipeline class was built and deployed via a Streamlit dashboard with four interactive pages:
| Page | Description |
|---|---|
| 🏠 Live Scoring | Enter sensor readings → instant alert + recommended actions |
| 📈 Historical Trends | 168-hour probability trend + alert distribution |
| 🔍 Feature Inspector | Feature vs failure probability analysis |
| 📊 Drift Monitor | PSI heatmap — flags when model needs retraining |
PSI drift monitoring with automated retraining triggers ensures the model remains accurate as plant conditions evolve over time.
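For reference, the Population Stability Index compares how a feature is distributed across bins in a baseline sample and in recent data: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the baseline and current proportions in bin i. The sketch below is a plain-Python illustration, not the project's implementation. The function name `psi`, the quantile binning, and the `eps` floor for empty bins are all assumptions.

```python
import math
from bisect import bisect_right


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a current sample."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    # Bin edges are taken from quantiles of the baseline sample.
    ref = sorted(expected)
    edges = [ref[int(len(ref) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = list(range(100))
    shifted = [x + 50 for x in baseline]
    print(round(psi(baseline, baseline), 4), round(psi(baseline, shifted), 4))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate drift, and above 0.25 as significant drift. The trigger values used by the dashboard are not stated here.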
The full pipeline is complete and deployed. Future iterations may include:
- Multi-plant generalisation across different flotation configurations
- Integration with SCADA/DCS systems for direct operator alerts
- Expanded drift monitoring with automated model versioning
Developed by Lindiwe Songelwa — Data Scientist | Developer | Insight Creator
| Platform | Link |
|---|---|
| Lindiwe S. | |
| 🌐 Portfolio | Creative Portfolio |
| 🏅 Credly | Lindiwe Songelwa – Badges |
| 🚀 Live App | Streamlit Dashboard |
| Email | sl.songelwa@hotmail.co.za |
© 2026 Lindiwe Songelwa. All rights reserved.