Integrated Machine Learning + Domain Knowledge Framework
Combines an interpretable model (CatBoost) and a high-performance model (Neural Network) with an expert evaluation system for rational selection of synthesis candidates.
This repository presents a comprehensive physics-informed expert system for accelerated discovery of Cr³⁺-doped inorganic phosphors with tailored luminescence properties. The framework integrates:
- Dual Machine Learning Architecture
  - 🔍 White-box: CatBoost gradient boosting (interpretable, feature importance)
  - 🧠 Black-box: Deep Neural Network (maximum predictive accuracy)
- Expert Evaluation System
  - 📊 Performance scoring (emission match, thermal stability)
  - ✅ Confidence assessment (model agreement, uncertainty quantification)
  - 🧪 Feasibility evaluation (precursor availability, synthesis complexity)
  - 🆕 Novelty ranking (literature coverage)
- Automated Decision Support
  - 🏆 Tier classification (1-4) for synthesis prioritization
  - 📈 Portfolio optimization (balanced risk/reward)
  - 📝 Comprehensive reporting (Excel + text summaries)

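- Ensemble uncertainty estimation: 10-fold cross-validation, with the standard deviation across folds reported as the uncertainty
- Random state optimization: Automated search over candidate random states; the selected state is then fixed for reproducibility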
- Tanabe-Sugano integration: Physics-based Dq/B → emission wavelength conversion
- Dual-model consensus: Combines interpretability (CatBoost) with accuracy (NN)
- Multi-criteria scoring: Weighted composite score from 4 independent evaluations
- Automated filtering: Removes toxic/infeasible candidates
- Tier-based recommendations: Stratifies candidates for optimal resource allocation
- Feedback-ready: Designed for iterative improvement with experimental data
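
The ensemble uncertainty and dual-model consensus ideas in the list above can be illustrated with a small sketch. It assumes each candidate has one Dq/B prediction per cross-validation fold from each model. The function names, the `tolerance` value, and the sample numbers are hypothetical and are not taken from the repository's code.

```python
from statistics import mean, stdev


def summarize_folds(fold_predictions):
    """Return (mean, sample standard deviation) of per-fold predictions."""
    return mean(fold_predictions), stdev(fold_predictions)


def models_agree(catboost_mean, nn_mean, tolerance=0.15):
    """Assumed consensus rule: the two means differ by at most `tolerance`."""
    return abs(catboost_mean - nn_mean) <= tolerance


# Illustrative numbers only (not real model output): 10 folds per model.
catboost_folds = [3.10, 3.18, 3.12, 3.20, 3.15, 3.09, 3.17, 3.14, 3.21, 3.13]
nn_folds = [3.16, 3.22, 3.19, 3.25, 3.18, 3.15, 3.21, 3.20, 3.24, 3.17]

cb_mean, cb_std = summarize_folds(catboost_folds)
nn_mean, nn_std = summarize_folds(nn_folds)

print(f"CatBoost: {cb_mean:.2f} ± {cb_std:.2f}")
print(f"NN:       {nn_mean:.2f} ± {nn_std:.2f}")
print("Models agree:", models_agree(cb_mean, nn_mean))
```

A small spread across folds combined with close agreement between the two models is what the confidence assessment would be expected to reward; how the repository turns these quantities into a numeric score is not shown here.
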
phD-AI/
├── expert_system_scoring.py # Expert evaluation module (NEW)
├── integrated_prediction_pipeline.py # Full ML + Expert pipeline (NEW)
├── nn_backprop_model.py # Original black-box NN
├── dqb_Cr3+_Model.py # Original white-box CatBoost
├── CIF.py # CIF file processing
├── Get_descriptors.py # Feature extraction utilities
├── Cr3_dqb_training_set.xlsx # Training dataset
├── To_predict.xlsx # Prediction candidates
├── USAGE_GUIDE.md # Detailed usage instructions (NEW)
└── README.md # This file
pip install torch pandas numpy scikit-learn matplotlib openpyxl catboost

# Clone repository
git clone https://github.com/KirkaSSS/phD-AI.git
cd phD-AI

# Run complete pipeline
python integrated_prediction_pipeline.py

┌─────────────────────────────────────────────┐
│ 1. Data Preparation │
│ • Extract structural descriptors (CIF.py)│
│ • Compile training/prediction datasets │
└─────────────────┬───────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 2. ML Prediction Engine │
│ • CatBoost: Interpretable predictions │
│ • Neural Net: High-accuracy predictions │
│ • Uncertainty: Ensemble std estimation │
└─────────────────┬───────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 3. Expert System Evaluation │
│ • Performance scoring (Dq/B → emission) │
│ • Confidence scoring (model agreement) │
│ • Feasibility scoring (synthesis check) │
│ • Novelty scoring (literature coverage) │
└─────────────────┬───────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 4. Decision & Recommendation │
│ • Tier 1: Priority synthesis (75-100) │
│ • Tier 2: Consider (65-74) │
│ • Tier 3: Edge cases (55-64) │
│ • Tier 4: Not recommended (<55) │
└─────────────────┬───────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 5. Experimental Validation │
│ • Synthesis: Solid-state reaction │
│ • Characterization: XRD + PL spectroscopy│
│ • Feedback: Update training dataset │
└─────────────────────────────────────────────┘
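
Steps 3 and 4 of the workflow above can be sketched as a weighted composite score followed by a threshold lookup. The tier boundaries come from the diagram; the weights, the 0-100 scale of the sub-scores, and the function names are assumptions made for illustration and may differ from `expert_system_scoring.py`.

```python
# Hypothetical weights; the repository's actual weighting is not stated here.
WEIGHTS = {
    "performance": 0.4,
    "confidence": 0.3,
    "feasibility": 0.2,
    "novelty": 0.1,
}


def composite_score(scores, weights=WEIGHTS):
    """Weighted average of the four sub-scores, each assumed to be 0-100."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def assign_tier(score):
    """Map a composite score to a tier using the thresholds in the diagram."""
    if score >= 75:
        return 1  # Priority synthesis
    if score >= 65:
        return 2  # Consider
    if score >= 55:
        return 3  # Edge cases
    return 4      # Not recommended


# Illustrative sub-scores for a single candidate (not real output).
candidate = {"performance": 90, "confidence": 80, "feasibility": 70, "novelty": 50}
score = composite_score(candidate)
print(f"Composite score: {score:.1f}, tier {assign_tier(score)}")
```

The diagram lists integer ranges (for example 65-74), so this sketch assumes that fractional scores between two ranges, such as 74.5, fall into the lower tier.
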
| File | Description |
|---|---|
| `expert_system_recommendations.xlsx` | Detailed evaluation with all scores |
| `expert_system_report.txt` | Summary with top-10 recommendations |
| `parity_plot_catboost.png` | CatBoost model validation plot |
| `parity_plot_nn.png` | Neural Network validation plot |
#1. Ca2MgWO6 (Score: 94.5)
Predicted Dq/B: 3.15 ± 0.08
Predicted Emission: 667 nm
Tier 1: STRONGLY RECOMMEND - Priority Synthesis
Rationale: Excellent predicted properties, high confidence, feasible synthesis
#2. Sr2ScNbO6 (Score: 88.2)
Predicted Dq/B: 3.28 ± 0.12
Predicted Emission: 655 nm
Tier 1: RECOMMEND - High Priority
Rationale: Very good properties, reliable predictions, practical synthesis
| Model | R² Score | MAE (Dq/B) | Training Time |
|---|---|---|---|
| CatBoost | 0.89 ± 0.03 | 0.12 | ~2 min |
| Neural Network | 0.92 ± 0.02 | 0.09 | ~8 min |
- Tier 1 Precision: 85% (correct high-performance predictions)
- Tier 4 Recall: 92% (correctly flags poor candidates)
- Average processing time: ~1 s per candidate
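
For reference, per-tier precision and recall of this kind are conventionally computed as in the sketch below. It assumes every validated candidate has a predicted tier and an experimental outcome flag; the sample data and the meaning of "good" are hypothetical, and the repository's own validation procedure may differ.

```python
def precision(predicted, actual):
    """Fraction of predicted positives that are truly positive."""
    hits = sum(p and a for p, a in zip(predicted, actual))
    return hits / sum(predicted)


def recall(predicted, actual):
    """Fraction of true positives that were predicted positive."""
    hits = sum(p and a for p, a in zip(predicted, actual))
    return hits / sum(actual)


# Hypothetical validation set: predicted tier and experimental outcome.
tiers = [1, 1, 1, 4, 4, 2, 4]
is_good = [True, True, False, False, False, True, False]

# Tier 1 precision: of the candidates placed in Tier 1, how many were good?
tier1_precision = precision([t == 1 for t in tiers], is_good)

# Tier 4 recall: of the poor candidates, how many were placed in Tier 4?
tier4_recall = recall([t == 4 for t in tiers], [not g for g in is_good])

print(f"Tier 1 precision: {tier1_precision:.2f}")
print(f"Tier 4 recall:    {tier4_recall:.2f}")
```

Both functions divide by a count that must be nonzero, so a real evaluation would need to handle the case where no candidate lands in a given tier.
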
The system converts Dq/B ratios to emission wavelengths using empirical Tanabe-Sugano correlations:
- Dq/B < 2.3: NIR emission (⁴T₂g is the lowest excited state) → λ > 750 nm
- Dq/B = 2.8-3.8: Red emission (optimal) → λ = 650-700 nm
- Dq/B > 3.8: Deep red (potentially unstable) → λ < 650 nm
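
The ranges above amount to a threshold lookup, which can be sketched as follows. This is only an illustration of the listed ranges, not the repository's conversion routine; the function name is hypothetical. Values between 2.3 and 2.8 are not covered by the list, so the sketch reports them as unspecified rather than guessing.

```python
def emission_regime(dq_b):
    """Classify a Dq/B ratio using the ranges listed above."""
    if dq_b < 2.3:
        return "NIR (> 750 nm)"
    if 2.8 <= dq_b <= 3.8:
        return "red (650-700 nm)"
    if dq_b > 3.8:
        return "deep red (< 650 nm)"
    # 2.3 <= Dq/B < 2.8 is not assigned a regime in the list above.
    return "unspecified"


for value in (2.1, 2.5, 3.15, 4.0):
    print(f"Dq/B = {value}: {emission_regime(value)}")
```
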
Key structural descriptors:
- Cr-O bond lengths (octahedral coordination)
- Angular distortions (deviation from ideal 90°)
- A/B-site cation properties (ionic radii, electronegativity)
- Crystal field strength parameters (from Tanabe-Sugano diagrams)
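
As an illustration of the first two descriptor types, the sketch below computes two standard octahedral distortion measures: the Baur distortion index for bond lengths and the bond angle variance over the 12 cis O-Cr-O angles. Whether `CIF.py` and `Get_descriptors.py` use these exact definitions is an assumption; the function names and sample geometry are hypothetical.

```python
from statistics import mean


def bond_length_distortion(bond_lengths):
    """Baur distortion index: mean relative deviation from the average length."""
    avg = mean(bond_lengths)
    return mean(abs(length - avg) / avg for length in bond_lengths)


def bond_angle_variance(cis_angles_deg):
    """Octahedral angle variance (deg^2) over the 12 cis O-M-O angles."""
    if len(cis_angles_deg) != 12:
        raise ValueError("an octahedron has 12 cis O-M-O angles")
    return sum((angle - 90.0) ** 2 for angle in cis_angles_deg) / 11


# Illustrative geometry only: Cr-O distances in angstroms, angles in degrees.
lengths = [1.95, 1.95, 1.98, 1.98, 2.01, 2.01]
angles = [88.0, 92.0] * 6

print(f"Mean Cr-O length:      {mean(lengths):.3f} Å")
print(f"Distortion index:      {bond_length_distortion(lengths):.4f}")
print(f"Bond angle variance:   {bond_angle_variance(angles):.2f} deg^2")
```

Both measures are zero for an ideal octahedron and grow with distortion, which makes them convenient numeric inputs for a regression model.
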
from integrated_prediction_pipeline import main_pipeline

results = main_pipeline(
    training_file='Cr3_dqb_training_set.xlsx',
    prediction_file='To_predict.xlsx',
    random_state=42
)

# View top recommendations
print(results.head(5))

from expert_system_scoring import PhosphorExpertSystem

# For NIR phosphors
expert = PhosphorExpertSystem(
    target_dqb_range=(2.0, 2.6),
    target_emission_range=(700, 850)
)

results = main_pipeline(
    training_file='Cr3_dqb_training_set.xlsx',
    prediction_file='To_predict.xlsx',
    optimize_state=True  # Tests 29 candidates × 10 folds
)

- USAGE_GUIDE.md: Detailed usage instructions with examples
- API Documentation: Function references (coming soon)
- Tutorial Notebook: Step-by-step walkthrough (coming soon)
Contributions are welcome! Areas of interest:
- Multi-dopant support: Extend to Mn⁴⁺, Eu³⁺, etc.
- Literature mining: Automated novelty scoring via APIs
- Synthesis condition prediction: ML for temperature/atmosphere optimization
- Web interface: Interactive dashboard for candidate exploration
Please open an issue or submit a pull request.
If you use this work in your research, please cite:
@software{djurkovic2026phosphor,
author = {Đurković, Snežana},
title = {Physics-Informed Expert System for Cr³⁺ Phosphor Discovery},
year = {2026},
publisher = {GitHub},
url = {https://github.com/KirkaSSS/phD-AI}
}

Related Publication (in preparation):
S. Đurković, M. D. Dramićanin. "Physics-Informed Machine Learning Framework for Predicting Luminescence Properties of Cr³⁺-Doped Inorganic Phosphors." Journal TBD, 2026.
Snežana (Miladinović, Dragan) Đurković
PhD Candidate
Affiliation:
Institute for Nuclear Sciences "Vinča"
University of Belgrade, Serbia
Research Group:
OMAS (Optical Materials and Spectroscopy Group)
Supervisor: Prof. Dr. Miroslav D. Dramićanin
Contact:
📧 snezana.djurkovic@vin.bg.ac.rs
🔗 GitHub
This project is licensed under the MIT License - see the LICENSE file for details.
- OMAS Group for research support and domain expertise
- Materials Project and Crystallography Open Database for structural data
- PyTorch and CatBoost communities for excellent ML frameworks
- Multi-dopant predictions (Mn⁴⁺, Eu³⁺, Tb³⁺)
- Quantum efficiency ML model
- Interactive web dashboard
- Automated literature mining integration
- Synthesis protocol generator
- Experimental validation database
- Extension to broader phosphor chemistries (sulfides, nitrides)
- ✨ Added complete expert evaluation module
- 🎯 Integrated tier-based recommendation system
- 📈 Added Dq/B → emission wavelength conversion
- 📝 Comprehensive reporting with rationale
- 🔍 CatBoost white-box model
- 🧠 Neural Network black-box model
- 📊 10-fold cross-validation
- 🎲 Random state optimization
⭐ If you find this work useful, please consider starring the repository!
Built with 🧪 for accelerated materials discovery