This project is a sports analytics and machine learning system designed to simulate athlete training data, engineer sports science metrics, and provide actionable injury risk insights through a coach-facing dashboard.
This system bridges the gap between raw training data and clinical decision support. It mimics the workflows used by professional sports organizations to monitor athlete health and optimize performance.
- Synthetic Data Generation: Simulates stochastic athlete training workloads and latent injury patterns using semi-randomized intensity variables.
- Feature Engineering: Calculates longitudinal sports science metrics including Acute:Chronic Workload Ratio (ACWR), fatigue accumulation, and high-intensity streaks.
- Algorithmic Benchmarking: Implements and compares a linear baseline (Logistic Regression) against a non-linear ensemble method (Random Forest).
- Decision Engine: Deploys a dashboard that maps model probabilities and workload ratios to specific clinical recommendations.
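
The feature engineering step above can be illustrated with a minimal sketch. It assumes a 7-day acute window, a 28-day chronic window, and an intensity threshold of 8 for "high-intensity" days. The function names and all three parameters are illustrative assumptions, not the project's actual code.

```python
from statistics import mean

def rolling_mean(values, window):
    """Mean of the last `window` values at each index (shorter windows at the start)."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

def acwr(daily_load, acute_days=7, chronic_days=28):
    """Acute:Chronic Workload Ratio per day; None where the chronic load is zero."""
    acute = rolling_mean(daily_load, acute_days)
    chronic = rolling_mean(daily_load, chronic_days)
    return [a / c if c > 0 else None for a, c in zip(acute, chronic)]

def high_intensity_streak(intensity, threshold=8):
    """Count of consecutive days, ending at each day, with intensity >= threshold."""
    streaks, run = [], 0
    for value in intensity:
        run = run + 1 if value >= threshold else 0
        streaks.append(run)
    return streaks

# Four steady weeks followed by one week at double the load.
loads = [300] * 28 + [600] * 7
print(round(acwr(loads)[-1], 2))                      # 1.6
print(high_intensity_streak([5, 8, 9, 9, 4, 8]))      # [0, 1, 2, 3, 0, 1]
```

A sudden doubling of load pushes the ratio to 1.6, which is the kind of spike the dashboard is meant to flag.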
The WorkloadGenerator creates a synthetic dataset by simulating:
- Session Types: Practice, gym, and match sessions.
- Metrics: Intensity, fatigue accumulation, and chronic/acute workloads.
- ACWR: The Acute:Chronic Workload Ratio, a widely used workload-monitoring metric in sports science.
- Injury Events: Realistic injury triggers based on overtraining thresholds.
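
As a rough illustration of the simulation described above, the sketch below generates one athlete's daily sessions. It is not the project's WorkloadGenerator: the function `simulate_athlete`, the intensity ranges, the duration range, the load formula (intensity times minutes), and the injury probabilities are all assumptions chosen for the example.

```python
import random

# Assumed intensity ranges (1-10 scale) per session type.
SESSION_INTENSITY = {"practice": (4, 7), "gym": (3, 6), "match": (7, 10)}

def simulate_athlete(days=60, seed=0, overload_threshold=1.5):
    """Return one dict per day with session type, intensity, load, and injury flag."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows, loads = [], []
    for day in range(days):
        session = rng.choice(list(SESSION_INTENSITY))
        low, high = SESSION_INTENSITY[session]
        intensity = rng.randint(low, high)
        duration = rng.randint(45, 120)      # minutes (assumed range)
        load = intensity * duration          # assumed load formula
        loads.append(load)

        # Short-term vs long-term average load; load is always positive here.
        acute = sum(loads[-7:]) / len(loads[-7:])
        chronic = sum(loads[-28:]) / len(loads[-28:])
        ratio = acute / chronic

        # Assumed trigger: injury is more likely once the ratio passes the threshold.
        injury_prob = 0.15 if ratio > overload_threshold else 0.01
        rows.append({
            "day": day,
            "session": session,
            "intensity": intensity,
            "load": load,
            "acwr": ratio,
            "injured": rng.random() < injury_prob,
        })
    return rows

for row in simulate_athlete(days=5, seed=42):
    print(row)
```

Seeding the generator keeps the synthetic dataset reproducible, which makes model comparisons repeatable across runs.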
The system evaluates two distinct approaches:
- Logistic Regression: An interpretable linear baseline, similar in style to clinical risk models.
- Random Forest: A nonlinear ensemble model for complex pattern recognition.
Performance is measured via ROC-AUC scores and classification reports.
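
To make the ROC-AUC comparison concrete, here is a small dependency-free sketch of the metric. ROC-AUC equals the probability that a randomly chosen injured case receives a higher score than a randomly chosen non-injured case, with ties counted as half. The project presumably relies on a library routine such as scikit-learn's `roc_auc_score`; that, and the example scores below, are assumptions.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    # A positive outranking a negative counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted injury probabilities from two models on the same labels.
labels = [0, 0, 1, 1]
model_a_scores = [0.1, 0.4, 0.35, 0.8]
model_b_scores = [0.2, 0.3, 0.6, 0.9]

print(roc_auc(labels, model_a_scores))  # 0.75
print(roc_auc(labels, model_b_scores))  # 1.0
```

The pairwise form is quadratic in the number of samples, so it suits illustration rather than large datasets.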
The final output is a decision-support tool that translates data into coaching actions:
- Load Status: Categorizes athletes (Undertrained / Optimal / Overloaded).
- Risk Score: Predicted probability of injury, shown as a percentage.
- Recommendations: Clear instructions (e.g., "Recommend light recovery").
- Visuals: ACWR timelines with "Safe" and "Danger" zones.
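
The mapping from model output to coaching action might look like the sketch below. The ACWR cutoffs (0.8 and 1.3), the 50% risk cutoff, and every recommendation string other than "Recommend light recovery" are illustrative assumptions rather than the project's actual rules.

```python
def load_status(acwr, low=0.8, high=1.3):
    """Categorize an ACWR value using assumed cutoffs."""
    if acwr < low:
        return "Undertrained"
    if acwr <= high:
        return "Optimal"
    return "Overloaded"

def recommend(acwr, injury_prob, risk_cutoff=0.5):
    """Combine load status and predicted injury probability into one coaching record."""
    status = load_status(acwr)
    if status == "Overloaded" or injury_prob >= risk_cutoff:
        advice = "Recommend light recovery"
    elif status == "Undertrained":
        advice = "Gradually increase training load"   # hypothetical wording
    else:
        advice = "Maintain current plan"              # hypothetical wording
    return {
        "load_status": status,
        "risk_score": f"{injury_prob:.0%}",
        "recommendation": advice,
    }

print(recommend(acwr=1.6, injury_prob=0.62))
print(recommend(acwr=1.0, injury_prob=0.08))
```

Keeping the thresholds as parameters lets a coach or analyst tune them per sport without touching the decision logic.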

pip install -r requirements.txt
python3 main_ml_models.py
python3 main_coach_planner.py

- Visualizations: The system generates workload trend graphs for coaches.
- Actionable Data: Instead of just "High Risk," the system provides specific intervention advice based on the athlete's fatigue and workload history.
Dev Agrawal
Pre-Engineering
Earlham College
- Integration of real-world athlete datasets (GPS/Wearable data).
- Implementation of Gradient Boosting (XGBoost).
- Seasonal planning optimizer to peak for specific competition dates.