An interactive web application built with Streamlit to predict student dropout, analyze contributing factors, and provide actionable insights using machine learning and model explainability techniques.
This dashboard provides a complete, end-to-end workflow for student dropout analysis:
Get a quick summary of the dataset, including data quality checks, student demographics, academic performance, and key risk factors.
Interactively explore feature distributions, correlations, and their relationship with student outcomes (Dropout, Graduate, Enrolled).
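The relationship between a feature and student outcomes can be sketched with a few lines of pandas. This is a minimal illustration, not the dashboard's own code: the sample rows are made up, and the `Scholarship holder` column name is an assumption chosen for the example.

```python
import pandas as pd

# Tiny made-up sample; the feature column name is hypothetical.
df = pd.DataFrame({
    "Scholarship holder": [1, 0, 0, 1, 0, 0],
    "Target": ["Graduate", "Dropout", "Dropout", "Graduate", "Enrolled", "Graduate"],
})

# Share of each outcome within each feature value (each row sums to 1).
outcome_share = pd.crosstab(df["Scholarship holder"], df["Target"], normalize="index")

# Dropout rate per feature value.
dropout_rate = (df["Target"] == "Dropout").groupby(df["Scholarship holder"]).mean()

print(outcome_share)
print(dropout_rate)
```

The same pattern works for any categorical column; numeric columns would typically be binned first.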
Train a Random Forest Classifier with a single click to predict student outcomes.
Assess model performance using accuracy, classification reports, and an interactive confusion matrix.
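The training and evaluation steps might look roughly like the sketch below. It uses synthetic data in place of the student dataset, and the hyperparameters and split ratio are assumptions rather than the values the app uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
labels = np.array(["Dropout", "Enrolled", "Graduate"])

# Synthetic stand-in for the student data: 300 rows, 4 numeric features.
y_idx = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 4))
X[:, 0] += y_idx * 2.0  # make the first feature informative about the outcome
y = labels[y_idx]

# Hold out a stratified test set so every class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=labels)  # rows: true, columns: predicted

print(f"accuracy: {accuracy:.3f}")
print(report)
print(cm)
```

Passing `labels` to `confusion_matrix` fixes the row and column order, which makes the matrix easier to render with consistent axis labels.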
- Global Explanations: Understand the most important features driving predictions across the entire dataset using SHAP, Permutation Importance, and built-in feature importance.
- Local Explanations: Dive deep into why the model made a specific prediction for an individual student using SHAP Waterfall Plots and LIME.
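As a rough illustration of the global explanations, the sketch below compares a Random Forest's built-in feature importances with scikit-learn's permutation importance. It deliberately leaves out SHAP and LIME, runs on synthetic data, and uses hypothetical feature names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
feature_names = ["grade_avg", "age", "units_approved", "noise"]  # hypothetical names

# Synthetic data: the outcome depends on features 0 and 2 only.
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # 1 = "Graduate", 0 = "Dropout"

# Train on the first 300 rows, keep the last 100 for the importance estimate.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:300], y[:300])

# Built-in (impurity-based) importances; these sum to 1.
builtin = dict(zip(feature_names, model.feature_importances_))

# Permutation importance: mean drop in accuracy when one column is shuffled.
perm = permutation_importance(model, X[300:], y[300:], n_repeats=10, random_state=0)
permuted = dict(zip(feature_names, perm.importances_mean))

top_feature = max(permuted, key=permuted.get)
print("built-in:", builtin)
print("permutation:", permuted)
print("top feature:", top_feature)
```

Computing permutation importance on held-out rows measures what the model relies on for unseen data, whereas impurity-based importances are derived from the training data alone.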
- Input a student's data using interactive sliders and dropdowns
- Receive an instant prediction of the student's likely outcome (Dropout, Graduate, or Enrolled)
- Get a detailed explanation of the factors that influenced the prediction, along with actionable recommendations
- Explore how changing a single feature's value impacts the model's prediction probabilities
- Use the interactive feature explorer to view detailed statistics and dropout rates for any column
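The single-feature "what-if" exploration described above can be approximated as follows. This is a sketch under stated assumptions: the data is synthetic, the model predicts only two outcomes for brevity, and `sweep_feature` is a hypothetical helper, not a function from the app.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data: the outcome depends only on the first feature.
X = rng.normal(size=(300, 3))
y = np.where(X[:, 0] > 0, "Graduate", "Dropout")
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)


def sweep_feature(model, row, feature_idx, values):
    """Return class probabilities for copies of `row` with one feature replaced."""
    grid = np.tile(row, (len(values), 1))  # one copy of the student per candidate value
    grid[:, feature_idx] = values          # overwrite only the feature being explored
    return model.predict_proba(grid)


student = X[0].copy()
values = np.linspace(-2, 2, 5)
probs = sweep_feature(model, student, feature_idx=0, values=values)

# Columns follow model.classes_, which scikit-learn keeps in sorted order.
for value, row_probs in zip(values, probs):
    print(f"{value:+.1f}", dict(zip(model.classes_, row_probs.round(2))))
```

Plotting each column of `probs` against `values` gives the kind of probability curve an interactive slider would trace out.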
Upload your own student dataset in CSV format to use the dashboard's full capabilities.
| Data Overview & EDA | Model Explainability (SHAP) | Individual Prediction with Explanation |
|---|---|---|
| ![]() | ![]() | ![]() |
Replace the image links above with actual screenshots of your running application.
- Framework: Streamlit
- Data Manipulation: Pandas, NumPy
- Machine Learning: Scikit-learn
- Data Visualization: Matplotlib, Seaborn, Plotly
- Model Explainability: SHAP, LIME
Follow these instructions to set up and run the project locally.
- Python 3.9 or higher
- pip package manager

1. Clone the repository:

       git clone https://github.com/your-username/student-dropout-prediction.git
       cd student-dropout-prediction

2. Create and activate a virtual environment (recommended):

   On Windows:

       python -m venv .venv
       .\.venv\Scripts\activate

   On macOS/Linux:

       python3 -m venv .venv
       source .venv/bin/activate

3. Install the required dependencies:

       pip install -r requirements.txt

4. Run the Streamlit app:

       streamlit run app.py

   (Replace app.py with the actual name of your Python script if it is different.)

5. Open your web browser and navigate to http://localhost:8501. The application should now be running.

The dashboard is organized into four main modules accessible from the sidebar navigation:
Start here to get a high-level understanding of your dataset.
Dive deeper into the data. Use the interactive charts to uncover trends and relationships between different student attributes and their final outcomes.
- Click the "Start Training" button to build the prediction model
- Once trained, view the model's performance metrics and feature importance charts
- Explore the "Model Explainability" tab to understand how the model works on a global and local level
- Navigate to this section to use the interactive prediction tool
- Adjust the sliders and inputs to match a student's profile
- Click "Predict with Explanation" to see the predicted outcome and the key factors that led to that decision
The application comes pre-loaded with a sample dataset (student_dropout_data.csv) from the UCI Machine Learning Repository. This dataset contains various demographic, socio-economic, and academic features for students.
You can also upload your own CSV file using the file uploader in the sidebar. Ensure your dataset has a Target column with values like 'Dropout', 'Graduate', and 'Enrolled' for full functionality.
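Before uploading, a dataset can be checked for the expected target column with a short script such as the one below. This is only a sketch: `check_target_column` is a hypothetical helper, and treating the three outcome labels as the only allowed values is an assumption that is stricter than the "values like" wording above.

```python
import io

import pandas as pd

# Assumed set of outcome labels, taken from the values named in this README.
EXPECTED_OUTCOMES = {"Dropout", "Graduate", "Enrolled"}


def check_target_column(df, target_col="Target"):
    """Return a list of problems found; an empty list means the column looks usable."""
    if target_col not in df.columns:
        return [f"missing '{target_col}' column"]
    problems = []
    unexpected = set(df[target_col].dropna().unique()) - EXPECTED_OUTCOMES
    if unexpected:
        problems.append(f"unexpected values: {sorted(unexpected)}")
    if df[target_col].isna().any():
        problems.append("missing values in target column")
    return problems


# In-memory CSV text stands in for real files here.
good = pd.read_csv(io.StringIO("Age,Target\n19,Dropout\n20,Graduate\n"))
bad = pd.read_csv(io.StringIO("Age,Outcome\n19,Dropout\n"))

print(check_target_column(good))
print(check_target_column(bad))
```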
Contributions are welcome! If you have suggestions for improvements or find any issues, please feel free to:
- Fork the repository
- Create a new feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Create a requirements.txt file in your repository with the following content:
streamlit
pandas
numpy
matplotlib
seaborn
scikit-learn
shap
lime
plotly

- UCI Machine Learning Repository for the student dataset
- Streamlit community for the excellent framework
- SHAP and LIME libraries for model explainability


