An interactive data analytics and machine-learning application built with Streamlit for exploring a loan-default dataset and predicting borrower risk in real time.
- Overview
- Features
- Dataset
- Project Structure
- Installation
- Usage
- Screenshots
- Technologies Used
- Contributing
- License
This project provides an end-to-end workflow for Loan Default Prediction:
- Explore — Rich exploratory data analysis (EDA) with 10+ interactive chart types.
- Model — Train and compare three ML algorithms (Random Forest, Gradient Boosting, Logistic Regression).
- Predict — Enter applicant details and receive an instant default-risk prediction with a probability gauge.
The app is designed with a premium, modern UI featuring gradient themes, animated cards, and smooth interactions.
- Hero section with a gradient banner and animated feature cards.
- At-a-glance dataset snapshot (total records, features, default rate).
| Visualization | Description |
|---|---|
| Data Overview | Raw data preview, column types, missing-value audit, descriptive statistics |
| Histograms + Box Plots | Distribution of any numerical feature with marginals |
| Violin Plots | Density + box plot overlay for numerical features |
| Small Multiples | All numerical distributions side-by-side |
| Bar & Donut Charts | Counts and proportions for categorical features |
| Stacked Default Bars | Default rate breakdown per category |
| Correlation Heatmap | Pairwise Pearson correlation matrix |
| Scatter Plot Studio | Interactive X/Y/Color selectors |
| Box Plots vs Target | Feature distributions split by default status |
| Pair Scatter Matrix | Pairwise scatter plots for top features |
| Grouped Mean Bars | Average feature values by default status |
| Feature | Description |
|---|---|
| 3 Algorithms | Random Forest · Gradient Boosting · Logistic Regression |
| Configurable Split | Adjustable test-size slider (10 %–40 %) |
| Evaluation Metrics | Accuracy · Precision · Recall · F1 Score |
| Confusion Matrix | Annotated heatmap |
| ROC Curve | Area under the curve visualization |
| Classification Report | Full precision / recall / F1 per class |
| Feature Importance | Top-15 bar chart (tree-based models) |
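
The comparison described in this table could be sketched roughly as follows. This is not the code in model.py: it assumes scikit-learn is installed, uses synthetic data from `make_classification` in place of the preprocessed loan features, and the class imbalance, hyperparameters, and random seed are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed loan data (assumption: the real
# features would come from Loan_default.csv after encoding).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.88, 0.12], random_state=42
)

test_size = 0.2  # the app exposes this as a slider between 0.10 and 0.40
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the same split and collect the four headline metrics.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions, zero_division=0),
        "recall": recall_score(y_test, predictions, zero_division=0),
        "f1": f1_score(y_test, predictions, zero_division=0),
    }

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Stratifying the split keeps the share of defaults similar in the training and test sets, which matters when the positive class is rare.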
- Dynamic input form with sliders and dropdowns.
- Real-time default probability with a visual gauge indicator.
- Clear risk verdict (✅ Low Risk / ⚠️ High Risk).
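
A minimal sketch of how a predicted probability might be turned into the verdict shown to the user. The function name `risk_verdict` and the 0.5 cut-off are assumptions for illustration; this README does not state the threshold the app actually uses. With scikit-learn classifiers, the probability would typically come from `predict_proba`.

```python
def risk_verdict(default_probability: float, threshold: float = 0.5) -> str:
    """Map a predicted default probability to a verdict label.

    The 0.5 threshold is an assumption for illustration only.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability >= threshold:
        return "⚠️ High Risk"
    return "✅ Low Risk"


print(risk_verdict(0.12))
print(risk_verdict(0.73))
```

Lowering the threshold flags more applicants as high risk, trading precision for recall.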
The dataset (Loan_default.csv) contains 255,348 records and 18 columns:
| Column | Type | Description |
|---|---|---|
| LoanID | String | Unique loan identifier (dropped during preprocessing) |
| Age | Integer | Borrower's age |
| Income | Integer | Annual income |
| LoanAmount | Integer | Requested loan amount |
| CreditScore | Integer | Credit score (300–850) |
| MonthsEmployed | Integer | Months at current job |
| NumCreditLines | Integer | Number of open credit lines |
| InterestRate | Float | Loan interest rate (%) |
| LoanTerm | Integer | Loan term in months (12, 24, 36, 48, 60) |
| DTIRatio | Float | Debt-to-income ratio |
| Education | Category | High School · Bachelor's · Master's · PhD |
| EmploymentType | Category | Full-time · Part-time · Self-employed · Unemployed |
| MaritalStatus | Category | Single · Married · Divorced |
| HasMortgage | Category | Yes / No |
| HasDependents | Category | Yes / No |
| LoanPurpose | Category | Home · Auto · Education · Business · Other |
| HasCoSigner | Category | Yes / No |
| Default | Binary | Target: 0 = No Default, 1 = Default |
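
The schema above can be checked with a few lines of standard-library Python. The sketch below is not the code in utils.py: the helper `clean_row` is hypothetical, the single record is invented, and it assumes the categorical values are stored exactly as listed in the table.

```python
import csv
import io

# Allowed values for the categorical columns, taken from the schema table.
CATEGORIES = {
    "Education": {"High School", "Bachelor's", "Master's", "PhD"},
    "EmploymentType": {"Full-time", "Part-time", "Self-employed", "Unemployed"},
    "MaritalStatus": {"Single", "Married", "Divorced"},
    "HasMortgage": {"Yes", "No"},
    "HasDependents": {"Yes", "No"},
    "LoanPurpose": {"Home", "Auto", "Education", "Business", "Other"},
    "HasCoSigner": {"Yes", "No"},
}
INT_COLUMNS = ["Age", "Income", "LoanAmount", "CreditScore", "MonthsEmployed",
               "NumCreditLines", "LoanTerm", "Default"]
FLOAT_COLUMNS = ["InterestRate", "DTIRatio"]


def clean_row(raw: dict) -> dict:
    """Drop LoanID, cast numeric columns, and check categorical values."""
    row = {key: value for key, value in raw.items() if key != "LoanID"}
    for column in INT_COLUMNS:
        row[column] = int(row[column])
    for column in FLOAT_COLUMNS:
        row[column] = float(row[column])
    for column, allowed in CATEGORIES.items():
        if row[column] not in allowed:
            raise ValueError(f"unexpected {column} value: {row[column]!r}")
    return row


# One invented record; in the project this would be open("Loan_default.csv").
sample = io.StringIO(
    "LoanID,Age,Income,LoanAmount,CreditScore,MonthsEmployed,NumCreditLines,"
    "InterestRate,LoanTerm,DTIRatio,Education,EmploymentType,MaritalStatus,"
    "HasMortgage,HasDependents,LoanPurpose,HasCoSigner,Default\n"
    "X1,35,52000,18000,690,48,3,11.5,36,0.32,Bachelor's,Full-time,Married,"
    "Yes,No,Auto,No,0\n"
)
rows = [clean_row(raw) for raw in csv.DictReader(sample)]
print(len(rows), len(rows[0]))  # one record, 17 columns once LoanID is dropped
```

Failing loudly on an unexpected category makes schema drift visible before it reaches model training.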
Loan_default.csv/
├── app.py # Main Streamlit entry point & navigation
├── eda.py # EDA visualizations (10+ chart types)
├── model.py # ML training, evaluation & prediction
├── utils.py # Data loading & preprocessing utilities
├── Loan_default.csv # Source dataset (255k rows)
├── requirements.txt # Python dependencies
└── README.md # This file
- Python 3.10+ installed on your system.
- (Optional) A virtual environment tool (venv, conda, etc.).
# 1. Clone or download the project
cd "path/to/Loan_default.csv"
# 2. (Recommended) Create a virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS / Linux
# 3. Install dependencies
pip install -r requirements.txt

# Launch the Streamlit app
streamlit run app.py

The app will open automatically at http://localhost:8501.
Use the sidebar to navigate between pages:
| Page | What to do |
|---|---|
| 🏠 Home | Read the overview and check the dataset snapshot |
| 📊 Exploratory Analysis | Select features, switch tabs, explore visualizations |
| 🤖 ML Modeling | Pick an algorithm → Train → Evaluate → Predict |
Launch the app and explore each page to see the premium UI in action!
- Home — Gradient hero banner, feature cards, KPI metrics
- EDA — Interactive Plotly charts with tabs for Overview / Univariate / Bivariate
- ML Modeling — Model training, confusion matrix, ROC curve, feature importance, prediction gauge
| Technology | Purpose |
|---|---|
| Streamlit | Web application framework |
| Pandas | Data manipulation and analysis |
| NumPy | Numerical computing |
| Plotly | Interactive visualizations |
| Matplotlib | Static plotting |
| Seaborn | Statistical visualizations |
| scikit-learn | Machine learning pipeline |
Contributions are welcome! Feel free to:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
Built with ❤️ using Streamlit • 2026