Health Insurance Prediction – Machine Learning Project

Overview

This project focuses on predicting whether a customer has health insurance using demographic, financial, lifestyle, and geographic data.

The goal is to support data-driven marketing strategies, allowing companies to target high-potential customers, reduce costs, and improve customer acquisition.

Business Problem

Marketing campaigns for health insurance are often broad and inefficient.

This project answers the key question:

How can we predict which customers are most likely to subscribe to health insurance in order to optimize marketing efforts and improve inclusion?

Dataset

Observations: 72,458 customers
Features: 15 variables
Target: health_ins (1 = Has insurance, 0 = No insurance)

Feature Types:

Demographic → age, sex, marital status
Financial → income
Lifestyle → housing, vehicles, gas usage
Geographic → state of residence

Key Insights from Analysis

Income is the strongest predictor of insurance ownership
Older individuals are more likely to have insurance
More vehicles and better housing → higher probability of insurance
Gas usage showed low predictive power
Strong class imbalance (majority already insured)
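Rankings such as "income is the strongest predictor" typically come from a feature-importance analysis. Below is a minimal sketch of one way to obtain such a ranking: scikit-learn's permutation importance, measured as the drop in held-out ROC AUC when each feature is shuffled. The column names (`income`, `age`, `num_vehicles`, `gas_usage`) and the synthetic data are hypothetical stand-ins, not the project's actual data or method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer data (hypothetical columns and values)
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "income": rng.lognormal(mean=10.5, sigma=0.8, size=n),
    "age": rng.integers(18, 90, size=n),
    "num_vehicles": rng.integers(0, 5, size=n),
    "gas_usage": rng.normal(100, 30, size=n),
})
# Target driven by income and age only, so gas_usage carries no signal
logit = 1.5 * (np.log(X["income"]) - 10.5) + 0.03 * (X["age"] - 50) + 1.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Mean drop in held-out ROC AUC when each feature is randomly shuffled
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
importances = pd.Series(result.importances_mean, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Permutation importance is computed on held-out data, so a feature with no real signal (here `gas_usage`) should land near zero.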

Data Preparation

Missing values imputed (median for numeric features, mode for categorical features)
Outliers treated (capping + log transformation)
Feature scaling (standardization)
Categorical encoding (one-hot encoding)
Feature engineering:
- income_per_vehicle
- income_age_interaction
- financial_status
- mobility_indicator
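The preparation steps could be chained into a single scikit-learn pipeline so that imputation values, caps, and scaling statistics are learned on the training data only. This is a minimal sketch under several assumptions: the column names, the 1st/99th percentile caps, and the formulas for `income_per_vehicle` and `income_age_interaction` are hypothetical (the README names these features but does not define them), and `financial_status` and `mobility_indicator` are omitted because their definitions are not given.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


class QuantileCapper(BaseEstimator, TransformerMixin):
    """Clip each column to lower/upper quantiles learned during fit."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.low_ = np.nanquantile(X, self.lower, axis=0)
        self.high_ = np.nanquantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


def clip_log1p(x):
    # Assumption: negative values are floored at 0 before log1p
    return np.log1p(np.clip(x, 0, None))


def add_engineered_features(df):
    # Hypothetical formulas; +1 avoids division by zero for customers with no vehicle
    df = df.copy()
    df["income_per_vehicle"] = df["income"] / (df["num_vehicles"].fillna(0) + 1)
    df["income_age_interaction"] = df["income"] * df["age"]
    return df


skewed = ["income", "income_per_vehicle", "income_age_interaction"]
other_numeric = ["age", "num_vehicles"]
categorical = ["sex", "marital_status", "housing_type", "state_of_res"]

preprocess = ColumnTransformer([
    ("skewed", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("cap", QuantileCapper()),
        ("log", FunctionTransformer(clip_log1p)),
        ("scale", StandardScaler()),
    ]), skewed),
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("cap", QuantileCapper()),
        ("scale", StandardScaler()),
    ]), other_numeric),
    ("categorical", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # mode imputation
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

# Tiny hypothetical sample with missing values and an extreme income
demo = pd.DataFrame({
    "income": [25000, 48000, np.nan, 1_200_000, 61000, -500],
    "age": [23, 45, 37, 60, np.nan, 29],
    "num_vehicles": [0, 2, 1, 4, 1, np.nan],
    "sex": ["F", "M", "M", np.nan, "F", "F"],
    "marital_status": ["Never married", "Married", "Married",
                       "Divorced/Separated", np.nan, "Never married"],
    "housing_type": ["Rented", "Homeowner with mortgage/loan", "Rented",
                     "Homeowner free and clear", "Rented", np.nan],
    "state_of_res": ["Ohio", "Texas", "Ohio", "California", "Texas", "Ohio"],
})
X = preprocess.fit_transform(add_engineered_features(demo))
print(X.shape)
```

Using named functions (rather than lambdas) keeps the fitted pipeline picklable, which matters if it is later saved with joblib.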

Models Implemented

Logistic Regression
Random Forest
XGBoost
Gradient Boosting
Neural Network (MLP)
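The five model families could be set up side by side as below. The hyperparameters are illustrative assumptions, not the project's tuned values; XGBoost is treated as optional so the sketch still runs without it installed, and `make_classification` stands in for the preprocessed customer features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

try:
    from xgboost import XGBClassifier
except ImportError:  # XGBoost is optional in this sketch
    XGBClassifier = None

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Neural Network (MLP)": MLPClassifier(
        hidden_layer_sizes=(32,), max_iter=500, random_state=42
    ),
}
if XGBClassifier is not None:
    models["XGBoost"] = XGBClassifier(
        n_estimators=300, learning_rate=0.1, eval_metric="logloss", random_state=42
    )

# Synthetic, imbalanced stand-in (about 90% positive, i.e. "insured")
X, y = make_classification(
    n_samples=2000, n_features=15, weights=[0.1, 0.9], random_state=42
)

for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: trained")
```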

Model Performance

Model	Accuracy	F1-Score	AUC
Logistic Regression	0.68	0.79	0.76
Random Forest	0.84	0.91	0.73
XGBoost	0.69	0.80	0.79
Gradient Boosting	0.90	0.95	0.80
Neural Network	0.23	0.27	0.47
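The three metrics in the table can be computed with scikit-learn as sketched below. Because most customers are already insured, a classifier that always predicts "insured" can score high accuracy and F1 (computed on the positive class) while its AUC stays at 0.5, which is why AUC is worth reading alongside the other two columns. The sketch scores such a majority-class baseline next to a Gradient Boosting model on synthetic imbalanced data (an assumption, not the project's data).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: class 1 ("insured") is the majority
X, y = make_classification(
    n_samples=4000, n_features=15, weights=[0.1, 0.9], flip_y=0.05, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def evaluate(model):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # estimated P(has insurance)
    return {
        "Accuracy": accuracy_score(y_test, pred),
        "F1-Score": f1_score(y_test, pred),  # F1 of the positive class (1 = insured)
        "AUC": roc_auc_score(y_test, proba),  # threshold-free ranking quality
    }


results = pd.DataFrame({
    "Majority-class baseline": evaluate(DummyClassifier(strategy="most_frequent")),
    "Gradient Boosting": evaluate(GradientBoostingClassifier(random_state=0)),
}).T.round(2)
print(results)
```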

Best Model

Gradient Boosting achieved the best overall performance:

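Highest accuracy (0.90) of the five models
Highest F1-score (0.95), reflecting a strong balance between precision and recall
Highest AUC (0.80), i.e. the best ranking of insured versus uninsured customers
Robust handling of class imbalance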

Business Impact

This model enables:

Targeted marketing campaigns
Reduction in marketing costs
Higher conversion rates
Better understanding of customer behavior
Improved inclusion of underserved groups

Model Deployment

Final model saved using joblib
Predictions generated on unseen test data
Output file: final_predictions.csv
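A minimal sketch of the save, load, and predict cycle with joblib. The model filename `final_model.joblib`, the `customer_id` column, and the synthetic features are hypothetical; in practice the saved object would ideally be the full preprocessing-plus-model pipeline, so that the unseen test data goes through exactly the same transformations.

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in for the fitted final model
X, y = make_classification(n_samples=500, n_features=15, weights=[0.1, 0.9], random_state=1)
model = GradientBoostingClassifier(random_state=1).fit(X, y)

# Persist the fitted model and load it back (hypothetical filename)
joblib.dump(model, "final_model.joblib")
loaded = joblib.load("final_model.joblib")

# Stand-in for the unseen, already-preprocessed test features
X_unseen, _ = make_classification(n_samples=100, n_features=15, random_state=2)
predictions = pd.DataFrame({
    "customer_id": np.arange(len(X_unseen)),  # hypothetical ID column
    "health_ins": loaded.predict(X_unseen),   # 1 = has insurance, 0 = no insurance
})
predictions.to_csv("final_predictions.csv", index=False)
```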

Key Learnings

Feature engineering has a major impact on performance
Handling class imbalance is critical
Simpler models can be strong baselines
Gradient Boosting, an ensemble method, performed best on this structured dataset
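One common way to handle the imbalance with Gradient Boosting, which has no `class_weight` parameter in scikit-learn, is to pass balanced sample weights to `fit` and to use a stratified train/test split. This is a sketch of that technique on synthetic data, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic imbalanced data: class 1 ("insured") is about 90% of samples
X, y = make_classification(
    n_samples=4000, n_features=15, weights=[0.1, 0.9], flip_y=0.02, random_state=7
)
# Stratify so train and test keep the same insured/uninsured ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)
print("Share insured in training data:", round(float(y_train.mean()), 3))

# "balanced": each sample weighted by n_samples / (n_classes * count of its class)
weights = compute_sample_weight(class_weight="balanced", y=y_train)

plain = GradientBoostingClassifier(random_state=7).fit(X_train, y_train)
weighted = GradientBoostingClassifier(random_state=7).fit(
    X_train, y_train, sample_weight=weights
)

# Recall on the minority class (0 = no insurance)
recalls = {
    name: recall_score(y_test, m.predict(X_test), pos_label=0)
    for name, m in [("unweighted", plain), ("balanced weights", weighted)]
}
for name, r in recalls.items():
    print(f"{name}: minority-class recall = {r:.3f}")
```

Balanced weights usually raise recall on the uninsured class at some cost in precision; whether that trade-off is worthwhile depends on the cost of each kind of marketing error.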

Tech Stack

Python
Pandas, NumPy
Scikit-learn
XGBoost
Matplotlib, Seaborn

Conclusion

This project demonstrates how machine learning can be applied to solve a real-world business problem by combining data analysis, feature engineering, and predictive modeling.

The final model provides both accurate predictions and actionable insights, making it valuable for strategic decision-making.

Repository Files

README.md
customer.csv
customer_datadictionary.txt
customer_test_masked.csv
health_ins.ipynb
sample_submission.csv