End-to-end machine learning pipeline for predicting customer churn, estimating lifetime value, and generating actionable retention strategies using three classification models with SHAP-based interpretability.
This project builds a complete Customer Lifetime Value (CLV) prediction system that enables businesses to:
- Identify at-risk customers before they churn
- Segment customers by predicted lifetime value (Low to Premium tiers)
- Quantify churn drivers using SHAP feature importance
- Guide retention spend by targeting high-CLV, high-churn-risk customers
The analysis processes customer behavioural data including purchase history, engagement metrics, satisfaction scores, and support interactions to build predictive models that directly inform marketing and retention strategy.
| Feature Category | Variables | Purpose |
|---|---|---|
| Purchase Behaviour | total_purchases, avg_purchase_value, estimated_clv | Revenue contribution profile |
| Engagement Metrics | website_visits, avg_time_per_visit, engagement_score | Digital interaction intensity |
| Recency Signals | days_since_last_purchase | Churn proximity indicator |
| Service Interactions | support_tickets_last_6_months, support_burden | Friction measurement |
| Satisfaction | satisfaction_score | Direct sentiment gauge |
| Membership | membership_status (Bronze to Platinum) | Loyalty tier encoding |
| Referral Activity | referred_friends | Advocacy measurement |
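
The derived metrics listed just below can be computed in pandas roughly as follows. This is a minimal sketch, not the notebook's code: it assumes that `support_tickets` in the formulas refers to the `support_tickets_last_6_months` column from the table, and it turns zero denominators into `NaN`, a handling the source does not specify. `engagement_score` is left out because the function `f` is not defined here. The helper name `add_derived_features` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the derived CLV and friction columns added."""
    out = df.copy()
    out["estimated_clv"] = out["total_purchases"] * out["avg_purchase_value"]

    # Assumption: zero denominators become NaN rather than inf; the source
    # does not say how zero visits or zero purchases are handled.
    visits = out["website_visits"].replace(0, np.nan)
    purchases = out["total_purchases"].replace(0, np.nan)

    out["revenue_per_visit"] = out["estimated_clv"] / visits
    out["support_burden"] = out["support_tickets_last_6_months"] / purchases
    return out


# Toy rows used only to exercise the helper (not project data).
customers = pd.DataFrame({
    "total_purchases": [10, 0, 4],
    "avg_purchase_value": [50.0, 0.0, 25.0],
    "website_visits": [20, 5, 0],
    "support_tickets_last_6_months": [2, 1, 0],
})
features = add_derived_features(customers)
print(features[["estimated_clv", "revenue_per_visit", "support_burden"]])
```

Leaving the gaps as `NaN` keeps the choice of imputation explicit instead of silently producing infinite ratios.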

    # Core derived metrics
    estimated_clv = total_purchases × avg_purchase_value
    revenue_per_visit = estimated_clv / website_visits
    engagement_score = f(visits, time_per_visit, referred_friends)
    support_burden = support_tickets / total_purchases

Three classification models trained with stratified cross-validation and class-weight balancing:
| Model | Configuration | Strengths |
|---|---|---|
| Logistic Regression | max_iter=1000, class_weight=balanced | Interpretable coefficients, baseline |
| Random Forest | n_estimators=300, max_depth=10 | Robust to outliers, feature importance |
| Gradient Boosting | n_estimators=300, max_depth=4, learning_rate=0.05 | Best predictive performance |
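
The three configurations in the table map onto scikit-learn estimators roughly as shown here. This is a sketch under assumptions: the `build_models` helper and the `RANDOM_STATE` value are illustrative and do not come from the project, and only the hyperparameters listed in the table are set.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

RANDOM_STATE = 42  # assumption: the README does not state a seed


def build_models() -> dict:
    """Instantiate the three classifiers with the hyperparameters from the table."""
    return {
        "Logistic Regression": LogisticRegression(
            max_iter=1000, class_weight="balanced"
        ),
        "Random Forest": RandomForestClassifier(
            n_estimators=300, max_depth=10, random_state=RANDOM_STATE
        ),
        "Gradient Boosting": GradientBoostingClassifier(
            n_estimators=300, max_depth=4, learning_rate=0.05,
            random_state=RANDOM_STATE,
        ),
    }


models = build_models()
for name, model in models.items():
    print(name, type(model).__name__)
```

Note that scikit-learn's `GradientBoostingClassifier` has no `class_weight` parameter. If class balancing is wanted for that model, one option is to pass per-sample weights to `fit`, for example from `sklearn.utils.class_weight.compute_sample_weight("balanced", y)`; whether the notebook does this is not stated.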
- AUC-ROC: Primary metric for ranking ability
- Cross-Validation AUC: Generalisation assessment (5-fold stratified)
- Classification Report: Precision, recall, F1 per class
- Confusion Matrix: Error pattern analysis
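
The four evaluation outputs above can be produced with scikit-learn along these lines. This is an illustrative sketch: the `evaluate` helper, the 80/20 hold-out split, the seed, and the synthetic data from `make_classification` are assumptions standing in for the project's actual data and split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split


def evaluate(model, X, y, seed=42):
    """Fit on a stratified train split and report the four evaluation outputs."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # 5-fold stratified cross-validated AUC on the training portion.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    pred = model.predict(X_test)

    return {
        "test_auc": roc_auc_score(y_test, proba),
        "cv_auc_scores": cv_auc,
        "report": classification_report(y_test, pred),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }


# Synthetic, imbalanced data used only to make the sketch runnable.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)
results = evaluate(LogisticRegression(max_iter=1000, class_weight="balanced"), X, y)
print(results["report"])
print(results["confusion_matrix"])
```

Running cross-validation only on the training portion keeps the hold-out set untouched for the final AUC, report, and confusion matrix.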
Comprehensive analysis of active vs. churned customer populations, examining demographic and behavioural differences across segments.
Customers segmented into four CLV tiers using quartile-based bucketing:
| Tier | Description | Strategic Priority |
|---|---|---|
| Premium | Top 25% by CLV | Retention-critical: loyalty programmes |
| High | 50th–75th percentile | Growth opportunity: upsell candidates |
| Medium | 25th–50th percentile | Engagement focus: activation campaigns |
| Low | Bottom 25% | Cost management: targeted win-backs |
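
Quartile-based bucketing of this kind can be sketched with `pandas.qcut`. The helper name `assign_clv_tier` and the sample values are hypothetical; the tier labels come from the table above.

```python
import pandas as pd

TIER_LABELS = ["Low", "Medium", "High", "Premium"]  # ordered lowest to highest CLV


def assign_clv_tier(clv: pd.Series) -> pd.Series:
    """Split CLV values into four equal-frequency tiers."""
    return pd.qcut(clv, q=4, labels=TIER_LABELS)


# Toy CLV values used only to exercise the helper (not project data).
clv = pd.Series([100, 200, 300, 400, 500, 600, 700, 800], name="estimated_clv")
tiers = assign_clv_tier(clv)
print(pd.concat([clv, tiers.rename("clv_segment")], axis=1))
```

If many customers share the same CLV, the quartile edges can coincide and `qcut` raises an error; its `duplicates` parameter or a prior ranking step are two common ways around that.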
SHAP-based feature attribution revealing the top 20 predictors of churn, ranked by absolute contribution to model output. Critical for translating ML output into actionable business decisions.
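
Ranking features by mean absolute SHAP value can be sketched as follows. The helper names are hypothetical, and the use of `shap.TreeExplainer` assumes a tree-based model such as the gradient boosting classifier; the README does not say which explainer the notebook uses. The `demo_values` array contains made-up attributions purely to exercise the ranking step.

```python
import numpy as np
import pandas as pd


def compute_shap_values(model, X):
    """Compute SHAP values for a fitted tree-based model (assumed setup)."""
    import shap  # imported lazily so the ranking helper works without SHAP installed

    explainer = shap.TreeExplainer(model)
    return explainer.shap_values(X)


def rank_features(shap_values, feature_names, top_n=20) -> pd.Series:
    """Rank features by mean absolute SHAP value across all rows."""
    importance = np.abs(np.asarray(shap_values)).mean(axis=0)
    ranked = pd.Series(importance, index=list(feature_names))
    return ranked.sort_values(ascending=False).head(top_n)


# Made-up attribution values (2 customers x 3 features), not model output.
demo_values = np.array([[0.1, -0.5, 0.0],
                        [-0.3, 0.7, 0.1]])
top = rank_features(
    demo_values,
    ["satisfaction_score", "days_since_last_purchase", "referred_friends"],
)
print(top)
```

`rank_features` expects a two-dimensional array of shape (rows, features). Some model types return one array per class from `shap_values`, in which case the array for the churn class would need to be selected first.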
Interactive scatter visualisation plotting Estimated CLV vs. Churn Probability, colour-coded by segment and membership tier. This is the strategic decision matrix for retention spend allocation.
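
A rough sketch of how such a matrix could be built and used to flag high-CLV, high-churn-risk customers. Everything named here is an assumption for illustration: the `churn_probability` and `clv_segment` columns, the median CLV cut-off, the 0.5 probability threshold, and both helper functions.

```python
import pandas as pd


def flag_retention_targets(df: pd.DataFrame, prob_threshold: float = 0.5) -> pd.DataFrame:
    """Mark customers who are both above-median CLV and above the risk threshold."""
    out = df.copy()
    high_value = out["estimated_clv"] >= out["estimated_clv"].median()
    high_risk = out["churn_probability"] >= prob_threshold
    out["retention_target"] = high_value & high_risk
    return out


def plot_decision_matrix(df: pd.DataFrame):
    """Scatter of CLV vs. churn probability, coloured by segment, shaped by tier."""
    import plotly.express as px  # imported lazily; only needed for plotting

    return px.scatter(
        df,
        x="estimated_clv",
        y="churn_probability",
        color="clv_segment",
        symbol="membership_status",
        labels={
            "estimated_clv": "Estimated CLV",
            "churn_probability": "Churn Probability",
        },
    )


# Toy scored customers used only to exercise the helper (not project data).
scored = pd.DataFrame({
    "estimated_clv": [900.0, 850.0, 120.0, 80.0],
    "churn_probability": [0.8, 0.1, 0.7, 0.2],
    "clv_segment": ["Premium", "Premium", "Low", "Low"],
    "membership_status": ["Platinum", "Platinum", "Bronze", "Bronze"],
})
flagged = flag_retention_targets(scored)
print(flagged[["estimated_clv", "churn_probability", "retention_target"]])
# plot_decision_matrix(flagged).show()  # opens the interactive figure
```

The top-right quadrant of the plot corresponds to the rows where `retention_target` is true, which is where retention spend would be concentrated under these assumed thresholds.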

    clv-analysis/
    ├── README.md                  # This file
    ├── LICENSE                    # MIT License
    ├── requirements.txt           # Python dependencies
    ├── .gitignore                 # Git ignore rules
    │
    ├── notebooks/
    │   └── clv_analysis.ipynb     # Complete analysis notebook
    │
    └── docs/
        └── methodology.md         # Detailed methodology notes

| Category | Tools |
|---|---|
| Language | Python 3.12 |
| Data Processing | Pandas, NumPy |
| Machine Learning | Scikit-learn (LogReg, RF, GBM) |
| Interpretability | SHAP |
| Visualisation | Plotly, Seaborn, Matplotlib |
| Environment | Jupyter Notebook |

    # Clone the repository
    git clone https://github.com/svanscreates/clv-analysis.git
    cd clv-analysis

    # Install dependencies
    pip install -r requirements.txt

    # Launch the analysis
    jupyter notebook notebooks/clv_analysis.ipynb

This framework is directly applicable to:
- Subscription businesses: Predict churn timing, optimise retention offers
- E-commerce platforms: Segment customers for targeted marketing campaigns
- Financial services: Identify high-value client attrition risk
- Real estate: Client relationship management and lead scoring
- SaaS: Revenue forecasting and expansion revenue targeting

    @misc{rastogi2025clv,
      title={Customer Lifetime Value Prediction and Churn Analysis:
             An ML-Powered Segmentation Framework},
      author={Rastogi, Svanik},
      year={2025},
      howpublished={GitHub repository},
      url={https://github.com/svanscreates/clv-analysis}
    }

This project is licensed under the MIT License; see the LICENSE file for details.