
svanscreates/clv-analysis


📊 Customer Lifetime Value Prediction & Churn Analysis

ML-Powered Customer Segmentation and Retention Strategy


End-to-end machine learning pipeline for predicting customer churn, estimating lifetime value, and generating actionable retention strategies using three classification models with SHAP-based interpretability.


🎯 Project Overview

This project builds a complete Customer Lifetime Value (CLV) prediction system that enables businesses to:

  • Identify at-risk customers before they churn
  • Segment customers by predicted lifetime value (Low → Premium tiers)
  • Quantify churn drivers using SHAP feature importance
  • Guide retention spend by targeting high-CLV, high-churn-risk customers

The analysis uses customer behavioural data (purchase history, engagement metrics, satisfaction scores, and support interactions) to build predictive models that directly inform marketing and retention strategy.


📋 Methodology

Feature Engineering

| Feature Category | Variables | Purpose |
| --- | --- | --- |
| Purchase Behaviour | `total_purchases`, `avg_purchase_value`, `estimated_clv` | Revenue contribution profile |
| Engagement Metrics | `website_visits`, `avg_time_per_visit`, `engagement_score` | Digital interaction intensity |
| Recency Signals | `days_since_last_purchase` | Churn proximity indicator |
| Service Interactions | `support_tickets_last_6_months`, `support_burden` | Friction measurement |
| Satisfaction | `satisfaction_score` | Direct sentiment gauge |
| Membership | `membership_status` (Bronze → Platinum) | Loyalty tier encoding |
| Referral Activity | `referred_friends` | Advocacy measurement |

Engineered Features

# Core derived metrics
estimated_clv = total_purchases × avg_purchase_value
revenue_per_visit = estimated_clv / website_visits
engagement_score = f(visits, time_per_visit, referred_friends)
support_burden = support_tickets / total_purchases
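As a rough illustration, the derived metrics above could be computed with pandas as sketched below. The function name `add_engineered_features` and the sample rows are hypothetical, and the zero-denominator handling (turning the ratio into a missing value) is an assumption rather than the notebook's confirmed behaviour. `engagement_score` is left out because its formula is not specified here.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the derived metrics added (hypothetical helper)."""
    out = df.copy()
    out["estimated_clv"] = out["total_purchases"] * out["avg_purchase_value"]

    # Assumption: a zero denominator yields a missing ratio instead of inf.
    visits = out["website_visits"].where(out["website_visits"] > 0)
    purchases = out["total_purchases"].where(out["total_purchases"] > 0)

    out["revenue_per_visit"] = out["estimated_clv"] / visits
    out["support_burden"] = out["support_tickets_last_6_months"] / purchases
    return out


# Made-up rows, including customers with zero purchases or zero visits.
customers = pd.DataFrame({
    "total_purchases": [10, 0, 4],
    "avg_purchase_value": [50.0, 0.0, 25.0],
    "website_visits": [20, 5, 0],
    "support_tickets_last_6_months": [2, 1, 0],
})
features = add_engineered_features(customers)
print(features)
```

Whether missing ratios should then be imputed, dropped, or flagged depends on the downstream model and is left open in this sketch.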

🤖 Models

Three classification models are trained with stratified cross-validation and class-weight balancing:

| Model | Configuration | Strengths |
| --- | --- | --- |
| Logistic Regression | `max_iter=1000`, `class_weight=balanced` | Interpretable coefficients, baseline |
| Random Forest | `n_estimators=300`, `max_depth=10` | Robust to outliers, feature importance |
| Gradient Boosting | `n_estimators=300`, `max_depth=4`, `lr=0.05` | Best predictive performance |
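The configurations in the table map onto scikit-learn estimators roughly as follows. This is a minimal sketch on synthetic data from `make_classification`, not the notebook's actual code: the data, the `random_state` values, and the 80/20 class split are assumptions, and `lr` is read as scikit-learn's `learning_rate`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the customer features (assumption: ~20% churners).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=42
)

# Hyperparameters copied from the table; class_weight only where it is listed.
models = {
    "Logistic Regression": LogisticRegression(
        max_iter=1000, class_weight="balanced"
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=300, max_depth=10, random_state=42
    ),
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=300, max_depth=4, learning_rate=0.05, random_state=42
    ),
}

# 5-fold stratified CV keeps the churn rate similar across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}
for name, score in cv_auc.items():
    print(f"{name}: mean CV AUC = {score:.3f}")
```

Note that `GradientBoostingClassifier` has no `class_weight` parameter, so any balancing for that model would have to come from elsewhere (for example, sample weights passed at fit time).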

Evaluation Metrics

  • AUC-ROC: Primary metric for ranking ability
  • Cross-Validation AUC: Generalisation assessment (5-fold stratified)
  • Classification Report: Precision, recall, F1 per class
  • Confusion Matrix: Error pattern analysis
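The hold-out metrics in this list can be produced with `sklearn.metrics`, as in the sketch below. It uses synthetic data and a single logistic regression purely for illustration; the 75/25 split, the label names `active` and `churned`, and the variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for the customer features.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)

churn_proba = model.predict_proba(X_test)[:, 1]  # P(churn) per customer
churn_pred = model.predict(X_test)               # hard labels at the 0.5 cut-off

auc = roc_auc_score(y_test, churn_proba)         # ranking quality
cm = confusion_matrix(y_test, churn_pred)        # rows: true, columns: predicted
report = classification_report(
    y_test, churn_pred, target_names=["active", "churned"]
)

print(f"AUC-ROC: {auc:.3f}")
print(cm)
print(report)
```

AUC-ROC is computed from the predicted probabilities, while the classification report and confusion matrix depend on the chosen decision threshold.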

📈 Key Outputs

1. Churn Distribution Analysis

Comprehensive analysis of active vs. churned customer populations, examining demographic and behavioural differences across segments.

2. CLV Segmentation

Customers segmented into four CLV tiers using quartile-based bucketing:

| Tier | Description | Strategic Priority |
| --- | --- | --- |
| Premium | Top 25% by CLV | Retention-critical: loyalty programmes |
| High | 50th–75th percentile | Growth opportunity: upsell candidates |
| Medium | 25th–50th percentile | Engagement focus: activation campaigns |
| Low | Bottom 25% | Cost management: targeted win-backs |
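Quartile-based bucketing of this kind can be expressed with `pandas.qcut`. The sketch below uses made-up CLV values and is one plausible way to produce the four tiers, not necessarily how the notebook does it.

```python
import pandas as pd

# Made-up CLV values for eight customers.
clv = pd.Series(
    [120.0, 80.0, 950.0, 400.0, 60.0, 700.0, 300.0, 1500.0],
    name="estimated_clv",
)

# q=4 splits at the 25th, 50th and 75th percentiles; labels run low to high.
tiers = pd.qcut(clv, q=4, labels=["Low", "Medium", "High", "Premium"])

print(pd.DataFrame({"estimated_clv": clv, "clv_tier": tiers}))
print(tiers.value_counts().sort_index())
```

With heavily tied CLV values, `qcut` can fail because bin edges coincide; ranking the values first (for example with `clv.rank(method="first")`) is a common workaround.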

3. Feature Importance (SHAP)

SHAP-based feature attribution reveals the top 20 predictors of churn, ranked by absolute contribution to model output. This ranking is critical for translating ML output into actionable business decisions.

4. Customer Risk Map

Interactive scatter visualisation plotting Estimated CLV against Churn Probability, colour-coded by segment and membership tier. It serves as the strategic decision matrix for allocating retention spend.
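The decision matrix behind the risk map amounts to splitting customers into quadrants on the two axes. The sketch below shows that logic without the plotting; the thresholds (`CLV_THRESHOLD`, `CHURN_THRESHOLD`), column names, and sample rows are illustrative assumptions, not values taken from the analysis.

```python
import pandas as pd

# Made-up scored customers: CLV estimate plus a model's churn probability.
scored = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "estimated_clv": [1200.0, 150.0, 900.0, 80.0],
    "churn_probability": [0.82, 0.75, 0.10, 0.05],
})

CLV_THRESHOLD = 500.0    # assumed cut-off for "high value"
CHURN_THRESHOLD = 0.5    # assumed cut-off for "high risk"

high_value = scored["estimated_clv"] >= CLV_THRESHOLD
high_risk = scored["churn_probability"] >= CHURN_THRESHOLD

# Assign each customer to one quadrant of the CLV vs. churn matrix.
scored["quadrant"] = "low value, low risk"
scored.loc[high_value & ~high_risk, "quadrant"] = "high value, low risk"
scored.loc[~high_value & high_risk, "quadrant"] = "low value, high risk"
scored.loc[high_value & high_risk, "quadrant"] = "high value, high risk"

# Retention spend goes first to high-CLV customers who are likely to churn.
priority = scored[scored["quadrant"] == "high value, high risk"].sort_values(
    "estimated_clv", ascending=False
)
print(priority)
```

The same frame could be passed to a scatter plot (for instance Plotly Express `px.scatter` with `x="estimated_clv"`, `y="churn_probability"`, `color="quadrant"`) to approximate the map described above.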


πŸ“ Repository Structure

clv-analysis/
├── README.md                       # This file
├── LICENSE                         # MIT License
├── requirements.txt                # Python dependencies
├── .gitignore                      # Git ignore rules
│
├── notebooks/
│   └── clv_analysis.ipynb          # Complete analysis notebook
│
└── docs/
    └── methodology.md              # Detailed methodology notes

πŸ› οΈ Tech Stack

| Category | Tools |
| --- | --- |
| Language | Python 3.12 |
| Data Processing | Pandas, NumPy |
| Machine Learning | Scikit-learn (LogReg, RF, GBM) |
| Interpretability | SHAP |
| Visualisation | Plotly, Seaborn, Matplotlib |
| Environment | Jupyter Notebook |

🚀 Getting Started

# Clone the repository
git clone https://github.com/svanscreates/clv-analysis.git
cd clv-analysis

# Install dependencies
pip install -r requirements.txt

# Launch the analysis
jupyter notebook notebooks/clv_analysis.ipynb

💡 Business Applications

This framework is directly applicable to:

  • Subscription businesses: Predict churn timing, optimise retention offers
  • E-commerce platforms: Segment customers for targeted marketing campaigns
  • Financial services: Identify high-value client attrition risk
  • Real estate: Client relationship management and lead scoring
  • SaaS: Revenue forecasting and expansion revenue targeting

πŸ“ Citation

@misc{rastogi2025clv,
  title={Customer Lifetime Value Prediction and Churn Analysis: 
         An ML-Powered Segmentation Framework},
  author={Rastogi, Svanik},
  year={2025},
  howpublished={GitHub repository},
  url={https://github.com/svanscreates/clv-analysis}
}

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


Svanik Rastogi · Christ University, Bangalore · BBA Strategy & Business Analytics
