
sabin74/Mall-Customer-Segmentation


Customer Segmentation Project

Overview

This project performs customer segmentation using the Mall Customer dataset. The goal is to cluster customers based on their Annual Income and Spending Score to identify meaningful segments and gain business insights.

Clustering is performed using K-Means, with optional experiments using DBSCAN and Agglomerative Clustering. The project also includes data preprocessing, visualization, and cluster profiling.


Dataset

  • Source: Mall Customer Segmentation Dataset (Kaggle)
  • Features:
    • CustomerID — Unique customer identifier
    • Gender — Male or Female
    • Age — Age of the customer
    • Annual Income (k$) — Annual income in thousands of US dollars
    • Spending Score (1-100) — Score from 1 to 100 assigned by the mall based on customer spending behavior

Project Steps

1. Data Collection

  • Loaded the dataset using pandas.
  • Checked for missing values, data types, and basic statistics.
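A minimal sketch of the loading and inspection step. It assumes the Kaggle file is saved as "Mall_Customers.csv" (a hypothetical name here). The helper `load_and_inspect` is illustrative, not taken from the project. The inline sample only mimics the column layout so the snippet runs without the real file.

```python
import io

import pandas as pd


def load_and_inspect(source):
    """Load the Mall Customer CSV and print basic diagnostics.

    `source` can be a path (e.g. the assumed "Mall_Customers.csv")
    or any file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(source)
    print(df.dtypes)          # data type of each column
    print(df.isnull().sum())  # number of missing values per column
    print(df.describe())      # count, mean, std, min, quartiles, max
    return df


# Tiny illustrative sample with the same columns (not the real dataset)
sample = io.StringIO(
    "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,19,15,39\n"
    "2,Male,21,15,81\n"
    "3,Female,20,16,6\n"
)
df = load_and_inspect(sample)
```

With the real file, call `load_and_inspect("Mall_Customers.csv")` instead.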

2. Data Preprocessing

  • Handled missing values (none found in this dataset).
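  • Selected relevant features: Annual Income (k$) and Spending Score (1-100).
  • Scaled the features with StandardScaler (zero mean, unit variance) or MinMaxScaler (rescaled to the range [0, 1]) so that income and spending score contribute comparably to the distance calculations used by clustering.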
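The feature selection and scaling step might look like the sketch below. The function name `scale_features` and its `method` switch are hypothetical. The point is that both scalers are fitted on the two clustering columns, and the fitted scaler is kept so new data can be transformed the same way later.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def scale_features(df, method="standard"):
    """Select the two clustering features and scale them.

    method="standard" -> StandardScaler (zero mean, unit variance per column)
    method="minmax"   -> MinMaxScaler (each column mapped to [0, 1])
    Returns the scaled array and the fitted scaler (reused later for new data).
    """
    X = df[FEATURES].to_numpy(dtype=float)
    scaler = StandardScaler() if method == "standard" else MinMaxScaler()
    X_scaled = scaler.fit_transform(X)
    return X_scaled, scaler


# Illustrative values only
df = pd.DataFrame({
    "Annual Income (k$)": [15, 40, 70, 120],
    "Spending Score (1-100)": [39, 60, 20, 90],
})
X_std, std_scaler = scale_features(df, "standard")
X_mm, mm_scaler = scale_features(df, "minmax")
print(np.round(X_std, 2))
print(np.round(X_mm, 2))
```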

3. Exploratory Data Analysis (EDA)

  • Univariate Analysis: Histograms and boxplots for income and spending score.
  • Bivariate Analysis: Scatter plot of income vs. spending score.
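  • Observed visually distinct groups in the income vs. spending score scatter plot, along with the shape of each feature's distribution, which motivated clustering.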
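A rough sketch of the EDA plots using matplotlib only. The project also lists seaborn, but this version avoids it. The function `plot_eda` and the random demo data are assumptions for illustration. The Agg backend lets it run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def plot_eda(df):
    """Histograms and boxplots per feature, plus an income vs. score scatter."""
    fig, axes = plt.subplots(1, 5, figsize=(22, 4))
    for i, col in enumerate([INCOME, SCORE]):
        values = df[col].to_numpy()
        axes[i].hist(values, bins=10)         # univariate: distribution shape
        axes[i].set_title(f"Histogram: {col}")
        axes[i + 2].boxplot(values)           # univariate: spread and outliers
        axes[i + 2].set_title(f"Boxplot: {col}")
    axes[4].scatter(df[INCOME], df[SCORE], s=15)  # bivariate view
    axes[4].set_xlabel(INCOME)
    axes[4].set_ylabel(SCORE)
    axes[4].set_title("Income vs. spending score")
    fig.tight_layout()
    return fig


# Random demo data in a plausible range (not the real dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    INCOME: rng.integers(15, 140, 50),
    SCORE: rng.integers(1, 101, 50),
})
fig = plot_eda(demo)
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk.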

4. K-Means Clustering

  • Determined the optimal number of clusters (k) using:
    • Elbow Method
    • Silhouette Score
  • Fitted K-Means model and labeled clusters.
  • Visualized clusters with scatter plots and highlighted centroids.
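The sketch below combines both k-selection criteria. It computes inertia for each candidate k, which you would plot to find the "elbow", and the silhouette score, where higher is better. It then fits a final model. Synthetic blobs from `make_blobs` stand in for the scaled income/score matrix. Picking k by maximum silhouette is one reasonable rule, and the project may have combined it with a visual reading of the elbow plot.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def evaluate_k(X, k_values):
    """Return inertia (for the elbow plot) and silhouette score for each k."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=42)
        labels = km.fit_predict(X)
        inertias[k] = km.inertia_                     # within-cluster sum of squares
        silhouettes[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better
    return inertias, silhouettes


# Synthetic stand-in for the scaled feature matrix: five separated groups
centers = [[-6, -6], [-6, 6], [0, 0], [6, -6], [6, 6]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=0)

inertias, silhouettes = evaluate_k(X, range(2, 11))
best_k = max(silhouettes, key=silhouettes.get)

# Fit the final model; labels and centroids feed the cluster scatter plot
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
print(best_k, round(silhouettes[best_k], 3))
```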

5. Cluster Profiling & Insights

  • Calculated the average income, average spending score, and size of each cluster.
  • Interpreted customer types:
    • Budget-conscious, premium, average, impulsive/value seekers, careful/practical.
  • Suggested business strategies for each segment.
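A possible profiling step uses a pandas groupby with named aggregation. The `name_segment` rule is a hypothetical heuristic, not the project's method. It compares each cluster's means with the overall means, using a ±20% "average" band chosen arbitrarily, and maps high/low combinations to the segment names listed above.

```python
import pandas as pd

INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def profile_clusters(df, labels):
    """Average income, average spending score and size of each cluster."""
    out = df.assign(Cluster=labels)
    profile = out.groupby("Cluster").agg(
        avg_income=(INCOME, "mean"),
        avg_score=(SCORE, "mean"),
        n_customers=(INCOME, "count"),
    )
    return profile.round(1)


def name_segment(income, score, income_ref, score_ref, band=0.2):
    """Hypothetical rule: label a cluster by comparing it to overall means."""
    def level(value, ref):
        if value > ref * (1 + band):
            return "high"
        if value < ref * (1 - band):
            return "low"
        return "mid"

    names = {
        ("low", "low"): "budget-conscious",
        ("high", "high"): "premium",
        ("low", "high"): "impulsive/value seeker",
        ("high", "low"): "careful/practical",
    }
    return names.get((level(income, income_ref), level(score, score_ref)), "average")


# Illustrative data and labels (e.g. from K-Means)
df = pd.DataFrame({INCOME: [20, 25, 120, 110, 55, 60],
                   SCORE: [10, 15, 85, 90, 50, 45]})
labels = [0, 0, 1, 1, 2, 2]
profile = profile_clusters(df, labels)
profile["segment"] = [
    name_segment(r.avg_income, r.avg_score, df[INCOME].mean(), df[SCORE].mean())
    for r in profile.itertuples()
]
print(profile)
```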

6. Bonus: Other Clustering Algorithms

  • DBSCAN: Identified density-based clusters and flagged outliers as noise (label -1).
  • Agglomerative Clustering: Hierarchical clustering for comparison.
  • Compared average spending per cluster across different methods.
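A sketch of the method comparison. It runs K-Means, DBSCAN and AgglomerativeClustering on the same scaled features and reports mean spending score per cluster. The parameter values (`n_clusters=5`, `eps=0.5`, `min_samples=5`) and the synthetic groups are assumptions. DBSCAN in particular usually needs `eps` tuned to the data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def compare_methods(df, n_clusters=5, eps=0.5, min_samples=5):
    """Cluster with three algorithms; return labels and mean score per cluster."""
    X = StandardScaler().fit_transform(df[[INCOME, SCORE]])
    results = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=42).fit_predict(X),
        "dbscan": DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X),  # -1 = noise
        "agglo": AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X),  # Ward linkage by default
    }
    summary = {name: df[SCORE].groupby(labels).mean() for name, labels in results.items()}
    return results, summary


# Synthetic income/score groups (not the real dataset)
rng = np.random.default_rng(0)
centers = [(25, 20), (25, 80), (55, 50), (90, 20), (90, 80)]
points = np.vstack([rng.normal(c, 5, size=(30, 2)) for c in centers])
demo = pd.DataFrame(points, columns=[INCOME, SCORE])

results, summary = compare_methods(demo)
n_noise = int((results["dbscan"] == -1).sum())
print(pd.DataFrame(summary).round(1))  # rows = cluster label, columns = method
print("DBSCAN noise points:", n_noise)
```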

7. Model Saving

  • Saved trained models using joblib:
    • kmeans_model.pkl
    • dbscan_model.pkl
    • agglo_model.pkl
  • Saved scaler for preprocessing: scaler.pkl
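  • The K-Means model and the scaler can be reloaded to assign new customers to clusters without retraining. In scikit-learn, DBSCAN and AgglomerativeClustering have no predict method, so their saved objects only hold the labels from the original fit.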
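A minimal save/reload round trip with joblib is shown below, assuming the file names listed above. A temporary directory stands in for the project folder. The key detail is that new data must go through the same fitted scaler before `KMeans.predict`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative training data: [annual income (k$), spending score]
X_train = np.array([[15, 39], [16, 81], [60, 50], [62, 55], [120, 20], [125, 90]],
                   dtype=float)

scaler = StandardScaler().fit(X_train)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaler.transform(X_train))

out_dir = tempfile.mkdtemp()  # stand-in for the project folder
joblib.dump(kmeans, os.path.join(out_dir, "kmeans_model.pkl"))
joblib.dump(scaler, os.path.join(out_dir, "scaler.pkl"))

# Later, e.g. in a fresh session: reload and assign new customers without refitting
loaded_kmeans = joblib.load(os.path.join(out_dir, "kmeans_model.pkl"))
loaded_scaler = joblib.load(os.path.join(out_dir, "scaler.pkl"))
new_customers = np.array([[61, 52], [118, 22]], dtype=float)
new_labels = loaded_kmeans.predict(loaded_scaler.transform(new_customers))
print(new_labels)

# DBSCAN and AgglomerativeClustering offer fit/fit_predict but no predict method
print(hasattr(DBSCAN(), "predict"), hasattr(AgglomerativeClustering(), "predict"))  # False False
```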

Tools & Libraries

  • Python 3.x
  • Libraries:
    • pandas, numpy — data manipulation
    • matplotlib, seaborn — visualization
    • scikit-learn — clustering and preprocessing
    • joblib — model saving/loading

How to Use

  1. Clone or download the project folder.
  2. Install required packages:
    pip install pandas numpy matplotlib seaborn scikit-learn joblib
    
