Customer Segmentation Project

Overview

This project performs customer segmentation using the Mall Customer dataset. The goal is to cluster customers based on their Annual Income and Spending Score to identify meaningful segments and gain business insights.

Clustering is performed using K-Means, with optional experiments using DBSCAN and Agglomerative Clustering. The project also includes data preprocessing, visualization, and cluster profiling.

Dataset

Source: Mall Customer Segmentation Dataset (Kaggle)
Features:
- CustomerID — Unique customer identifier
- Gender — Male or Female
- Age — Age of the customer
- Annual Income (k$) — Annual income in thousands
- Spending Score (1-100) — Customer spending behavior score

Project Steps

1. Data Collection

Loaded the dataset using pandas.
Checked for missing values, data types, and basic statistics.
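The loading and inspection step might look like the sketch below. It assumes Mall_Customers.csv is in the working directory and uses the column names listed under Dataset. The helper names load_customers and inspect_customers are hypothetical and are not taken from the notebook.

```python
import pandas as pd

# Column names assumed from the Dataset section above.
INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def load_customers(path="Mall_Customers.csv"):
    """Read the customer table. `path` may be a filename or any file-like object."""
    df = pd.read_csv(path)
    # Fail early if the two clustering features are not present under the expected names.
    missing = [c for c in (INCOME, SCORE) if c not in df.columns]
    if missing:
        raise KeyError(f"expected columns not found: {missing}")
    return df


def inspect_customers(df):
    """Return missing-value counts, column dtypes and summary statistics."""
    return {
        "missing": df.isna().sum(),   # NaN count per column
        "dtypes": df.dtypes,          # inferred type of each column
        "stats": df.describe(),       # count/mean/std/min/quartiles/max of numeric columns
    }
```

For example, `report = inspect_customers(load_customers())` followed by `print(report["missing"])` should list zero missing values for every column of this dataset.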

2. Data Preprocessing

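Checked for missing values; none were found, so no imputation was needed.
Selected the two clustering features: Annual Income (k$) and Spending Score (1-100).
Scaled both features with StandardScaler (MinMaxScaler is an alternative) so that neither feature dominates the distance calculations used by the clustering algorithms.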
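A minimal sketch of feature selection and scaling, assuming the column names from the Dataset section. The function scale_features is hypothetical; it returns the fitted scaler so the same transformation can be reused on new data later.

```python
from sklearn.preprocessing import StandardScaler

# Assumed column names for the two clustering features.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def scale_features(df, scaler=None):
    """Select the two clustering features and standardize them.

    If `scaler` is None, a new StandardScaler is fitted on `df`;
    otherwise the given (already fitted) scaler is reused, so new data
    is transformed with the training mean and standard deviation.
    """
    X = df[FEATURES].to_numpy(dtype=float)
    if scaler is None:
        scaler = StandardScaler().fit(X)
    return scaler.transform(X), scaler
```

Swapping StandardScaler for sklearn.preprocessing.MinMaxScaler rescales each feature to the range [0, 1] instead of zero mean and unit variance; the rest of the pipeline stays the same.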

3. Exploratory Data Analysis (EDA)

Univariate Analysis: Histograms and boxplots for income and spending score.
Bivariate Analysis: Scatter plot of income vs. spending score.
Observed the distribution of each feature and the natural groupings visible in the scatter plot.
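One way to produce these plots with plain matplotlib is sketched below (seaborn's histplot and boxplot would work equally well). The function names are illustrative, and the column names are assumed from the Dataset section.

```python
import matplotlib.pyplot as plt

INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def plot_univariate(df):
    """Histograms (top row) and boxplots (bottom row) for income and spending score."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 7))
    for col_idx, col in enumerate([INCOME, SCORE]):
        values = df[col].to_numpy()
        axes[0, col_idx].hist(values, bins=20, edgecolor="black")
        axes[0, col_idx].set_title(f"Distribution of {col}")
        axes[1, col_idx].boxplot(values)
        axes[1, col_idx].set_ylabel(col)
    fig.tight_layout()
    return fig


def plot_bivariate(df):
    """Scatter plot of income against spending score."""
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(df[INCOME], df[SCORE], s=20, alpha=0.7)
    ax.set_xlabel(INCOME)
    ax.set_ylabel(SCORE)
    ax.set_title("Income vs. spending score")
    return fig
```

In a notebook, calling plot_univariate(customers) and plot_bivariate(customers) displays both figures; in a script, follow them with plt.show().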

4. K-Means Clustering

Determined the optimal number of clusters (k) using:
- Elbow Method
- Silhouette Score
Fitted the K-Means model with the chosen k and assigned a cluster label to each customer.
Visualized the clusters in a scatter plot with the centroids highlighted.
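A sketch of choosing k and fitting the final model, assuming X is the scaled two-column feature matrix. The k range 2 to 10, n_init=10 and random_state=42 are illustrative choices, not values confirmed from the notebook. The silhouette score is undefined for a single cluster, so the loop starts at k = 2.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def evaluate_k(X, k_values=range(2, 11), random_state=42):
    """Fit K-Means for each k; record inertia (elbow method) and mean silhouette."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        results[k] = {
            "inertia": km.inertia_,                          # within-cluster sum of squares
            "silhouette": silhouette_score(X, km.labels_),   # in [-1, 1], higher is better
        }
    return results


def plot_elbow_and_silhouette(results):
    """Side-by-side curves of inertia and silhouette score against k."""
    ks = sorted(results)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
    ax1.plot(ks, [results[k]["inertia"] for k in ks], marker="o")
    ax1.set(title="Elbow method", xlabel="k", ylabel="Inertia")
    ax2.plot(ks, [results[k]["silhouette"] for k in ks], marker="o")
    ax2.set(title="Silhouette score", xlabel="k", ylabel="Mean silhouette")
    fig.tight_layout()
    return fig


def fit_kmeans(X, k, random_state=42):
    """Fit the final K-Means model with the chosen number of clusters."""
    return KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)


def plot_clusters(X, model):
    """Scatter plot of the scaled data colored by cluster, centroids marked with X."""
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(X[:, 0], X[:, 1], c=model.labels_, cmap="tab10", s=20)
    centers = model.cluster_centers_
    ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="X", s=200, label="Centroids")
    ax.set(xlabel="Annual income (scaled)", ylabel="Spending score (scaled)")
    ax.legend()
    return fig
```

On the elbow plot, look for the k after which inertia stops dropping sharply; on the silhouette plot, prefer a relatively high value. The two criteria do not always agree, so the final k remains a judgment call.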

5. Cluster Profiling & Insights

Calculated each cluster's average income, average spending score, and size.
Interpreted the clusters as customer types:
- Budget-conscious, premium, average, impulsive/value seekers, careful/practical.
Suggested business strategies for each segment.
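Cluster profiles can be computed with a single groupby, as sketched below. name_segment is a hypothetical heuristic, included only to show how cluster averages might map onto the five segment names above; the reference midpoints and the 15% band are assumptions, not rules from the notebook.

```python
import pandas as pd

INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def profile_clusters(df, labels):
    """Average income, average spending score and member count per cluster."""
    return (
        df.assign(Cluster=labels)
        .groupby("Cluster")
        .agg(
            avg_income=(INCOME, "mean"),
            avg_score=(SCORE, "mean"),
            size=(INCOME, "size"),
        )
    )


def name_segment(avg_income, avg_score, income_mid, score_mid, band=0.15):
    """Hypothetical rule mapping a cluster's averages to a segment name.

    `income_mid` / `score_mid` are reference midpoints (for example the
    dataset means) and `band` is a relative tolerance; both are
    illustrative assumptions.
    """
    near_income = abs(avg_income - income_mid) <= band * income_mid
    near_score = abs(avg_score - score_mid) <= band * score_mid
    if near_income and near_score:
        return "average"
    high_income = avg_income > income_mid
    high_score = avg_score > score_mid
    if high_income and high_score:
        return "premium"
    if high_income:
        return "careful/practical"
    if high_score:
        return "impulsive/value seeker"
    return "budget-conscious"
```

For instance, after prof = profile_clusters(customers, kmeans.labels_), each row could be labeled with name_segment(row.avg_income, row.avg_score, customers[INCOME].mean(), customers[SCORE].mean()).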

6. Bonus: Other Clustering Algorithms

DBSCAN: Identified density-based clusters and outliers.
Agglomerative Clustering: Hierarchical clustering for comparison.
Compared the average spending score per cluster across the three methods.
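A sketch of the two alternative algorithms and the spending comparison, assuming X is the scaled feature matrix and df holds the original columns in the same row order. eps=0.5 and min_samples=5 are scikit-learn's defaults and ward linkage is an assumption; the notebook's actual settings may differ. DBSCAN labels noise points -1, which show up as their own row in the comparison table.

```python
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering

SCORE = "Spending Score (1-100)"


def run_alternatives(X, n_clusters=5, eps=0.5, min_samples=5):
    """Cluster the scaled features with DBSCAN and agglomerative clustering."""
    dbscan = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    agglo = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit(X)
    return {"dbscan": dbscan, "agglomerative": agglo}


def compare_spending(df, label_sets):
    """Mean spending score per cluster label, one column per clustering method.

    `label_sets` maps a method name to an array of labels aligned with df's rows.
    Labels a method does not produce appear as NaN in that method's column.
    """
    return pd.DataFrame(
        {name: df[SCORE].groupby(labels).mean() for name, labels in label_sets.items()}
    )
```

For example: compare_spending(customers, {"kmeans": kmeans.labels_, **{n: m.labels_ for n, m in run_alternatives(X).items()}}).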

7. Model Saving

Saved trained models using joblib:
- kmeans_model.pkl
- dbscan_model.pkl
- agglo_model.pkl
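Saved the fitted scaler as scaler.pkl so new data can be scaled exactly as the training data was.
The K-Means model and the scaler can be reloaded to assign new customers to clusters without retraining. In scikit-learn, DBSCAN and AgglomerativeClustering have no predict() method, so their saved models only hold the labels from the original fit.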
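Saving and reusing the artifacts with joblib could look like the sketch below. File names follow the list above, while save_artifacts and predict_segments are hypothetical helpers. Only K-Means is used for prediction, because scikit-learn's DBSCAN and AgglomerativeClustering do not provide predict().

```python
from pathlib import Path

import joblib

# Assumed column names for the two clustering features.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def save_artifacts(scaler, models, out_dir="."):
    """Write the scaler and each model to <out_dir>/<name>.pkl.

    `models` maps a file stem (e.g. "kmeans_model") to a fitted estimator.
    """
    out = Path(out_dir)
    joblib.dump(scaler, out / "scaler.pkl")
    for name, model in models.items():
        joblib.dump(model, out / f"{name}.pkl")


def predict_segments(new_customers, model_dir="."):
    """Assign new customers to K-Means clusters using the saved scaler and model."""
    d = Path(model_dir)
    scaler = joblib.load(d / "scaler.pkl")
    kmeans = joblib.load(d / "kmeans_model.pkl")
    # Apply the training-time scaling before predicting.
    X_new = scaler.transform(new_customers[FEATURES].to_numpy(dtype=float))
    return kmeans.predict(X_new)
```

The scaler passed to save_artifacts must be the one fitted on the training data; refitting it on new customers would shift the features and make the saved centroids meaningless.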

Tools & Libraries

Python 3.x
Libraries:
- pandas, numpy — data manipulation
- matplotlib, seaborn — visualization
- scikit-learn — clustering and preprocessing
- joblib — model saving/loading

How to Use

Clone or download the project folder.

Install required packages:

pip install pandas numpy matplotlib seaborn scikit-learn joblib

Repository Contents

- Customer-Segmentation.ipynb: project notebook
- Mall_Customers.csv: dataset
- requirements.txt: Python dependencies
- README.md: this file
- kmeans_model.pkl, dbscan_model.pkl, agglo_model.pkl: saved models
- scaler.pkl: saved scaler
