This project performs customer segmentation using the Mall Customer dataset. The goal is to cluster customers based on their Annual Income and Spending Score to identify meaningful segments and gain business insights.
Clustering is performed using K-Means, with optional experiments using DBSCAN and Agglomerative Clustering. The project also includes data preprocessing, visualization, and cluster profiling.
- Source: Mall Customer Segmentation Dataset (Kaggle)
- Features:
  - `CustomerID`: Unique customer identifier
  - `Gender`: Male or Female
  - `Age`: Age of the customer
  - `Annual Income (k$)`: Annual income in thousands of dollars
  - `Spending Score (1-100)`: Customer spending behavior score
- Loaded the dataset using `pandas`.
- Checked for missing values, data types, and basic statistics.
- Found no missing values in this dataset, so no imputation was needed.
- Selected relevant features: `Annual Income (k$)` and `Spending Score`.
- Applied scaling using `StandardScaler` (or `MinMaxScaler`) to normalize feature values.
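
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper names `inspect` and `scale_features` are hypothetical, the CSV file name in the comment is an assumption, and the four-row frame is made-up sample data so the snippet runs without the download.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Column names as listed in the dataset description.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def inspect(df: pd.DataFrame) -> pd.Series:
    """Print dtypes and summary statistics; return missing-value counts per column."""
    print(df.dtypes)
    print(df.describe())
    return df.isnull().sum()


def scale_features(df: pd.DataFrame):
    """Select the two clustering features and standardize them (zero mean, unit variance)."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df[FEATURES])
    return X_scaled, scaler


# Made-up sample rows so the sketch runs on its own.
# With the real data, something like: df = pd.read_csv("Mall_Customers.csv")  # file name assumed
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 35, 52],
    "Annual Income (k$)": [15, 16, 60, 120],
    "Spending Score (1-100)": [39, 81, 50, 15],
})

missing = inspect(df)
X_scaled, scaler = scale_features(df)
```

Swapping `StandardScaler` for `MinMaxScaler` only changes the import and the constructor call; the fitted scaler should be kept so the same transformation can be applied to new customers later.
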
- Univariate Analysis: Histograms and boxplots for income and spending score.
- Bivariate Analysis: Scatter plot of income vs. spending score.
- Observed natural clusters and distribution patterns.
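
One possible way to produce the plots described above, using plain `matplotlib` (the `seaborn` equivalents would work just as well). The function name `plot_eda` and the sample rows are illustrative assumptions, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() for interactive use
import matplotlib.pyplot as plt
import pandas as pd

INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def plot_eda(df: pd.DataFrame):
    """Draw histograms, boxplots, and an income-vs-score scatter plot on one figure."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Univariate: overlaid histograms of both features.
    axes[0].hist(df[INCOME].to_numpy(), bins=10, alpha=0.6, label=INCOME)
    axes[0].hist(df[SCORE].to_numpy(), bins=10, alpha=0.6, label=SCORE)
    axes[0].set_title("Histograms")
    axes[0].legend()

    # Univariate: side-by-side boxplots.
    axes[1].boxplot([df[INCOME].to_numpy(), df[SCORE].to_numpy()])
    axes[1].set_xticklabels(["Income", "Score"])
    axes[1].set_title("Boxplots")

    # Bivariate: income vs. spending score.
    axes[2].scatter(df[INCOME], df[SCORE])
    axes[2].set_xlabel(INCOME)
    axes[2].set_ylabel(SCORE)
    axes[2].set_title("Income vs. spending score")

    fig.tight_layout()
    return fig


# Made-up sample rows, not the Mall Customer data.
df = pd.DataFrame({
    INCOME: [15, 16, 40, 60, 62, 88, 120, 125],
    SCORE: [39, 81, 55, 50, 48, 90, 15, 12],
})
fig = plot_eda(df)
# fig.savefig("eda_overview.png")  # hypothetical output file name
```
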
- Determined optimal number of clusters (`k`) using:
  - Elbow Method
  - Silhouette Score
- Fitted K-Means model and labeled clusters.
- Visualized clusters with scatter plots and highlighted centroids.
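
A minimal sketch of the `k` selection and fitting steps, assuming a search range of 2 to 8 clusters (the range is an assumption, not taken from the project). The data here are synthetic blobs standing in for the scaled income/score matrix, so the chosen `k` says nothing about the real dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled feature matrix (NOT the Mall Customer data):
# five well-separated groups of 40 points each.
centers = [[-10, -10], [-10, 10], [0, 0], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=0)

inertias = {}     # within-cluster sum of squares per k, used for the elbow plot
silhouettes = {}  # mean silhouette coefficient per k
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# Pick the k with the highest silhouette score, then refit the final model.
best_k = max(silhouettes, key=silhouettes.get)
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
```

Plotting `inertias` against `k` gives the elbow curve, and `centroids` can be overlaid on a scatter plot of `X` colored by `labels`. The two criteria do not always agree on real data, so the final choice of `k` remains a judgment call.
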
- Calculated average income, spending score, and cluster size.
- Interpreted customer types:
  - Budget-conscious, premium, average, impulsive/value seekers, careful/practical.
- Suggested business strategies for each segment.
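
The per-cluster averages and sizes can be computed with a single `groupby`. The sketch below assumes the cluster labels have been stored in a column named `Cluster` (a hypothetical name) and uses made-up rows for illustration.

```python
import pandas as pd

# Made-up rows with an assumed "Cluster" label column.
df = pd.DataFrame({
    "Annual Income (k$)": [15, 20, 60, 62, 120, 130],
    "Spending Score (1-100)": [80, 76, 50, 48, 15, 90],
    "Cluster": [0, 0, 1, 1, 2, 3],
})

# One row per cluster: mean income, mean spending score, and number of customers.
profile = df.groupby("Cluster").agg(
    avg_income=("Annual Income (k$)", "mean"),
    avg_spending=("Spending Score (1-100)", "mean"),
    size=("Annual Income (k$)", "size"),
)
print(profile)
```

Reading the table row by row (for example, low income with high spending versus high income with low spending) is what supports the segment names listed above.
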
- DBSCAN: Identified density-based clusters and outliers.
- Agglomerative Clustering: Hierarchical clustering for comparison.
- Compared average spending per cluster across different methods.
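
A rough sketch of the comparison, again on synthetic data rather than the Mall dataset. The `eps`, `min_samples`, `n_clusters`, and `linkage` values are assumptions chosen to suit the synthetic blobs, and `mean_per_cluster` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled features: five tight groups plus one far-away point.
centers = [[-10, -10], [-10, 10], [0, 0], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=0)
X = np.vstack([X, [[50.0, 50.0]]])  # deliberate outlier, stored as the last row

# DBSCAN: density-based; points without enough neighbors get the label -1 (noise).
db_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)

# Agglomerative: hierarchical merging with Ward linkage; every point gets a cluster.
agg_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)


def mean_per_cluster(X, labels, column=1):
    """Average of one feature column per cluster label (column 1 plays the spending score)."""
    return {int(c): float(X[labels == c, column].mean()) for c in np.unique(labels)}


db_means = mean_per_cluster(X, db_labels)
agg_means = mean_per_cluster(X, agg_labels)
```

Note that DBSCAN chooses the number of clusters itself and can mark customers as noise, whereas Agglomerative Clustering assigns every point to one of the requested clusters, so per-cluster averages are not directly matched one to one across methods.
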
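- Saved trained models using `joblib`:
  - `kmeans_model.pkl`
  - `dbscan_model.pkl`
  - `agglo_model.pkl`
- Saved scaler for preprocessing: `scaler.pkl`
- The K-Means model can be reloaded to predict clusters for new data without retraining. The scikit-learn DBSCAN and Agglomerative Clustering estimators have no `predict` method, so they must be refit to label new data.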
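
Saving and reloading might look like the sketch below. The training rows and new customers are made up, and a temporary directory is used only so the example leaves no files behind; the project itself writes the `.pkl` files listed above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up training rows: [annual income (k$), spending score].
X_train = np.array(
    [[15, 80], [18, 75], [60, 50], [62, 48], [120, 15], [125, 12]], dtype=float
)
scaler = StandardScaler().fit(X_train)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(X_train))

with tempfile.TemporaryDirectory() as tmp:
    scaler_path = os.path.join(tmp, "scaler.pkl")
    model_path = os.path.join(tmp, "kmeans_model.pkl")

    # Persist both objects: the scaler is needed to transform new data the same way.
    joblib.dump(scaler, scaler_path)
    joblib.dump(kmeans, model_path)

    # Later (or in another process): reload instead of retraining.
    loaded_scaler = joblib.load(scaler_path)
    loaded_kmeans = joblib.load(model_path)

# Assign new customers to existing clusters: scale first, then predict.
new_customers = np.array([[16, 78], [122, 14]], dtype=float)
new_labels = loaded_kmeans.predict(loaded_scaler.transform(new_customers))
```

Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that created them.
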
- Python 3.x
- Libraries:
  - `pandas`, `numpy`: data manipulation
  - `matplotlib`, `seaborn`: visualization
  - `scikit-learn`: clustering and preprocessing
  - `joblib`: model saving/loading
- Clone or download the project folder.
- Install required packages: `pip install pandas numpy matplotlib seaborn scikit-learn joblib`