This project is an end-to-end music machine learning system built on Spotify track features. It combines unsupervised learning, supervised classification, and similarity-based recommendation inside a custom Streamlit web app, creating an interactive Spotify-inspired music discovery experience.
The project has two main user flows:
- Artist Tool: classify a track into a vibe category using audio features
- Listener Tool: discover similar songs based on mood/preferences or a song search
The full workflow is documented in code_final.ipynb and deployed through web.py.
The pipeline works like this:
- Clean and preprocess a Spotify track dataset
- Use K-Means clustering to discover vibe groupings from audio features
- Name the resulting clusters as interpretable vibe labels
- Train a Random Forest classifier to predict those vibe labels
- Build a cosine-similarity recommendation engine for song matching
- Serve the system in a styled Streamlit interface
Key results:
- Processed 114,000 Spotify tracks
- Built 4 interpretable vibe segments using K-Means
- Achieved 99.05% classification accuracy
- Achieved 0.988 macro F1-score
- Generated real-time music recommendations using cosine similarity
- Deployed an interactive Streamlit web application
Classification pipeline: Spotify Dataset → Data Cleaning & Feature Engineering → K-Means Clustering → Vibe Labels → Random Forest Classifier → Artist Tool → Vibe Prediction
Recommendation pipeline: Spotify Dataset → Feature Scaling → Cosine Similarity → Recommendation Engine → Listener Tool → Music Recommendations
Both pipelines are served through the Streamlit web app.
Spotify contains millions of tracks, making music discovery difficult for both listeners and independent artists.
This project addresses two challenges:
- Helping listeners discover songs that match their musical preferences.
- Helping artists understand how their songs are perceived based on audio characteristics.
The system combines clustering, classification, and recommendation techniques to create a personalized music discovery experience.
Dataset source: Spotify Tracks Dataset from Kaggle (https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset/data)
- Raw dataset size: 114,000 tracks
- Cleaned modeling dataset: 80,800 tracks
- Genres covered: 114
- Unique artists: 31,437
- Unique albums: 46,589
- Null values after cleaning: 0
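As a minimal pandas sketch of how summary figures like these can be computed (the tiny DataFrame stands in for `dataset.csv`, and the column names `artists`, `album_name`, and `track_genre` are assumed from the Kaggle schema rather than taken from the notebook):

```python
import pandas as pd

# Tiny stand-in for dataset.csv; in practice: df = pd.read_csv("dataset.csv").
# Column names (artists, album_name, track_genre) are assumed from the Kaggle schema.
df = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C", "Song D"],
    "artists": ["Artist X", "Artist Y", "Artist X", "Artist Z"],
    "album_name": ["Album 1", "Album 2", "Album 3", "Album 3"],
    "track_genre": ["pop", "rock", "pop", "acoustic"],
})

# Count rows, distinct genres/artists/albums, and total missing cells.
summary = {
    "tracks": len(df),
    "genres": df["track_genre"].nunique(),
    "unique_artists": df["artists"].nunique(),
    "unique_albums": df["album_name"].nunique(),
    "null_values": int(df.isnull().sum().sum()),
}
print(summary)
```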
Several transformations were applied before modeling:
- Removed missing and duplicate records
- Converted duration from milliseconds to minutes
- Standardized numerical audio features
- Selected 11 track-level features (audio features plus the explicit flag and duration) for clustering and classification
- Exported recommendation-ready feature matrices for fast retrieval
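The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's code: `StandardScaler` is assumed as the standardizer, `explicit` is assumed to be a boolean column, and the `dedupe_on` parameter is hypothetical, since this README does not say which columns define a duplicate.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# The 11 model features listed in this README.
FEATURES = [
    "danceability", "energy", "valence", "tempo", "acousticness",
    "speechiness", "liveness", "instrumentalness", "loudness",
    "explicit", "duration_min",
]


def preprocess(raw, dedupe_on=None):
    """Clean raw tracks; return (cleaned frame, scaled feature matrix, fitted scaler).

    dedupe_on: optional column subset that defines a duplicate (hypothetical
    parameter; the notebook's actual duplicate key is not stated in this README).
    """
    df = raw.dropna().drop_duplicates(subset=dedupe_on).copy()
    # Spotify reports duration_ms; convert milliseconds to minutes.
    df["duration_min"] = df["duration_ms"] / 60_000
    # Encode the boolean explicit flag as 0/1 so it can be scaled like the rest.
    df["explicit"] = df["explicit"].astype(int)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df[FEATURES])
    return df, X_scaled, scaler


# Toy raw data: 6 rows, one missing value, one duplicated track_id.
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.random((6, 9)), columns=FEATURES[:9])
raw["explicit"] = [True, False, True, False, True, False]
raw["duration_ms"] = [180_000, 240_000, 200_000, 210_000, 195_000, 230_000]
raw["track_id"] = ["t1", "t2", "t3", "t4", "t5", "t5"]
raw.loc[2, "energy"] = np.nan

clean, X_scaled, scaler = preprocess(raw, dedupe_on=["track_id"])
print(clean.shape, X_scaled.shape)
```

Keeping the fitted scaler (rather than only the scaled matrix) is what lets later user input be transformed into the same feature space.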
The classification model predicts a song's vibe category using 11 track-level features:
- danceability
- energy
- valence
- tempo
- acousticness
- speechiness
- liveness
- instrumentalness
- loudness
- explicit
- duration_min
The system groups tracks into 4 vibe classes:
- Energetic & Danceable
- Acoustic & Mellow
- Instrumental
- Acoustic & Instrumental
The recommendation engine uses a scaled feature matrix and cosine similarity to retrieve similar songs.
It supports two flows:
- Preference-based: users move sliders for mood, energy, danceability, tempo, and related features
- Song-based: users search for a track name and get similar songs
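A minimal sketch of both flows, assuming scikit-learn's `cosine_similarity` and a `StandardScaler` fitted on the catalogue (the helper `top_similar` and the toy data are hypothetical). The key point is that slider input and catalogue rows must pass through the same fitted scaler before comparison.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler


def top_similar(query_scaled, rec_matrix, top_n=5, exclude=None):
    """Rank catalogue rows by cosine similarity to one scaled query vector."""
    sims = cosine_similarity(query_scaled.reshape(1, -1), rec_matrix).ravel()
    if exclude is not None:
        sims[exclude] = -np.inf  # never return the seed track itself
    order = np.argsort(sims)[::-1][:top_n]
    return order, sims[order]


# Toy catalogue: 50 tracks x 4 features, scaled the way rec_matrix.npy might be.
rng = np.random.default_rng(42)
raw_features = rng.random((50, 4))
track_names = [f"song_{i}" for i in range(50)]
scaler = StandardScaler().fit(raw_features)
rec_matrix = scaler.transform(raw_features)

# Song-based flow: look up the seed by name and reuse its scaled row.
seed = track_names.index("song_7")
song_idx, song_sims = top_similar(rec_matrix[seed], rec_matrix, top_n=3, exclude=seed)

# Preference-based flow: slider values (raw units) go through the same scaler.
prefs = np.array([[0.9, 0.8, 0.5, 0.2]])
pref_idx, pref_sims = top_similar(scaler.transform(prefs)[0], rec_matrix, top_n=3)
print([track_names[i] for i in song_idx], [track_names[i] for i in pref_idx])
```

Excluding the seed index keeps the searched song from coming back as its own top match, since a vector's cosine similarity with itself is 1.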
Available filters in the app include:
- vibe filtering
- genre filtering
- result count selection
- same-vibe vs diverse-vibe retrieval
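The filters can be layered on top of the similarity ranking by masking candidates before sorting. In this sketch, the helper `recommend_filtered`, the catalogue columns `track_genre` and `vibe`, and the meaning of "diverse" (only vibes other than the seed's) are all assumptions for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def recommend_filtered(catalogue, rec_matrix, seed_idx, top_n=5,
                       genre=None, vibe=None, vibe_mode="any"):
    """Song-based recommendations with app-style filters.

    vibe_mode: "same" keeps only the seed's vibe, "diverse" keeps only other
    vibes, "any" applies no vibe restriction (assumed semantics).
    """
    sims = cosine_similarity(rec_matrix[seed_idx:seed_idx + 1], rec_matrix).ravel()
    mask = np.ones(len(catalogue), dtype=bool)
    mask[seed_idx] = False  # exclude the seed track
    if genre is not None:
        mask &= (catalogue["track_genre"] == genre).to_numpy()
    if vibe is not None:
        mask &= (catalogue["vibe"] == vibe).to_numpy()
    seed_vibe = catalogue["vibe"].iloc[seed_idx]
    if vibe_mode == "same":
        mask &= (catalogue["vibe"] == seed_vibe).to_numpy()
    elif vibe_mode == "diverse":
        mask &= (catalogue["vibe"] != seed_vibe).to_numpy()
    # Rank only the surviving candidates, highest similarity first.
    candidates = np.flatnonzero(mask)
    ranked = candidates[np.argsort(sims[candidates])[::-1]][:top_n]
    return catalogue.iloc[ranked].assign(similarity=sims[ranked])


catalogue = pd.DataFrame({
    "track_name": ["a", "b", "c", "d", "e"],
    "track_genre": ["pop", "pop", "rock", "pop", "rock"],
    "vibe": ["Energetic & Danceable", "Energetic & Danceable",
             "Acoustic & Mellow", "Acoustic & Mellow", "Energetic & Danceable"],
})
rec_matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.7, 0.7]])
print(recommend_filtered(catalogue, rec_matrix, seed_idx=0, vibe_mode="same"))
```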
Clustering setup and results:
- K-Means with k = 4
- Silhouette score: 0.3165
- Davies-Bouldin score: 1.1806
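A sketch of the clustering step with scikit-learn, using synthetic blobs in place of the real scaled feature matrix. The `n_init` and `random_state` values and the cluster-id-to-name mapping are assumptions; in practice the names would be assigned after inspecting each centroid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Toy stand-in for the standardized 11-feature matrix.
X_scaled, _ = make_blobs(n_samples=400, n_features=11, centers=4, random_state=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X_scaled)

# Higher silhouette (max 1) and lower Davies-Bouldin (min 0) indicate
# better-separated clusters.
sil = silhouette_score(X_scaled, cluster_ids)
dbi = davies_bouldin_score(X_scaled, cluster_ids)

# Hypothetical id -> name mapping; real names are chosen after inspecting
# each centroid (kmeans.cluster_centers_) against the feature definitions.
VIBE_NAMES = {
    0: "Energetic & Danceable",
    1: "Acoustic & Mellow",
    2: "Instrumental",
    3: "Acoustic & Instrumental",
}
vibe_labels = np.array([VIBE_NAMES[c] for c in cluster_ids])
print(f"silhouette={sil:.4f} davies_bouldin={dbi:.4f}")
```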
The vibe classifier is a RandomForestClassifier.
Evaluation results:
- Test accuracy: 0.9905
- Macro F1-score: 0.9883
- 5-fold CV macro F1 mean: 0.9871
- 5-fold CV macro F1 std: 0.0011
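A sketch of how metrics of this kind can be produced with scikit-learn, on synthetic data. The split ratio, `n_estimators`, and the choice to cross-validate on the training split are assumptions, not settings documented for `vibe_classifier.pkl`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Toy data standing in for (11 features, 4 vibe labels from K-Means).
X, y = make_classification(n_samples=600, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Hyperparameters here are assumptions, not the notebook's settings.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

test_acc = accuracy_score(y_test, y_pred)
test_macro_f1 = f1_score(y_test, y_pred, average="macro")

# 5-fold stratified CV on the training split, scored with macro F1.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_f1 = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42),
                        X_train, y_train, cv=cv, scoring="f1_macro")
print(f"acc={test_acc:.4f} macro_f1={test_macro_f1:.4f} "
      f"cv_mean={cv_f1.mean():.4f} cv_std={cv_f1.std():.4f}")
```

Because the target labels are K-Means assignments computed from the same features, the classifier is largely learning the cluster boundaries, which is one reason scores in the high 0.9s are plausible here.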
The recommendation engine was manually validated by testing multiple seed songs across genres and verifying that returned tracks exhibited similar audio characteristics and vibe profiles.
Similarity retrieval is based on cosine similarity over a scaled audio-feature space.
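For reference, the score between a query vector $\mathbf{q}$ (a seed track or a slider-based preference profile, after scaling) and catalogue row $\mathbf{x}_i$ over $d$ scaled features is the standard cosine similarity, where each raw feature $f_j$ is assumed to be standardized as $z_j = (f_j - \mu_j)/\sigma_j$ before comparison:

$$
\cos(\mathbf{q}, \mathbf{x}_i) = \frac{\mathbf{q}\cdot\mathbf{x}_i}{\lVert\mathbf{q}\rVert\,\lVert\mathbf{x}_i\rVert} = \frac{\sum_{j=1}^{d} q_j\,x_{ij}}{\sqrt{\sum_{j=1}^{d} q_j^{2}}\,\sqrt{\sum_{j=1}^{d} x_{ij}^{2}}}
$$

Scores range from -1 to 1, and the top-ranked rows are returned as recommendations. Standardizing first matters because the raw features live on very different scales (tempo in BPM, loudness in dB, most others in [0, 1]); without it, the largest-magnitude features would dominate the dot product.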
The Streamlit app in web.py provides two interfaces.
Artist Tool:
- manual audio-feature input with sliders
- vibe prediction with confidence score
- probability bar chart
- radar chart of vibe probabilities
Listener Tool:
- preference slider recommendations
- song search recommendations
- genre and vibe filters
- styled recommendation cards
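The confidence score, bar chart, and radar chart can all be driven by one `predict_proba` call. A minimal sketch, where `predict_vibe` is a hypothetical helper, `vibe_names` is assumed to be a dict from class id to name (as `vibe_names.pkl` might store), and the toy classifier stands in for `vibe_classifier.pkl`. If the real classifier was trained on standardized features, slider values would need the same scaling first.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def predict_vibe(clf, features, vibe_names):
    """Return (top vibe, confidence, per-vibe probabilities sorted high to low)."""
    proba = clf.predict_proba(np.asarray(features, dtype=float).reshape(1, -1))[0]
    probs = pd.Series(proba, index=[vibe_names[c] for c in clf.classes_])
    probs = probs.sort_values(ascending=False)
    return probs.index[0], float(probs.iloc[0]), probs


# Toy classifier: 11 features, labels 0-3 driven by the first feature.
rng = np.random.default_rng(0)
X = rng.random((200, 11))
y = (X[:, 0] * 4).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
VIBE_NAMES = {0: "Energetic & Danceable", 1: "Acoustic & Mellow",
              2: "Instrumental", 3: "Acoustic & Instrumental"}

# Values as they might arrive from the app's sliders (hypothetical input).
slider_values = np.full(11, 0.5)
slider_values[0] = 0.95
vibe, confidence, probs = predict_vibe(clf, slider_values, VIBE_NAMES)
print(f"{vibe} ({confidence:.0%})")
print(probs)  # these values could feed the bar chart and radar chart
```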
.
├── code_final.ipynb # full analysis, preprocessing, clustering, classification, recommendation
├── web.py # Streamlit web application
├── style.css # custom app styling
├── dataset.csv # original dataset used in the notebook
├── rec_catalogue.csv # exported recommendation catalogue
├── vibe_classifier.pkl # trained Random Forest classifier
├── vibe_names.pkl # saved vibe label mapping
├── rec_scaler.pkl # fitted scaler for recommendation features
├── rec_matrix.npy # scaled matrix used for cosine similarity
├── spotify_logo.png # app branding asset
├── requirements.txt
└── README.md
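The artifact files above suggest a save step in the notebook and a load step in `web.py`. A sketch of that round trip with `joblib` and NumPy, using the repo's file names; the dict layout of `vibe_names.pkl` and the contents of `rec_catalogue.csv` are assumptions:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def save_artifacts(out_dir, clf, vibe_names, scaler, rec_matrix, catalogue):
    """Write the notebook outputs under the file names used in this repo."""
    out = Path(out_dir)
    joblib.dump(clf, out / "vibe_classifier.pkl")
    joblib.dump(vibe_names, out / "vibe_names.pkl")
    joblib.dump(scaler, out / "rec_scaler.pkl")
    np.save(out / "rec_matrix.npy", rec_matrix)
    catalogue.to_csv(out / "rec_catalogue.csv", index=False)


def load_artifacts(in_dir):
    """Load everything the Streamlit app needs at startup."""
    d = Path(in_dir)
    return {
        "clf": joblib.load(d / "vibe_classifier.pkl"),
        "vibe_names": joblib.load(d / "vibe_names.pkl"),
        "scaler": joblib.load(d / "rec_scaler.pkl"),
        "rec_matrix": np.load(d / "rec_matrix.npy"),
        "catalogue": pd.read_csv(d / "rec_catalogue.csv"),
    }


# Round-trip toy artifacts through a temporary directory.
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = np.arange(20) % 2
scaler = StandardScaler().fit(X)
catalogue = pd.DataFrame({"track_name": [f"song_{i}" for i in range(20)]})
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y),
                   {0: "Instrumental", 1: "Acoustic & Mellow"}, scaler,
                   scaler.transform(X), catalogue)
    loaded = load_artifacts(tmp)
print(loaded["rec_matrix"].shape, len(loaded["catalogue"]))
```

In `web.py`, a loader like this could be decorated with Streamlit's `@st.cache_resource` so the files are read once per server process rather than on every rerun.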
Tech stack:
- Python
- Pandas
- NumPy
- Scikit-learn
- SciPy
- Joblib
- Plotly
- Matplotlib
- Seaborn
- Streamlit
- HTML/CSS
Future improvements:
- Integrate Spotify API for live track retrieval
- Support playlist-level vibe analysis
- Replace cosine similarity with neural embedding models
- Experiment with hybrid collaborative + content-based recommendations
- Deploy with user authentication and personalized recommendation history
To run the app locally:
git clone https://github.com/tracychanty/spotify-classification-recommendation-system.git
cd spotify-classification-recommendation-system
pip install -r requirements.txt
streamlit run web.py





