This repository contains machine learning experiments and case studies developed as part of the SGA Machine Learning Project. It explores both classical ML approaches and modern deep learning techniques, with a focus on reproducibility, evaluation, and comparison.
- Sentiment Analysis (IMDB dataset)
  - Traditional ML baselines: Bag-of-Words + Logistic Regression, Random Forest
  - Transformer-based deep learning: DistilBERT
  - Performance benchmarking (accuracy, F1-score, training efficiency)
- Product Classification (FashionX dataset)
  - End-to-end pipeline for image/text-based product categorization
  - Feature engineering and optimization
  - Model evaluation and improvement strategies
This project aims to:
- Compare classical ML and modern transformer models on real-world tasks
- Provide reproducible workflows in Jupyter Notebook (`.ipynb`) format
- Serve as a reference for applying ML to NLP and classification problems
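
To make the Bag-of-Words baseline mentioned above concrete, here is a minimal, dependency-free sketch of the featurization step. It is an illustration only and not the code used in the notebooks: the function names (`tokenize`, `build_vocabulary`, `vectorize`) are hypothetical, and the two example reviews are made-up placeholders rather than IMDB data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in tokenize(doc)}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def vectorize(text, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Made-up placeholder reviews, not taken from the IMDB dataset.
reviews = ["A great, great film", "A dull film"]
vocab = build_vocabulary(reviews)
features = [vectorize(review, vocab) for review in reviews]
print(vocab)
print(features)
```

Count vectors like these are the kind of input a linear classifier such as Logistic Regression is trained on. In practice a library vectorizer would normally replace this hand-written version.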
Make sure you have the following installed:
- Python 3.8+
- Jupyter Notebook / JupyterLab
- Required libraries: `pip install -r requirements.txt`
- Clone this repository: `git clone https://github.com/yourusername/machine-learning-sga-project.git`, then `cd machine-learning-sga-project`
- Open Jupyter Notebook: `jupyter notebook`
- Navigate to the `.ipynb` files and run the cells step by step.
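
Before opening the notebooks, it can help to confirm that the environment meets the prerequisites. The following is a small sketch of such a check using only the standard library. The package names in `ASSUMED_PACKAGES` are assumptions based on the tools this README mentions; `requirements.txt` remains the authoritative list.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)

# Assumed import names, for illustration only; see requirements.txt.
ASSUMED_PACKAGES = ["notebook", "sklearn", "transformers"]


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_is_supported())
print("Missing packages:", missing_packages(ASSUMED_PACKAGES))
```

If any packages are reported missing, rerunning `pip install -r requirements.txt` in the active environment is the usual fix.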
- DistilBERT outperforms traditional ML models on sentiment analysis in terms of accuracy and generalization.
- Classical ML approaches still offer competitive results with lower computational cost.
- Product classification experiments demonstrate the trade-off between feature engineering and deep learning models.
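
The comparisons above rest on accuracy, F1-score, and training time. As a reference for how these quantities are defined, here is a minimal sketch in plain Python. The helper names are hypothetical, the F1 shown is the binary F1 for a single positive class, and the labels are made-up placeholders, not results from this project.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Made-up placeholder labels, not results from the notebooks.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

Wrapping a model's training call in `timed` gives a rough wall-clock figure for the training-efficiency side of the comparison.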
machine-learning deep-learning nlp sentiment-analysis transformers classification distilbert sklearn jupyter-notebook data-science
This project is licensed under the MIT License - see the LICENSE file for details.
Developed as part of the SGA Machine Learning Project. Contributions and feedback are welcome!