🚀 Live Dashboard | 📂 API Documentation
Welcome to the ReturnX Intelligent Returns Classifier – a production-grade machine learning system that automates the triage of e-commerce product returns. The project covers the end-to-end lifecycle of an NLP application, from raw text processing to financial impact analysis, and targets the manual-review bottleneck in returns processing.
- ReturnX Intelligent Returns Classifier 📦
The goal of this project is to automate the classification of customer return reasons (Defect, Sizing, Style, Other) to reduce manual processing costs and improve routing efficiency. It moves beyond simple keyword matching by combining TF-IDF text features with numeric metadata to infer customer intent from unstructured text.
This project was built to demonstrate robust programming skills and Data Science expertise, focusing on reproducibility, statistical rigor, and business ROI.
- Business-Aware NLP: Custom text cleaning pipeline that handles "poison phrases" (e.g., "true to size" in a sizing complaint) and preserves critical negations.
- Modular Architecture: Clean Python package structure (`src/`) separating ETL, Modeling, and Inference logic.
- Mixed-Data Modeling: Combines high-dimensional TF-IDF text features with normalized numeric metadata (Age, Rating, Word Count).
- Financial Impact Analysis: Evaluates the model based on Net Savings ($) and ROI (%), not just F1-score.
- Integrated Testing: Includes unit tests (`pytest`) to verify text cleaning logic and prediction integrity.
- Production Dashboard: Interactive Streamlit interface providing real-time predictions and confidence scores.
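One possible way to combine poison-phrase handling with negation-preserving cleaning is sketched below, using only the standard library. The phrase map, the stop-word list, and the function name `clean_review_text_sketch` are hypothetical illustrations, not the project's `clean_review_text`. Collapsing a phrase into a single token is one design choice among several; deleting the phrase outright is another.

```python
import re

# Hypothetical phrase map: each multi-word phrase collapses into one token so its
# words (e.g. "size") do not leak into the ordinary unigram features.
POISON_PHRASES = {
    "true to size": "true_to_size",
    "fits as expected": "fits_as_expected",
}

# Illustrative stop words only; negations are kept explicitly below.
STOP_WORDS = {"a", "an", "the", "i", "it", "is", "was", "this", "and", "but", "to", "my"}
NEGATIONS = {"not", "no", "never", "nor"}


def clean_review_text_sketch(text: str) -> str:
    """Lowercase, collapse poison phrases, strip noise, and keep negations."""
    text = text.lower()
    # Rough contraction handling so "isn't" / "doesn't" still yield a "not" token.
    text = re.sub(r"n't\b", " not", text)
    for phrase, token in POISON_PHRASES.items():
        text = text.replace(phrase, token)
    # Keep lowercase letters, underscores (collapsed phrases), and whitespace.
    text = re.sub(r"[^a-z_\s]", " ", text)
    tokens = [t for t in text.split() if t in NEGATIONS or t not in STOP_WORDS]
    return " ".join(tokens)


print(clean_review_text_sketch("I usually buy true to size, but this isn't it: the waist runs SMALL!"))
# usually buy true_to_size not waist runs small
```

Because scikit-learn's default TF-IDF token pattern treats underscores as word characters, `true_to_size` stays a single feature and no longer feeds the plain `size` feature, while the retained `not` can still form bigrams downstream.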
This project follows a strict MLOps workflow to ensure data quality and scalability.
- Return Data: Aggregated customer reviews and return labels from the retail database (`retail_returns.labeled_reviews_v`).
- Metadata: Ingested customer attributes, including Age and Product Rating, to give the unstructured text context.
Instead of relying solely on raw data dumps, I used SQL and Python for robust data extraction.
- Ingestion: Automated extraction mechanism with CSV fallback for offline development.
- Cleaning: Implemented a rigorous text cleaning function, `clean_review_text`, to remove noise while keeping sentiment-bearing words and negations ("not", "never").
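A minimal sketch of extraction with a CSV fallback follows. It uses the standard-library `sqlite3` module as a stand-in for the project's SQLAlchemy connection, and the unqualified table name `labeled_reviews_v` and its column names are hypothetical; the real source is `retail_returns.labeled_reviews_v` on the retail database.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path

# Hypothetical table and columns; the real view is retail_returns.labeled_reviews_v.
QUERY = "SELECT review_text, age, rating, return_label FROM labeled_reviews_v"


def load_returns(db_path: str, csv_fallback: str) -> list[dict]:
    """Read labeled returns from the database, falling back to a local CSV."""
    try:
        # mode=ro makes a missing database file raise instead of creating an empty one.
        uri = Path(db_path).resolve().as_uri() + "?mode=ro"
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            conn.row_factory = sqlite3.Row
            return [dict(row) for row in conn.execute(QUERY)]
    except sqlite3.Error:
        # Offline path: CSV values arrive as strings, so downstream code must cast types.
        with open(csv_fallback, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
```

Catching `sqlite3.Error` covers both an unreachable database and a missing table, so offline development only needs the CSV export on disk.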
Exploratory Data Analysis (EDA) revealed that text alone wasn't enough; context matters.
- The Intent: A 3-star rating with the word "small" implies something different from a 5-star rating with the word "small".
- The Engineering: I implemented a `ColumnTransformer` to normalize numeric data (Age, Rating) via `MaxAbsScaler` and combined it with a 5,000-feature TF-IDF matrix (unigrams/bigrams) to capture complex customer intent.
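A minimal sketch of that preprocessing step, assuming a pandas DataFrame with hypothetical column names `clean_text`, `age`, `rating`, and `word_count` (the project's actual schema may differ). It wires together the scikit-learn components named above: `ColumnTransformer`, `TfidfVectorizer(max_features=5000, ngram_range=(1, 2))`, and `MaxAbsScaler`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical column names; the project's real schema may differ.
TEXT_COL = "clean_text"
NUMERIC_COLS = ["age", "rating", "word_count"]


def build_preprocessor() -> ColumnTransformer:
    """TF-IDF (unigrams + bigrams) on the text column, max-abs scaling on metadata."""
    return ColumnTransformer(
        transformers=[
            # A scalar column name hands TfidfVectorizer the 1-D Series of strings it expects.
            ("text", TfidfVectorizer(max_features=5000, ngram_range=(1, 2)), TEXT_COL),
            # MaxAbsScaler divides each column by its max |value| and never centers,
            # so it is safe to stack next to sparse TF-IDF output.
            ("num", MaxAbsScaler(), NUMERIC_COLS),
        ]
    )


df = pd.DataFrame(
    {
        "clean_text": ["runs small not true to size", "zipper broke after one wash", "color looks different"],
        "age": [34, 52, 27],
        "rating": [3, 1, 4],
        "word_count": [6, 5, 3],
    }
)
X = build_preprocessor().fit_transform(df)
print(X.shape)  # (3, 28): 25 TF-IDF n-grams + 3 scaled numeric columns
```

`MaxAbsScaler` maps each numeric column into [-1, 1] without shifting zeros, which keeps the metadata on a scale comparable to the L2-normalized TF-IDF weights.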
A machine learning model is only as good as the value it creates.
- Scenario: A production batch of 4,529 returns.
- Strategy: The system compares "Business As Usual" (Manual Review at ~$2.00/item) against automated costs ($0.10/item), factoring in a $5.00 penalty for every misclassification to calculate true ROI.
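The cost comparison can be written out directly. This sketch assumes ROI = net savings ÷ automated cost, which reproduces the reported 401%, and uses a misclassification count of 271 that is back-solved from the reported figures ($9,058.00 − $7,250.10 = $1,807.90 automated cost; $1,807.90 − $452.90 = $1,355.00 = 271 × $5.00) rather than taken from the project's outputs. `Decimal` avoids floating-point drift in currency arithmetic.

```python
from decimal import Decimal


def simulate_costs(
    n_items: int,
    n_errors: int,
    manual_cost: Decimal = Decimal("2.00"),
    auto_cost: Decimal = Decimal("0.10"),
    error_penalty: Decimal = Decimal("5.00"),
) -> dict[str, Decimal]:
    """Compare manual triage with automated triage plus misclassification penalties."""
    baseline = n_items * manual_cost  # "Business As Usual"
    automated = n_items * auto_cost + n_errors * error_penalty
    savings = baseline - automated
    pct = Decimal("0.1")
    return {
        "baseline": baseline,
        "automated": automated,
        "net_savings": savings,
        "cost_reduction_pct": (savings / baseline * 100).quantize(pct),
        # Assumed ROI definition: net savings relative to the automated spend.
        "roi_pct": (savings / automated * 100).quantize(pct),
    }


result = simulate_costs(n_items=4529, n_errors=271)  # 271 is back-solved, not reported
print(result["net_savings"], result["roi_pct"])  # 7250.10 401.0
```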
This project leverages a modern Python Data Science stack.
- 🐍 Python 3.10+: The core language.
- 🐼 Pandas & NumPy: For high-performance data manipulation and vectorization.
- 🤖 Scikit-Learn & XGBoost: For machine learning pipelines, gradient boosting, and cross-validation.
- ⚖️ Imbalanced-Learn: For SMOTE synthetic over-sampling to handle rare defect classes.
- 📊 Streamlit: For the production dashboard interface.
- ⚡ FastAPI: For building high-performance, production-ready APIs.
- 🖤 Black: For automated and consistent code formatting.
- 🔍 Pylint: For static code analysis and ensuring code quality.
- 🗄️ SQLAlchemy: For robust database interaction.
- 🧪 Pytest: For unit testing and verifying pipeline integrity.
Follow these steps to get a local copy up and running.
- Python 3.10+
- Pip
- Clone the repository:
  git clone https://github.com/GFFB0314/ReturnX-Intelligent-Returns-Classifier.git
  cd ReturnX-Intelligent-Returns-Classifier
- Install Dependencies: It is recommended to use a virtual environment.
  pip install -r requirements.txt
- Run Tests: Verify the logic by running the test suite.
  pytest tests/
You can interact with the project via Jupyter Notebooks for exploration or CLI for execution.
- Notebooks (`notebooks/`):
  - `01_extraction.ipynb`: SQL Data Extraction.
  - `02_eda.ipynb`: Exploratory Data Analysis.
  - `03_nlp_feature_engineering.ipynb`: Text Cleaning, TF-IDF Vectorization & Modeling.
  - `04_modelling.ipynb`: Executive Summary.
- Live Deployment (Render):
  - Dashboard: https://returnx-dashboard.onrender.com
  - API (Swagger UI): https://returnx-api.onrender.com/docs
- Command Line Interface (`main.py`):
  To run the full training pipeline (ETL -> Train -> Save Artifacts):
  python main.py train
  To launch the interactive dashboard locally:
  python main.py dashboard
- Dashboard Features:
  - Real-time classification of return comments.
  - Confidence score visualization.
  - Pre-loaded example scenarios for testing.
Simulated cost impact on a batch of 4,529 returns (modeled on historical data, not live deployment):
- Capture Efficiency: 94.1% 📈
- Operational Cost Reduction: 80% (from $9,058.00 to $1,807.90) 📉
- Net Savings: $7,250.10 💰
- Return on Investment (ROI): 401% 🚀
Real metric validated on actual data:
- Macro F1-Score: 0.93 (XGBoost Champion Model)
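Macro F1 is the unweighted mean of the per-class F1 scores, so a rare class such as Defect counts as much as a common one such as Sizing. The sketch below computes it in plain Python with the same definition as scikit-learn's `f1_score(y_true, y_pred, average="macro")`; the toy labels are illustrative, not project data.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["Defect", "Sizing", "Style", "Other", "Sizing", "Defect"]
y_pred = ["Defect", "Sizing", "Other", "Other", "Sizing", "Sizing"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.533 (= 8/15)
```

Here the missed Style review drags the macro score down hard even though most predictions are correct, which is exactly the sensitivity to minority classes that makes macro F1 a stricter target than accuracy.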
The system successfully automates the vast majority of returns while acting as a "Safety Valve" by routing ambiguous cases to the "Other" category for human review, minimizing expensive errors.
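One common way to build such a safety valve is a confidence threshold on the classifier's predicted probabilities, sketched below. This is an assumption: the project may instead rely on the model predicting "Other" directly, and the function name `route_return` and the 0.6 threshold are hypothetical.

```python
from typing import Mapping

FALLBACK_LABEL = "Other"


def route_return(probabilities: Mapping[str, float], threshold: float = 0.6) -> str:
    """Auto-assign the most likely label, or send low-confidence cases to 'Other'."""
    if not probabilities:
        return FALLBACK_LABEL
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return label if confidence >= threshold else FALLBACK_LABEL


print(route_return({"Defect": 0.85, "Sizing": 0.10, "Style": 0.03, "Other": 0.02}))  # Defect
print(route_return({"Defect": 0.40, "Sizing": 0.35, "Style": 0.20, "Other": 0.05}))  # Other
```

Raising the threshold sends more cases to human review (higher manual cost, fewer $5.00 penalties); lowering it does the opposite, so the threshold is a direct lever on the cost model.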
Note: These savings are modeled based on stated cost assumptions (manual review: $2/item, automated: $0.10/item, misclassification penalty: $5/error). Real impact requires live warehouse integration and cost validation.
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/NewFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/NewFeature`)
- Open a Pull Request
For any questions, issues, or suggestions, please feel free to contact:
- Email: gbetnkom.bechir@gmail.com
- LinkedIn: Fares Fahim Bechir Gbetnkom
- GitHub Issues: Project Issues
MIT License 📝
© 2026 Fares Gbetnkom. This project is licensed under the MIT License — feel free to use, modify, and distribute it. See the full license text here.
Happy Classifying! 🎯