SoundaryaBaskaran/Supply-Chain-Project


📦 Supply Chain Management Analysis – DataCo Dataset

📂 Project Presentation

📄 Click here to view the Supply Chain Project PPT

🚀 Introduction

Supply Chain Management (SCM) is essential for optimizing product flow from suppliers to customers. This project analyzes DataCo's supply chain dataset to identify inefficiencies, improve demand forecasting, and reduce late deliveries using Machine Learning (ML) models. The insights derived from this analysis help businesses make data-driven decisions and enhance operational efficiency.


🎯 Problem Statement

DataCo faces significant supply chain challenges:

  • 📉 Inaccurate Sales Predictions & Order Quantities leading to inventory issues.
  • ⏳ Late Deliveries causing customer dissatisfaction.
  • 💰 Fraudulent Transactions reducing profit margins.

Goal: Optimize supply chain operations by mitigating these risks and improving efficiency.


📊 Project Objectives

✔ Improve demand forecasting to optimize inventory levels.
✔ Detect fraudulent transactions and mitigate risks.
✔ Identify sales trends to sharpen insight into product and regional performance.
✔ Reduce delivery delays and improve customer satisfaction.
✔ Support data-driven decision-making for supply chain management.


🗂 Dataset Overview

The dataset contains supply chain records from 2015 to 2018, which this project uses to analyze:

  • 📍 Market Share by Region – Sales distribution across different regions.
  • 🎯 Product Profitability – Identifying profitable and loss-making products.
  • 📅 Sales Trends – Weekly, monthly, and seasonal sales patterns.
  • 💳 Payment Modes – Understanding customer payment preferences.
  • 🕵 Fraudulent Transactions – Identifying fraud-prone customers and products.
  • 🚚 Late Deliveries – Analyzing delays by product category and shipping method.
  • 🎯 Customer Segmentation – Using RFM analysis for customer behavior insights.

πŸ” Key Insights & Findings

πŸ“ Market Share by Region

  • 🌍 Europe & LATAM lead in sales per customer.
  • 🌎 Africa & USCA have lower sales, indicating untapped growth potential.

💰 Profitable & Loss-Making Products

  • ✅ Fitness & Sports gear (Nike, Under Armour) is the most profitable.
  • ❌ Cleats & Footwear show high losses, indicating pricing or overstock issues.

📈 Sales Trends & Seasonal Patterns

  • 📅 Peak Sales: Fridays & November (holiday promotions).
  • 🔽 Lowest Sales: Tuesdays.
  • ⏰ High Activity: Early mornings & late afternoons.
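Patterns like these come from grouping order revenue by calendar fields. A minimal sketch in plain Python, assuming each order is a (timestamp, sales amount) pair and that raw timestamps look like `1/31/2018 22:56` (the assumed format of DataCo's `order date (DateOrders)` column); the function names are hypothetical, not the project's code:

```python
from collections import defaultdict
from datetime import datetime

# Fixed English names so results do not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def parse_order_date(text):
    """Parse timestamps such as '1/31/2018 22:56' (assumed export format)."""
    return datetime.strptime(text, "%m/%d/%Y %H:%M")


def sales_by(orders, key):
    """Sum sales amounts grouped by key(order_datetime)."""
    totals = defaultdict(float)
    for order_dt, amount in orders:
        totals[key(order_dt)] += amount
    return dict(totals)


def by_weekday(orders):
    return sales_by(orders, lambda dt: WEEKDAYS[dt.weekday()])


def by_month(orders):
    return sales_by(orders, lambda dt: dt.month)


def by_hour(orders):
    return sales_by(orders, lambda dt: dt.hour)
```

The peak day is then `max(totals, key=totals.get)`. With pandas, the same grouping is a `groupby` on `dt.day_name()`, `dt.month`, or `dt.hour`.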

πŸ” Fraud Detection Insights

  • 🚩 Most fraud occurs in Western Europe, Central & South America.
  • 🚨 Men’s Footwear & Cleats are the most targeted items.
  • πŸ‘€ Customer "Mary Smith" shows unusually high fraud cases.

πŸš› Late Deliveries by Category & Shipping Method

  • Cleats, Men’s Footwear, Women’s Apparel face the most delays.
  • Standard Class Shipping has the highest late deliveries, while Same-Day performs best.

πŸ† Customer Segmentation (RFM Analysis)

  • 👑 Loyal Customers (10.5%) – Strengthen relationships with loyalty programs.
  • 🎖 Champions (0.6%) – VIP customers; leverage referrals.
  • 🛍 At-Risk Customers (11.4%) – Require retention strategies.
  • ❌ Lost Customers (4.4%) – Win-back offers needed.
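RFM scoring ranks each customer on Recency (days since last order), Frequency (number of orders), and Monetary value (total spend), then maps score combinations to segments. The sketch below uses rank-based scores from 1 to 5 and hypothetical segment rules; the project's actual thresholds, which produced the percentages above, are not documented here, so these rules will not reproduce them.

```python
from collections import defaultdict
from datetime import date


def rfm_table(orders, as_of):
    """Aggregate (customer_id, order_date, amount) tuples into (recency_days, frequency, monetary)."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, order_date, amount in orders:
        if customer not in last_order or order_date > last_order[customer]:
            last_order[customer] = order_date
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }


def rank_scores(pairs, bins=5, higher_is_better=True):
    """Score (key, value) pairs from 1 to bins by rank; ties are broken by input order."""
    ranked = sorted(pairs, key=lambda kv: kv[1], reverse=not higher_is_better)
    n = len(ranked)
    return {key: 1 + (i * bins) // n for i, (key, _) in enumerate(ranked)}


def segment(r, f, m):
    # Hypothetical rules for illustration only; not the project's thresholds.
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"
    if f >= 4:
        return "Loyal Customers"
    if r == 1:
        return "Lost Customers"
    if r <= 2:
        return "At-Risk Customers"
    return "Others"


def rfm_segments(orders, as_of):
    table = rfm_table(orders, as_of)
    items = table.items()
    r = rank_scores([(c, v[0]) for c, v in items], higher_is_better=False)  # fewer days is better
    f = rank_scores([(c, v[1]) for c, v in items])
    m = rank_scores([(c, v[2]) for c, v in items])
    return {c: segment(r[c], f[c], m[c]) for c in table}


if __name__ == "__main__":
    sample = [("A", date(2018, 1, 30), 500.0), ("B", date(2017, 6, 1), 50.0)]
    print(rfm_segments(sample, as_of=date(2018, 2, 1)))
```

On the full dataset, `pandas.qcut` over ranked values is a common way to build quantile scores with more careful tie handling.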

🔄 ETL Pipeline Implementation

1️⃣ Stored Cleaned Data in MongoDB

  • The cleaned supply chain dataset was stored in a MongoDB collection (cleaned_data) for processing.

2️⃣ ETL Pipeline Setup

  • Organized the project into structured modules: src/extract.py, src/transform.py, src/load.py, and src/etl_pipeline.py.

3️⃣ Extract Phase (src/extract.py)

  • Extracted the relevant fields from MongoDB, excluding the _id field.

  • Extracted 180,519 rows from MongoDB.
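The extract step can be sketched with a MongoDB projection that drops `_id`. This assumes `collection` behaves like a pymongo `Collection` (its `find(filter, projection)` method); the function name `extract` and the optional `fields` argument are illustrative, not the contents of `src/extract.py`.

```python
def extract(collection, fields=None):
    """Return all documents from `collection` as dicts, without MongoDB's _id field.

    `collection` is assumed to behave like a pymongo Collection.
    `fields` (hypothetical option) restricts the output to the named columns.
    """
    projection = {"_id": 0}  # always exclude the internal ObjectId
    if fields:
        # Inclusion projection; MongoDB allows excluding _id alongside it.
        projection.update({name: 1 for name in fields})
    return list(collection.find({}, projection))
```

With pymongo this would be called as `extract(MongoClient()["supply_chain"]["cleaned_data"])`, where the database name `supply_chain` is a hypothetical placeholder.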

4️⃣ Transform Phase (src/transform.py)

  • Applied the following transformations:

  • Shipping Delay Calculation → Days for shipping (real) - Days for shipment (scheduled)

  • Customer Order Frequency β†’ Count of orders per customer.

  • Transformed 180,519 rows.
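The two transformations can be sketched over the extracted records (a list of dicts). The column names `Customer Id` and `Order Id` are assumptions about the cleaned DataCo schema, as is counting distinct orders rather than rows for order frequency; the output keys `Shipping Delay` and `Customer Order Frequency` are hypothetical.

```python
from collections import defaultdict

REAL = "Days for shipping (real)"
SCHEDULED = "Days for shipment (scheduled)"


def transform(rows):
    """Add a shipping-delay value and a per-customer order count to each record."""
    # Distinct orders per customer; rows are assumed to be order items,
    # so the same Order Id can appear on several rows.
    orders_by_customer = defaultdict(set)
    for row in rows:
        orders_by_customer[row["Customer Id"]].add(row["Order Id"])

    transformed = []
    for row in rows:
        new_row = dict(row)  # copy so the extracted records stay unchanged
        # Positive means shipped later than scheduled; negative means early.
        new_row["Shipping Delay"] = row[REAL] - row[SCHEDULED]
        new_row["Customer Order Frequency"] = len(orders_by_customer[row["Customer Id"]])
        transformed.append(new_row)
    return transformed
```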

5️⃣ Load Phase (src/load.py)

  • Stored the transformed data in MongoDB in the transformed_supply_chain collection.

  • Loaded 180,519 records into transformed collection.
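A batched load can be sketched as follows, assuming `collection` behaves like a pymongo `Collection` whose `insert_many` returns a result with `inserted_ids`; the function name and the batch size are arbitrary illustrative choices.

```python
def load(collection, records, batch_size=10_000):
    """Insert records in batches and return how many were written.

    `collection` is assumed to behave like a pymongo Collection.
    """
    written = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        result = collection.insert_many(batch)
        written += len(result.inserted_ids)
    return written
```

Note that pymongo's `insert_many` adds an `_id` key to each inserted dict in place, so callers that reuse `records` afterwards will see that field.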

6️⃣ Final ETL Pipeline (src/etl_pipeline.py)

  • Integrated Extract → Transform → Load into a single pipeline.

  • Ensured data isn't reprocessed multiple times.

  • Successfully executed the full ETL pipeline.
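One way to keep repeated runs from reprocessing data is to skip the run when the target collection already holds records, unless a re-run is explicitly forced. This is a sketch under that assumption, not necessarily the mechanism in `src/etl_pipeline.py`; it expects pymongo-like collection objects (`find`, `count_documents`, `delete_many`, `insert_many`) and a transform function supplied by the caller. The name `run_pipeline` is hypothetical.

```python
def run_pipeline(source, target, transform_fn, force=False):
    """Extract from `source`, transform, and load into `target` unless already loaded.

    Returns the number of records loaded (0 when the run is skipped).
    """
    if target.count_documents({}) > 0:
        if not force:
            # A previous run already loaded data: skip instead of duplicating it.
            return 0
        target.delete_many({})  # explicit re-run: clear the previous output first

    rows = list(source.find({}, {"_id": 0}))  # extract, without _id
    records = transform_fn(rows)              # transform
    if not records:
        return 0                              # insert_many rejects an empty list
    target.insert_many(records)               # load
    return len(records)
```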


🤖 Machine Learning Models Used

📌 Regression Models for Sales & Order Quantity Prediction

  • 🔹 Models Trained: Linear Regression, Ridge, Lasso, Random Forest, XGBoost
  • 🔹 Linear Regression (Best for Sales, MAE: 0.0005, RMSE: 0.0014)
  • 🔹 Decision Tree (Best for Quantity, MAE: 0.0040, RMSE: 0.006)
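MAE and RMSE can be written out directly. Both are in the units of the target, so the very small values reported above are only interpretable relative to how Sales and Order Quantity were scaled before training. RMSE is always at least as large as MAE and penalizes large misses more heavily. A plain-Python sketch (scikit-learn's `mean_absolute_error` and `mean_squared_error` compute the same quantities):

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")


def mae(y_true, y_pred):
    """Mean absolute error: the average size of a miss, in target units."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: errors are squared first, so large misses weigh more."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```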

📌 Classification Models for Fraud & Late Deliveries

  • 🔹 Models Trained: Logistic Regression, Decision Tree, Random Forest, XGBoost, KNN, SVM
  • 🔹 Random Forest – Best for Fraud Detection (Recall: 98.93%, Accuracy: 98.66%)
  • 🔹 Decision Tree – Best for Late Delivery Prediction (Accuracy: 99.37%, F1 Score: 99.42%)
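The classification scores above all derive from the confusion matrix. Recall (the share of actual positives caught) matters most for fraud, because a missed fraud case is usually costlier than a false alarm, and when the positive class is rare, accuracy can stay high even if many positives are missed. A plain-Python sketch, with `1` assumed to mark the positive class (fraud or late delivery); the function name is hypothetical:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # positive correctly flagged
        elif predicted == positive:
            fp += 1  # false alarm
        elif actual == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # negative correctly passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```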

Model Improvement Techniques

✔ Cross-validation for better generalization.
✔ Feature importance analysis to refine prediction accuracy.
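K-fold cross-validation splits the data into k folds, trains on k − 1 of them, and evaluates on the held-out fold, rotating until every sample has been tested exactly once. The index-splitting logic can be sketched as below; in practice scikit-learn's `KFold` or `StratifiedKFold` (which preserves class ratios, useful for rare fraud labels) together with `cross_val_score` would do this. The helper names here are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) lists; each sample lands in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded so folds are reproducible
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_mean(n_samples, fit_and_score, k=5, seed=0):
    """Average a score over k folds; fit_and_score(train_idx, test_idx) returns a float."""
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k, seed=seed)]
    return sum(scores) / len(scores)
```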


🚀 Deployment

  • The project is deployed using Streamlit for interactive visualization.

  • Docker is used for containerization, ensuring a portable and consistent environment.


⚙ How to Run This Project

🔧 Installation Steps

1️⃣ Clone the repository:

git clone https://github.com/SoundaryaBaskaran/Supply-Chain-Project.git
cd Supply-Chain-Project

2️⃣ Install dependencies:

pip install -r requirements.txt

📊 For Data Analysis & Model Training

3️⃣ Run the Jupyter Notebook:

  jupyter notebook

Open the notebook file (.ipynb) and run its cells in order to analyze the data and train the models.

🚀 For Running the Deployed Application

4️⃣ Launch the Streamlit app (app.py):

  streamlit run app.py

Once the app is running, open the provided local URL in your browser to interact with the application.


📒 Business Recommendations

✔ AI-driven demand forecasting to prevent stockouts and overstocking.
✔ Fraud detection models to improve security & reduce financial risks.
✔ Optimized logistics strategies to minimize late deliveries.
✔ Regional marketing expansion in Europe & LATAM.
✔ Personalized retention offers for at-risk customers.
✔ Multi-supplier strategy for supply chain resilience.


📬 Contact

🔗 GitHub: https://github.com/SoundaryaBaskaran
🔗 LinkedIn: SoundayaBaskaran
🔗 Medium: SoundayaBaskaran

🙌 If you find this project useful, don’t forget to ⭐ the repository! 🚀