Click here to view the Supply Chain Project PPT
Supply Chain Management (SCM) is essential for optimizing product flow from suppliers to customers. This project analyzes DataCo's supply chain dataset to identify inefficiencies, improve demand forecasting, and reduce late deliveries using Machine Learning (ML) models. The insights derived from this analysis help businesses make data-driven decisions and enhance operational efficiency.
DataCo faces significant supply chain challenges:
- Inaccurate Sales Predictions & Order Quantities leading to inventory issues.
- Late Deliveries causing customer dissatisfaction.
- Fraudulent Transactions reducing profit margins.
Goal: Optimize supply chain operations by mitigating these risks and improving efficiency.
- Improve demand forecasting to optimize inventory levels.
- Detect fraudulent transactions and mitigate risks.
- Identify sales trends to enhance product and regional performance insights.
- Reduce delivery delays and improve customer satisfaction.
- Support data-driven decision-making for supply chain management.
The dataset contains 2015-2018 supply chain records, including:
- Market Share by Region: Sales distribution across different regions.
- Product Profitability: Identifying profitable and loss-making products.
- Sales Trends: Weekly, monthly, and seasonal sales patterns.
- Payment Modes: Understanding customer payment preferences.
- Fraudulent Transactions: Identifying fraud-prone customers and products.
- Late Deliveries: Analyzing delays by product category and shipping method.
- Customer Segmentation: Using RFM analysis for customer behavior insights.
- Europe & LATAM lead in sales per customer.
- Africa & USCA have lower sales, indicating untapped growth potential.
- Fitness & Sports gear (Nike, Under Armour) is the most profitable.
- Cleats & Footwear show high losses, indicating pricing or overstock issues.
- Peak Sales: Fridays & November (Holiday promotions).
- Lowest Sales: Tuesdays.
- High Activity: Early mornings & late afternoons.
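Peak and low sales days like the ones above can be found by summing sales per weekday. The snippet below is a minimal sketch on hypothetical sample rows; the field names `order_date` and `sales` are assumptions for illustration, not the dataset's actual column names.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample rows (not taken from the DataCo dataset).
orders = [
    {"order_date": date(2017, 11, 24), "sales": 320.0},  # a Friday
    {"order_date": date(2017, 11, 21), "sales": 90.0},   # a Tuesday
    {"order_date": date(2017, 11, 17), "sales": 210.0},  # a Friday
    {"order_date": date(2017, 11, 22), "sales": 150.0},  # a Wednesday
]


def sales_by_weekday(rows):
    """Sum the sales amount for each weekday name that appears in rows."""
    totals = defaultdict(float)
    for row in rows:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        totals[WEEKDAYS[row["order_date"].weekday()]] += row["sales"]
    return dict(totals)


totals = sales_by_weekday(orders)
peak_day = max(totals, key=totals.get)
low_day = min(totals, key=totals.get)
print(peak_day, low_day)
```

The same grouping idea extends to months or hours by swapping the key derived from the date.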
- Most fraud occurs in Western Europe, Central & South America.
- Men's Footwear & Cleats are the most targeted items.
- Customer "Mary Smith" shows unusually high fraud cases.
- Cleats, Men's Footwear, Women's Apparel face the most delays.
- Standard Class Shipping has the highest late deliveries, while Same-Day performs best.
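A comparison of late deliveries across shipping methods comes down to a late-delivery rate per group. The following is a rough sketch on made-up rows; `shipping_mode` and `late` are hypothetical field names, and the sample values are not results from the project.

```python
from collections import defaultdict

# Hypothetical sample shipments (illustrative only).
shipments = [
    {"shipping_mode": "Standard Class", "late": True},
    {"shipping_mode": "Standard Class", "late": True},
    {"shipping_mode": "Standard Class", "late": False},
    {"shipping_mode": "Standard Class", "late": True},
    {"shipping_mode": "Same Day", "late": False},
    {"shipping_mode": "Same Day", "late": False},
    {"shipping_mode": "First Class", "late": True},
    {"shipping_mode": "First Class", "late": False},
]


def late_rate_by_group(rows, key):
    """Return the share of late rows for each distinct value of rows[key]."""
    counts = defaultdict(lambda: [0, 0])  # group -> [late count, total count]
    for row in rows:
        bucket = counts[row[key]]
        if row["late"]:
            bucket[0] += 1
        bucket[1] += 1
    return {group: late / total for group, (late, total) in counts.items()}


rates = late_rate_by_group(shipments, "shipping_mode")
worst_mode = max(rates, key=rates.get)
print(rates, worst_mode)
```

Passing a different `key` (for example a product category field) gives the per-category view with the same function.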
- Loyal Customers (10.5%): Strengthen relationships with loyalty programs.
- Champions (0.6%): VIP customers; leverage referrals.
- At-Risk Customers (11.4%): Require retention strategies.
- Lost Customers (4.4%): Win-back offers needed.
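Segments like these are derived from RFM values: recency (days since the last order), frequency (number of orders), and monetary value (total spend). Here is a minimal sketch of computing those three values per customer. The field names, sample rows, and snapshot date are assumptions, and the thresholds that map RFM values to named segments are not given in this README, so they are left out.

```python
from datetime import date

# Hypothetical sample orders (illustrative only).
orders = [
    {"customer_id": 1, "order_date": date(2018, 1, 20), "sales": 200.0},
    {"customer_id": 1, "order_date": date(2017, 12, 5), "sales": 150.0},
    {"customer_id": 2, "order_date": date(2016, 3, 14), "sales": 40.0},
]


def rfm_table(rows, snapshot):
    """Compute recency (days), frequency, and monetary value per customer."""
    table = {}
    for row in rows:
        entry = table.setdefault(
            row["customer_id"],
            {"last": row["order_date"], "frequency": 0, "monetary": 0.0},
        )
        entry["last"] = max(entry["last"], row["order_date"])
        entry["frequency"] += 1
        entry["monetary"] += row["sales"]
    return {
        customer: {
            "recency_days": (snapshot - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in table.items()
    }


# The snapshot date is an assumed reference point for measuring recency.
rfm = rfm_table(orders, snapshot=date(2018, 1, 31))
print(rfm)
```

Low recency with high frequency and spend points toward the loyal end of the scale, while high recency with few orders points toward at-risk or lost customers.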
- The cleaned supply chain dataset was stored in a MongoDB collection (cleaned_data) for processing.
- Organized the project into structured modules:
- Extracted relevant fields from MongoDB, ignoring _id.
- Extracted 180,519 rows from MongoDB.
- Applied meaningful transformations:
  - Shipping Delay Calculation: Days for shipping (real) - Days for shipment (scheduled)
  - Customer Order Frequency: Count of orders per customer.
- Transformed 180,519 rows.
- Stored transformed data in MongoDB under transformed_supply_chain collection.
- Loaded 180,519 records into transformed collection.
- Integrated Extract → Transform → Load into a single pipeline.
- Ensured data isn't reprocessed multiple times.
- Successfully executed full ETL pipeline!
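The ETL steps above can be sketched roughly as follows. This is not the project's actual module code. It assumes a `Customer Id` field, treats each row as one order when counting frequency, and avoids reprocessing by skipping the load when the target already holds documents, which is only one possible strategy. `InMemoryCollection` is a hypothetical stand-in that mimics the three pymongo `Collection` methods used here (`find`, `count_documents`, `insert_many`) so the sketch runs without a database server.

```python
from collections import Counter

REAL = "Days for shipping (real)"
SCHEDULED = "Days for shipment (scheduled)"
CUSTOMER = "Customer Id"  # assumed field name


class InMemoryCollection:
    """Hypothetical stand-in for a pymongo Collection (dry runs only)."""

    def __init__(self, docs=None):
        self.docs = list(docs or [])

    def find(self, query, projection):
        hidden = {key for key, flag in projection.items() if flag == 0}
        return [{k: v for k, v in d.items() if k not in hidden} for d in self.docs]

    def count_documents(self, query):
        return len(self.docs)

    def insert_many(self, rows):
        self.docs.extend(rows)


def extract(source):
    """Read every document, dropping _id through a projection."""
    return list(source.find({}, {"_id": 0}))


def transform(rows):
    """Add the shipping delay and the per-customer order count to each row."""
    orders_per_customer = Counter(row[CUSTOMER] for row in rows)
    return [
        {
            **row,
            "shipping_delay": row[REAL] - row[SCHEDULED],
            "customer_order_frequency": orders_per_customer[row[CUSTOMER]],
        }
        for row in rows
    ]


def load(rows, target):
    """Insert rows only when the target is empty, so a re-run adds nothing."""
    if not rows or target.count_documents({}) > 0:
        return 0
    target.insert_many(rows)
    return len(rows)


def run_etl(source, target):
    """Chain extract, transform, and load; return the number of rows loaded."""
    return load(transform(extract(source)), target)


source = InMemoryCollection([
    {"_id": "a", CUSTOMER: 7, REAL: 5, SCHEDULED: 4},
    {"_id": "b", CUSTOMER: 7, REAL: 2, SCHEDULED: 4},
    {"_id": "c", CUSTOMER: 9, REAL: 3, SCHEDULED: 3},
])
target = InMemoryCollection()
first_run = run_etl(source, target)
second_run = run_etl(source, target)  # target is no longer empty
print(first_run, second_run)
```

With a real deployment, `source` and `target` would be the `cleaned_data` and `transformed_supply_chain` collections obtained from a pymongo client. A positive shipping delay means the order took longer than scheduled.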
- Models Trained: Linear Regression, Ridge, Lasso, Random Forest, XGBoost
- Linear Regression (Best for Sales, MAE: 0.0005, RMSE: 0.0014)
- Decision Tree (Best for Quantity, MAE: 0.0040, RMSE: 0.006)
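For reference, MAE and RMSE can be computed as in the sketch below. The sample values are invented for illustration and are unrelated to the scores reported above. Both metrics are expressed in the units of the target, so their magnitude depends on how the target was scaled.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Invented example values.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(mae, rmse)
```

RMSE penalizes large errors more heavily than MAE, which is why it is never smaller than MAE on the same predictions.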
- Models Trained: Logistic Regression, Decision Tree, Random Forest, XGBoost, KNN, SVM
- Random Forest: Best for Fraud Detection (Recall: 98.93%, Accuracy: 98.66%)
- Decision Tree: Best for Late Delivery Prediction (Accuracy: 99.37%, F1 Score: 99.42%)
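The classification metrics quoted above (accuracy, recall, F1) all follow from the confusion counts. Below is a small sketch on invented labels, where `1` is assumed to mark the positive class (for example, a fraudulent order or a late delivery).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented example labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall matters most for fraud detection because a missed fraud case (false negative) is usually costlier than a false alarm.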
- Cross-validation for better generalization.
- Feature Importance analysis to refine prediction accuracy.
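To illustrate the idea behind cross-validation, the sketch below splits sample indices into k folds so that every sample is used for testing exactly once. It is a simplified, unshuffled stand-in written for illustration; the project itself may well rely on a library implementation such as scikit-learn's `KFold`, which this README does not confirm.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), len(test))
```

A model is then trained on each `train` split and scored on the matching `test` split, and the scores are averaged to estimate how well it generalizes.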
- The project is deployed using Streamlit for interactive visualization.
- Docker is used for containerization, ensuring a portable and consistent environment.
1. Clone the repository:

       git clone https://github.com/SoundaryaBaskaran/Supply-Chain-Project.git
       cd Supply-Chain-Project

2. Install dependencies:

       pip install -r requirements.txt

3. Run the Jupyter Notebook:

       jupyter notebook

   Open the notebook file (.ipynb) and execute the scripts step by step to analyze the data and train models.

4. Run the Deployment (app.py):

       streamlit run app.py

   Once the app is running, open the provided local URL in your browser to interact with the application.
- AI-driven demand forecasting to prevent stock issues.
- Fraud detection models to improve security & reduce financial risks.
- Optimized logistics strategies to minimize late deliveries.
- Regional marketing expansion in Europe & LATAM.
- Personalized retention offers for at-risk customers.
- Multi-supplier strategy for supply chain resilience.
- GitHub: https://github.com/SoundaryaBaskaran
- LinkedIn: SoundayaBaskaran
- Medium: SoundayaBaskaran

If you find this project useful, don't forget to star the repository!