🛒 Rossmann Sales Prediction — Forecasting Daily Store Revenue

This project focuses on predicting sales for Rossmann, one of the largest drugstore chains in Europe. With over 4,000 stores across multiple countries, accurate sales forecasting is crucial for inventory planning, promotions, and resource allocation.
Through data exploration, feature engineering, and machine learning modeling, this project aims to build a robust sales prediction model using historical store data.


🧪 Objectives

  • Analyze historical sales and customer trends across stores.
  • Engineer meaningful time-based and promotional features.
  • Handle outliers to improve model performance.
  • Build and compare machine learning models for accurate forecasting.
  • Identify key drivers of sales using feature importance.

📌 Key Methods & Approach

  • Data Cleaning & Preprocessing

    • Treated missing values and corrected data types.
    • Removed extreme outliers using the IQR method.
  • Feature Engineering

    • Created a Recency feature (days since the last record).
    • Extracted Day of Week, Promo, and Holiday indicators.
    • Encoded categorical features.
  • Exploratory Data Analysis (EDA)

    • Sales & customer distribution analysis.
    • Promotion and holiday impact assessments.
    • Correlation and feature significance study.
  • Machine Learning Models

    • Decision Tree Regressor
    • Random Forest Regressor
    • AdaBoost Regressor
    • Stacking Regressor
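The preprocessing and feature-engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `remove_iqr_outliers` and `add_recency` are hypothetical, the 1.5 × IQR fence is the conventional default rather than a value confirmed by this project, Recency is assumed to mean the number of days between a row's date and the latest date in the data, and the column names (`Date`, `Sales`, `StateHoliday`) are assumed to follow the public Rossmann `train.csv`.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose `column` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    keep = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep].copy()


def add_recency(df: pd.DataFrame, date_column: str = "Date") -> pd.DataFrame:
    """Add a Recency column: days between each row's date and the latest date.

    This definition of Recency is an assumption made for illustration.
    """
    out = df.copy()
    dates = pd.to_datetime(out[date_column])
    out["Recency"] = (dates.max() - dates).dt.days
    return out


# Tiny made-up sample; the values are illustrative, not taken from the dataset.
sample = pd.DataFrame({
    "Date": ["2015-07-27", "2015-07-28", "2015-07-29", "2015-07-30", "2015-07-31"],
    "Sales": [5000, 5200, 4800, 5100, 40000],
    "StateHoliday": ["0", "0", "a", "0", "0"],
})

cleaned = remove_iqr_outliers(sample, "Sales")   # drops the 40000 row
cleaned = add_recency(cleaned)                    # adds the Recency column
encoded = pd.get_dummies(cleaned, columns=["StateHoliday"])  # one-hot encoding

print(encoded)
```

Filtering outliers before computing Recency means the reference date is the latest date among the remaining rows; swapping the two calls would anchor Recency to the latest date in the raw data instead.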

📷 Visualizations

📊 Distribution & Outliers

Distribution of Sales, Customers and Recency
Box plot for Sales and Customers (Outliers) and Promo effect on Sales

⏳ Time-based Patterns

Sales by Day of the Week
Effect of State Holidays on Sales

🔍 Feature Relationships

Feature Correlation Heatmap
Feature Importance

🤖 Model Performance

Decision Tree Actual vs Predicted Sales
Random Forest Actual vs Predicted Sales
AdaBoost Actual vs Predicted Sales
Stacking Regressor Actual vs Predicted Sales


🔍 Key Insights & Outcomes

  • Sales Distribution

    • Daily sales mostly fall between 2,000 and 10,000 (values of the Sales column).
    • High-value outliers (above 15,000) can skew model predictions.
  • Customer Behavior

    • Most stores serve <1,000 customers per day.
    • Sudden spikes (>1,500) act as noise and were treated as outliers.
  • Promotion Impact

    • Days with Promo = 1 show a clearly higher median sales value than days without a promotion.
    • Promotional campaigns are strong revenue boosters.
  • Day of the Week Trends

    • Sales drop significantly on Sundays due to partial/complete store closures.
    • Weekdays show more stable and higher sales.
  • Holiday Effects

    • State Holidays result in zero or near-zero sales, indicating closed stores.
    • These dates are essential for forecasting accuracy.
  • Feature Importance

    • Customers is the top predictor of sales.
    • Promo and DayOfWeek also strongly influence revenue.
    • Recency has a minor negative correlation with Sales.
  • Model Performance

    • Random Forest and Stacking Regressor performed best.
    • Both achieved an R² of roughly 0.85–0.86 on the test data.
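A model comparison of the kind summarized above might look like the sketch below. It is only an illustration under stated assumptions: the data is synthetic (generated so that Customers dominates Sales), the hyperparameters are arbitrary, and the choice of base learners and `LinearRegression` as the final estimator for the stacking model is a guess, since the README does not say how the stack was built. The scores it prints therefore say nothing about the project's reported results.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real features (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 400
customers = rng.integers(200, 1200, size=n)
promo = rng.integers(0, 2, size=n)
day_of_week = rng.integers(1, 8, size=n)
sales = 8.0 * customers + 900.0 * promo + rng.normal(0, 300, size=n)

feature_names = ["Customers", "Promo", "DayOfWeek"]
X = np.column_stack([customers, promo, day_of_week])
X_train, X_test, y_train, y_test = train_test_split(
    X, sales, test_size=0.25, random_state=0
)

models = {
    "Decision Tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=0),
    # Base learners and final estimator are assumed, not taken from the notebook.
    "Stacking": StackingRegressor(
        estimators=[
            ("tree", DecisionTreeRegressor(max_depth=6, random_state=0)),
            ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
        ],
        final_estimator=LinearRegression(),
    ),
}

# Fit each model and score it with R^2 on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")

# Impurity-based feature importances from the fitted random forest.
importances = dict(zip(feature_names, models["Random Forest"].feature_importances_))
print(importances)
```

R² is the fraction of variance in the test-set sales that the model explains, so it is a goodness-of-fit score rather than a classification-style accuracy.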

💻 Technologies Used

  • Python
  • pandas, numpy
  • matplotlib, seaborn
  • scikit-learn
  • Jupyter Notebook

🛠 Setup & Installation

1. Clone the Repository:

git clone https://github.com/indu-explores-data/Rossmann-Sales-Prediction.git

2. Navigate to the Project Directory:

cd Rossmann-Sales-Prediction

3. Create and Activate a Virtual Environment (Recommended):

python -m venv venv

Windows:

venv\Scripts\activate

Mac/Linux:

source venv/bin/activate

4. Install Required Libraries:

pip install -r requirements.txt

5. Launch Jupyter Notebook:

jupyter notebook

6. Open Rossmann Sales Prediction.ipynb and run all cells to reproduce the analysis.


▶️ Usage / How to Run

  • Open Rossmann_Sales_Prediction.ipynb in Jupyter Notebook
  • Run all cells sequentially
  • Explore visualizations and model comparisons
  • Final forecasts are available in the model output cells

🔗 Connect with Me

Let’s connect on LinkedIn for project discussions or data-driven collaborations:

LinkedIn


🙌 Feedback & Support

If you found this project helpful, please ⭐ star the repository and share your thoughts. Suggestions and contributions are always welcome!
