This repository contains exploratory data analysis (EDA) and feature engineering work on a range of datasets sourced from Kaggle. Each dataset comes from a different domain, and together they demonstrate practical data science workflows, including data cleaning, transformation, visualisation, and feature engineering.
Exploratory Data Analysis (EDA) is the first step in understanding the structure, patterns and anomalies in a dataset, while feature engineering transforms raw data into a form suitable for model training. This repository applies both steps to a variety of real-world datasets.
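As a minimal sketch of these two steps, assuming pandas and a small made-up table (the column names `Gender`, `Age` and `Purchase` are hypothetical and not taken from any notebook in this repository), an EDA pass followed by basic feature engineering might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a retail purchases table.
# Column names and values are illustrative only.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M", None],
    "Age": ["26-35", "18-25", "26-35", "36-45", "26-35"],
    "Purchase": [8370.0, 15200.0, np.nan, 1057.0, 7969.0],
})

# --- EDA: structure, missing values, summary statistics ---
print(df.shape)                   # (rows, columns)
print(df.dtypes)                  # column types
print(df.isna().sum())            # missing values per column
print(df["Purchase"].describe())  # distribution of the numeric column

# --- Feature engineering: turn raw columns into model-ready features ---
features = df.copy()
# Impute the numeric column with its median (robust to outliers)
features["Purchase"] = features["Purchase"].fillna(features["Purchase"].median())
# Impute the categorical column with its most frequent value
features["Gender"] = features["Gender"].fillna(features["Gender"].mode()[0])
# Encode the binary category as 0/1
features["Gender"] = features["Gender"].map({"M": 0, "F": 1})
# One-hot encode the age band into indicator columns
features = pd.get_dummies(features, columns=["Age"], prefix="Age", dtype=int)
print(features)
```

The EDA half only inspects the data; the feature engineering half works on a copy so the raw table stays available for further exploration.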
- BlackFridayAnalysis/ – Retail purchasing trends during Black Friday
- DiwaliSalesAnalysis/ – Festive season consumer behaviour
- FlightPriceAnalysis/ – Airline fare prediction data
- GooglePlayStoreAnalysis/ – Mobile app insights from Google Play
- HeartAnalysis/ – Heart disease prediction data
- NetflixAnalysis/ – Content distribution and trends on Netflix
- OlympicsAnalysis/ – Historical Olympics data exploration
- OrdersAnalysis/ – E-commerce order history analysis
- UberAnalysis/ – Ride-sharing patterns in urban areas
- ZomatoAnalysis/ – Restaurant reviews and food delivery trends
- BigO/ – Complexity trends and algorithm benchmarks
- Pyspark/ – Feature engineering using PySpark
- To practice EDA and uncover hidden trends in diverse datasets
- To apply domain-specific transformations for meaningful insights
- To build a foundational understanding of data wrangling and feature creation
- To serve as a learning resource for aspiring data scientists
- Python
- Jupyter Notebooks
- pandas, numpy
- matplotlib, seaborn, plotly
- scikit-learn (for preprocessing)
- PySpark (for scalable data engineering)
- Clone the repository:
git clone https://github.com/DebarjunChakraborty/EDA-and-Feature-Engineering.git
cd EDA-and-Feature-Engineering
- Open any notebook of interest using Jupyter:
jupyter notebook
- Explore the notebooks, each with self-contained EDA and preprocessing steps.
Contributions are welcome! If you would like to add an analysis of another dataset:
- Fork this repository
- Create a branch (feature-new-dataset)
- Add your notebook in a new folder
- Create a pull request
Debarjun Chakraborty
Connect on GitHub