EDA and Feature Engineering on Kaggle Datasets

This repository contains exploratory data analysis (EDA) and feature engineering work conducted on various datasets sourced from Kaggle. Each dataset represents a unique domain, and this project demonstrates practical data science workflows including data cleaning, transformation, visualisation, and feature engineering techniques.

Overview

Exploratory data analysis (EDA) is the first step in understanding the structure, patterns, and anomalies in a dataset, while feature engineering transforms raw data into features suitable for model training. This repository applies both steps to a variety of real-world datasets.
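As a rough illustration of the two steps described above, here is a minimal sketch using pandas. The toy DataFrame and its column names (age_group, purchase, order_date) are hypothetical and are not taken from any notebook in this repository; a real analysis would load a Kaggle CSV instead.

```python
import pandas as pd

# Hypothetical toy data standing in for a Kaggle CSV (assumption, not repo data)
df = pd.DataFrame({
    "age_group": ["18-25", "26-35", None, "26-35", "36-45"],
    "purchase": [120.0, 250.0, 90.0, None, 400.0],
    "order_date": ["2023-11-24", "2023-11-24", "2023-11-25",
                   "2023-11-25", "2023-11-26"],
})

# EDA: count missing values per column and summarise the numeric column
missing_counts = df.isna().sum()
summary = df["purchase"].describe()

# Cleaning: fill numeric gaps with the median, categorical gaps with a label
df["purchase"] = df["purchase"].fillna(df["purchase"].median())
df["age_group"] = df["age_group"].fillna("unknown")

# Feature engineering: derive a day-of-week feature (Monday=0 ... Sunday=6)
df["order_date"] = pd.to_datetime(df["order_date"])
df["day_of_week"] = df["order_date"].dt.dayofweek

# One-hot encode the categorical column to get a model-ready feature table
features = pd.get_dummies(
    df[["age_group", "day_of_week", "purchase"]], columns=["age_group"]
)
```

Median imputation and one-hot encoding are only one reasonable choice here; the right treatment depends on the dataset and the model that follows.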

Repository Structure

BlackFridayAnalysis/ – Retail purchasing trends during Black Friday
DiwaliSalesAnalysis/ – Festive season consumer behaviour
FlightPriceAnalysis/ – Airline fare prediction data
GooglePlayStoreAnalysis/ – Mobile app insights from Google Play
HeartAnalysis/ – Heart disease prediction data
NetflixAnalysis/ – Content distribution and trends on Netflix
OlympicsAnalysis/ – Historical Olympics data exploration
OrdersAnalysis/ – E-commerce order history analysis
UberAnalysis/ – Ride-sharing patterns in urban areas
ZomatoAnalysis/ – Restaurant reviews and food delivery trends
BigO/ – Complexity trends and algorithm benchmarks
Pyspark/ – Feature engineering using PySpark

Purpose

To practise EDA and uncover hidden trends in diverse datasets
To apply domain-specific transformations for meaningful insights
To build a foundational understanding of data wrangling and feature creation
To serve as a learning resource for aspiring data scientists

Tools and Technologies

Python
Jupyter Notebooks
pandas, numpy
matplotlib, seaborn, plotly
scikit-learn (for preprocessing)
PySpark (for scalable data engineering)
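To show how scikit-learn might fit into the preprocessing role listed above, here is a small sketch that scales a numeric column and one-hot encodes a categorical one. The data and the column names (distance_km, city) are made up for illustration, and the notebooks in this repository may use different transformers.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical ride data (assumption, not taken from the repository)
df = pd.DataFrame({
    "distance_km": [2.0, 4.0, 6.0, 8.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Standardise the numeric column and one-hot encode the categorical column.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["distance_km"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,
)

# Result: one scaled column followed by one indicator column per city
X = preprocess.fit_transform(df)
```

Wrapping the steps in a ColumnTransformer keeps the fitted scaler and encoder together, so the same transformation can later be applied to unseen data with preprocess.transform.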

Usage

Clone the repository:

git clone https://github.com/DebarjunChakraborty/EDA-and-Feature-Engineering.git
cd EDA-and-Feature-Engineering

Open any notebook of interest using Jupyter:

jupyter notebook

Explore the notebooks, each with self-contained EDA and preprocessing steps.

Contributing

Contributions are welcome! If you would like to add analysis on another dataset:

Fork this repository
Create a branch (for example, feature-new-dataset)
Add your notebook in a new folder
Create a pull request

Author

Debarjun Chakraborty
Connect on GitHub
