
DebarjunChakraborty/EDA-and-Feature-Engineering


EDA and Feature Engineering on Kaggle Datasets

This repository contains exploratory data analysis (EDA) and feature engineering work on various datasets sourced from Kaggle. Each dataset comes from a different domain, and together they demonstrate practical data science workflows, including data cleaning, transformation, visualisation, and feature engineering techniques.


Table of Contents

  1. Overview
  2. Repository Structure
  3. Purpose
  4. Tools and Technologies
  5. Usage
  6. Contributing
  7. Author

Overview

Exploratory Data Analysis (EDA) is the first step towards understanding the structure, patterns and anomalies in a dataset, while feature engineering transforms raw data into features suitable for model training. This repository applies both steps to a variety of real-world datasets.
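As a rough illustration of that first EDA pass, the sketch below builds a per-column overview with pandas. The toy DataFrame and the helper name quick_eda are hypothetical stand-ins; in a notebook the data would normally be loaded with pd.read_csv from the Kaggle file, and the checks used in each notebook may differ.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count/percentage and unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })


# Hypothetical toy data standing in for a Kaggle CSV (e.g. pd.read_csv("train.csv"))
df = pd.DataFrame({
    "Gender": ["M", "F", "F", None],
    "Age": ["0-17", "26-35", "26-35", "55+"],
    "Purchase": [8370, 15200, 1422, None],
})

print(df.shape)          # (rows, columns)
print(quick_eda(df))     # per-column overview
print(df.describe())     # distribution summary of numeric columns
print("duplicate rows:", df.duplicated().sum())
```

The missing and unique counts usually decide the next cleaning steps, such as which columns to impute, drop or encode.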

Repository Structure

  • BlackFridayAnalysis/ – Retail purchasing trends during Black Friday
  • DiwaliSalesAnalysis/ – Festive season consumer behaviour
  • FlightPriceAnalysis/ – Airline fare prediction data
  • GooglePlayStoreAnalysis/ – Mobile app insights from Google Play
  • HeartAnalysis/ – Heart disease prediction data
  • NetflixAnalysis/ – Content distribution and trends on Netflix
  • OlympicsAnalysis/ – Historical Olympics data exploration
  • OrdersAnalysis/ – E-commerce order history analysis
  • UberAnalysis/ – Ride-sharing patterns in urban areas
  • ZomatoAnalysis/ – Restaurant reviews and food delivery trends
  • BigO/ – Complexity trends and algorithm benchmarks
  • Pyspark/ – Feature engineering using PySpark
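To give a flavour of the domain-specific feature engineering these folders involve, here is a minimal pandas sketch for flight-fare style data. It assumes a Date_of_Journey column in day/month/year form and a Duration column written like "2h 50m"; these column names and formats are assumptions, not taken from the FlightPriceAnalysis notebook, and duration_to_minutes is a hypothetical helper.

```python
import pandas as pd


def duration_to_minutes(text: str) -> int:
    """Convert strings such as '2h 50m', '19h' or '45m' to total minutes."""
    hours, minutes = 0, 0
    for part in text.split():
        if part.endswith("h"):
            hours = int(part[:-1])
        elif part.endswith("m"):
            minutes = int(part[:-1])
    return hours * 60 + minutes


# Hypothetical rows; column names and formats are assumptions, not read from the notebook
flights = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019"],
    "Duration": ["2h 50m", "7h 25m", "19h"],
})

# Split the journey date into numeric day and month features
journey = pd.to_datetime(flights["Date_of_Journey"], format="%d/%m/%Y")
flights["Journey_day"] = journey.dt.day
flights["Journey_month"] = journey.dt.month

# Turn the free-text duration into a single numeric feature
flights["Duration_mins"] = flights["Duration"].apply(duration_to_minutes)

# The raw text columns are no longer needed once the features exist
flights = flights.drop(columns=["Date_of_Journey", "Duration"])
print(flights)
```

Converting text fields into numeric columns like this is what lets a model use them directly.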

Purpose

  • To practice EDA and uncover hidden trends in diverse datasets
  • To apply domain-specific transformations for meaningful insights
  • To build a foundational understanding of data wrangling and feature creation
  • To serve as a learning resource for aspiring data scientists

Tools and Technologies

  • Python
  • Jupyter Notebooks
  • pandas, numpy
  • matplotlib, seaborn, plotly
  • scikit-learn (for preprocessing)
  • PySpark (for scalable data engineering)
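The libraries listed above cover the usual preprocessing steps. The sketch below chains four of them using only pandas and numpy: median imputation, a log transform for a skewed count, z-score standardisation and one-hot encoding. The columns (Category, Reviews, Rating) are illustrative assumptions loosely modelled on app-store data, not the repository's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical app-store style rows; column names are illustrative assumptions
apps = pd.DataFrame({
    "Category": ["GAME", "TOOLS", "GAME", "FAMILY"],
    "Reviews": [120.0, 5000.0, np.nan, 30.0],
    "Rating": [4.1, 3.9, 4.5, np.nan],
})

# 1. Impute missing numeric values with the column median
for col in ["Reviews", "Rating"]:
    apps[col] = apps[col].fillna(apps[col].median())

# 2. Compress a right-skewed count column with log(1 + x)
apps["Reviews_log"] = np.log1p(apps["Reviews"])

# 3. Standardise Rating to zero mean and unit (population) standard deviation
apps["Rating_z"] = (apps["Rating"] - apps["Rating"].mean()) / apps["Rating"].std(ddof=0)

# 4. One-hot encode the categorical column as 0/1 integer columns
apps = pd.get_dummies(apps, columns=["Category"], prefix="cat", dtype=int)
print(apps)
```

In a modelling pipeline, scikit-learn's SimpleImputer, StandardScaler and OneHotEncoder perform the same steps as fitted transformers, so statistics learned on the training data can be reapplied to a test set; the plain pandas version is easier to read inside an EDA notebook.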

Usage

  1. Clone the repository:
     git clone https://github.com/DebarjunChakraborty/EDA-and-Feature-Engineering.git
     cd EDA-and-Feature-Engineering
  2. Launch Jupyter and open any notebook of interest:
     jupyter notebook
  3. Explore the notebooks; each contains self-contained EDA and preprocessing steps.
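With many dataset folders, a small helper can show which notebooks are available before launching Jupyter. This is a hypothetical convenience function, not part of the repository; it assumes the notebooks are .ipynb files stored under the dataset folders and skips the copies Jupyter keeps in .ipynb_checkpoints.

```python
from pathlib import Path


def list_notebooks(repo_root: str = ".") -> dict[str, list[str]]:
    """Map each top-level folder to the .ipynb files beneath it (searched recursively)."""
    root = Path(repo_root)
    found: dict[str, list[str]] = {}
    for nb in sorted(root.rglob("*.ipynb")):
        rel = nb.relative_to(root)
        # Skip Jupyter's autosave copies
        if ".ipynb_checkpoints" in rel.parts:
            continue
        # Notebooks sitting directly in the root are grouped under "."
        folder = rel.parts[0] if len(rel.parts) > 1 else "."
        found.setdefault(folder, []).append(rel.as_posix())
    return found
```

Calling list_notebooks(".") from the repository root returns a dictionary keyed by folder name (for example FlightPriceAnalysis), each mapped to the notebook paths inside it.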

Contributing

Contributions are welcome! If you would like to add an analysis of another dataset:

  • Fork this repository
  • Create a new branch (for example, feature-new-dataset)
  • Add your notebook in a new folder
  • Open a pull request

Author

Debarjun Chakraborty
Connect on GitHub

About

Exploratory Data Analysis and Feature Engineering on diverse Kaggle datasets using Python.
