📊 Students Performance — EDA & Data Preprocessing

An Exploratory Data Analysis (EDA) and preprocessing project on the Students Performance dataset. The project covers feature selection, missing value handling, and outlier removal to prepare clean data for machine learning.

📋 Table of Contents

Project Overview
Dataset
Preprocessing Steps
Libraries Used
How to Run
Project Structure
Author

📌 Project Overview

This project performs end-to-end data cleaning and preprocessing on a Students Performance dataset. The goal is to:

Remove low-variance (non-informative) features
Handle missing values in both numeric and categorical columns
Remove outliers using the IQR method
Prepare the dataset for downstream ML models

📂 Dataset

File: StudentsPerformance.csv
Rows: 1000 students
Columns: 8 features

Column	Type	Description
`gender`	Categorical	Male / Female
`race/ethnicity`	Categorical	Group A to E
`parental level of education`	Categorical	Education level of parents
`lunch`	Categorical	Standard / Free-Reduced
`test preparation course`	Categorical	Completed / None
`math score`	Numeric	Score in Math (0–100)
`reading score`	Numeric	Score in Reading (0–100)
`writing score`	Numeric	Score in Writing (0–100)

Sample Data:

gender	race	parental edu	lunch	test prep	math	reading	writing
female	group B	bachelor's	standard	none	72	72	74
female	group C	some college	standard	completed	69	90	88
male	group A	associate's	free/reduced	none	47	57	44
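As a quick orientation, the sketch below loads the three sample rows above and splits the columns by dtype. The rows are embedded as CSV text so the snippet runs without the file, and the education labels are abbreviated exactly as in the sample table. The names `sample_csv`, `numeric_cols`, and `cat_cols` are illustrative assumptions, not taken from the notebook.

```python
import io

import pandas as pd

# The three sample rows from this README, written as CSV text (assumption:
# the real file uses the column names listed in the Dataset table).
sample_csv = io.StringIO(
    "gender,race/ethnicity,parental level of education,lunch,"
    "test preparation course,math score,reading score,writing score\n"
    "female,group B,bachelor's,standard,none,72,72,74\n"
    "female,group C,some college,standard,completed,69,90,88\n"
    "male,group A,associate's,free/reduced,none,47,57,44\n"
)

df = pd.read_csv(sample_csv)

# Split columns by dtype: scores are numeric, everything else is categorical.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

print(df.shape)
print(numeric_cols)
print(cat_cols)
```

With the real file, `pd.read_csv("StudentsPerformance.csv")` would take the place of the embedded text, and the shape should match the 1000 rows and 8 columns listed above.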

⚙️ Preprocessing Steps

Step 1 — Variance Threshold Feature Selection

VarianceThreshold(threshold=0.5)

Separated numeric columns from categorical
Removed any numeric feature with variance below 0.5
Kept all categorical columns as-is
Retained only the numeric features that passed the threshold
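The steps above might look roughly like the following sketch. It uses the three sample rows from the Dataset section plus a hypothetical `constant score` column, added only so the selector has something to drop. The variable names are assumptions and the notebook may organise this differently.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy frame built from the README's sample rows.
# "constant score" is a hypothetical zero-variance column for illustration.
df = pd.DataFrame({
    "gender": ["female", "female", "male"],
    "math score": [72, 69, 47],
    "reading score": [72, 90, 57],
    "constant score": [50, 50, 50],
})

# Separate numeric columns from categorical ones.
numeric_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

# Fit the selector on the numeric columns only.
selector = VarianceThreshold(threshold=0.5)
selector.fit(df[numeric_cols])

# get_support() returns a boolean mask over the fitted columns.
kept_numeric = numeric_cols[selector.get_support()]

# Keep all categorical columns as-is and add back the retained numeric ones.
df_selected = pd.concat([df[cat_cols], df[kept_numeric]], axis=1)
print(list(df_selected.columns))
```

Note that `VarianceThreshold` computes the population variance of each column, which can differ slightly from the sample variance that `DataFrame.var()` reports by default.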

Step 2 — Handle Missing Values

Numeric columns:

Filled missing values with column mean

Categorical columns:

Filled missing values with column mode (most frequent value)

df[numeric_cols] = df[numeric_cols].fillna(df.mean(numeric_only=True))
df[cat_col] = df[cat_col].fillna(df[cat_col].mode()[0])

✅ After this step, df.isnull().sum() returns 0 for every column
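A minimal, self-contained sketch of this step on a toy frame follows. The data is made up for illustration, and the loop over `cat_cols` is an assumption about how the single-column line above is applied to each categorical column.

```python
import numpy as np
import pandas as pd

# Made-up toy data with one missing value per column.
df = pd.DataFrame({
    "lunch": ["standard", None, "standard", "free/reduced"],
    "math score": [72.0, 69.0, np.nan, 47.0],
})

numeric_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

# Numeric columns: fill with the column mean.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Categorical columns: fill with the most frequent value.
# mode() returns a Series because ties are possible; [0] takes the first.
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

print(df.isnull().sum())
```

Mean and mode imputation are simple defaults; they keep every row but can shrink the variance of a column when many values are missing.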

Step 3 — Outlier Removal (IQR Method)

Applied IQR-based outlier removal on:

math score
reading score

Formula:

Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1
Lower Bound = Q1 - 1.5 × IQR
Upper Bound = Q3 + 1.5 × IQR

Rows with a math score or reading score outside these bounds are dropped.
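The formula above can be sketched as a small helper. The function name `remove_iqr_outliers` and the toy scores are hypothetical. This version computes the bounds for every column on the full data and filters once, whereas the notebook may instead filter one column after the other.

```python
import pandas as pd


def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        lower = q1 - k * iqr
        upper = q3 + k * iqr
        # between() is inclusive on both ends by default.
        mask &= df[col].between(lower, upper)
    return df[mask]


# Made-up scores; the math score of 5 is far below the rest.
scores = pd.DataFrame({
    "math score": [60, 62, 65, 68, 70, 72, 75, 5],
    "reading score": [61, 63, 66, 67, 71, 73, 74, 70],
})

cleaned = remove_iqr_outliers(scores, ["math score", "reading score"])
print(len(scores), "->", len(cleaned))
```

For the toy math scores, Q1 = 61.5 and Q3 = 70.5, so IQR = 9 and the bounds are 48 and 84. Only the row with a math score of 5 falls outside them.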

🛠️ Libraries Used

pandas
numpy
seaborn
matplotlib
scikit-learn (VarianceThreshold)

🚀 How to Run

Clone the repository:

git clone https://github.com/hamza93-ai/Students-Performance-EDA.git

Place StudentsPerformance.csv in the same directory as the notebook.

Open the notebook and run all cells:

jupyter notebook students_performance_eda_preprocessing.ipynb

📁 Project Structure

Students-Performance-EDA/
│
├── students_performance_eda_preprocessing.ipynb   # Main notebook
├── StudentsPerformance.csv                        # Dataset
└── README.md                                      # Project documentation

👤 Author

Hamza Asif
