
Prahaladsingh221/Task1-data-cleaning


Task 1: Data Cleaning and Preprocessing

Internship Task – Data Analyst Role

This repository contains the solution to Task 1 of the Data Analyst Internship, focusing on data cleaning and preprocessing using Python (Pandas) in Google Colab.


Objective

Clean and prepare a raw dataset by:

  • Identifying and handling missing values
  • Removing duplicates
  • Standardizing column names and text data
  • Ensuring consistent data types and formats
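The first two objectives can be sketched in Pandas as follows. This is a minimal sketch, not the notebook's code. The function name `handle_missing_and_duplicates` is hypothetical. The policy of filling numeric gaps with the column median and dropping rows that still have missing values is an assumption chosen for illustration, since the Cleaning Summary reports that this dataset had no missing values or duplicates.

```python
import pandas as pd


def handle_missing_and_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Report and handle missing values and exact duplicate rows."""
    # Count missing values per column before changing anything.
    missing = df.isna().sum()
    print("Missing values per column:\n", missing[missing > 0])

    # Count fully duplicated rows (every column identical to an earlier row).
    print("Duplicate rows:", int(df.duplicated().sum()))

    cleaned = df.drop_duplicates().copy()

    # Assumed policy (not from the notebook): fill numeric gaps with each
    # numeric column's median.
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())

    # Assumed policy: drop rows that still contain missing (non-numeric) values.
    return cleaned.dropna().reset_index(drop=True)
```

Filling before dropping keeps rows whose only gap is numeric; a different project might prefer to drop or flag such rows instead.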

Dataset

Dataset Name: Mall Customer Segmentation Data
Source: Provided during the internship task
File: Mall_Customers.csv


Cleaning Summary

| Step | Description |
|---|---|
| Missing Values Check | No missing values found |
| Duplicate Check | No duplicate rows present |
| Column Renaming | Column names standardized to lowercase with underscores |
| Text Standardization | Gender values standardized to title case |
| Data Type Check | All column data types confirmed appropriate |
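The renaming, text standardization, and type checks in the table could look roughly like the sketch below. The helper names are hypothetical. The regex-based renaming is one assumed way to reach "lowercase with underscores". For example, it turns a header such as `Annual Income (k$)` into `annual_income_k` and `CustomerID` into `customerid`, which may differ from the names the notebook actually produced.

```python
import re

import pandas as pd


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase column names and collapse non-alphanumeric runs into '_'."""
    out = df.copy()
    # Assumed rule: strip, lowercase, replace runs of other characters with '_',
    # then trim leading/trailing underscores.
    out.columns = [
        re.sub(r"[^0-9a-z]+", "_", str(col).strip().lower()).strip("_")
        for col in out.columns
    ]
    return out


def standardize_gender(df: pd.DataFrame, column: str = "gender") -> pd.DataFrame:
    """Trim whitespace and title-case gender labels, e.g. ' male' -> 'Male'."""
    out = df.copy()
    out[column] = out[column].str.strip().str.title()
    return out


def numeric_columns_ok(df: pd.DataFrame, columns: list[str]) -> bool:
    """Return True if every listed column has a numeric dtype."""
    return all(pd.api.types.is_numeric_dtype(df[c]) for c in columns)
```

Inspecting `df.dtypes` after these steps is a quick way to review the remaining column types by eye.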

Files in this Repository

| File | Description |
|---|---|
| task1_data_cleaning.ipynb | Google Colab notebook with the complete cleaning process |
| Mall_Customers.csv | Original dataset |
| cleaned_mall_customers.csv | Cleaned and processed version of the dataset |
| README.md | Summary and documentation of the task |
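Producing `cleaned_mall_customers.csv` from the cleaned frame can be sketched as below. `save_cleaned` is a hypothetical helper. Reloading the file to compare shape and column names is an assumed sanity check rather than a documented step of the notebook. Writing with `index=False` keeps the Pandas row index from being saved as an extra unnamed column.

```python
import pandas as pd


def save_cleaned(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write the cleaned frame to CSV and reload it to confirm the round-trip."""
    # index=False stops the row index from becoming an extra unnamed column.
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

    # Shape and column names should survive the write/read cycle unchanged.
    if reloaded.shape != df.shape or list(reloaded.columns) != list(df.columns):
        raise ValueError(f"Round-trip mismatch when writing {path}")
    return reloaded
```

In this project's setting it would be called as `save_cleaned(df, "cleaned_mall_customers.csv")` after reading the original with `pd.read_csv("Mall_Customers.csv")` and applying the cleaning steps.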

Tools & Technologies

  • Python 3
  • Pandas
  • Google Colab
  • GitHub

Key Learnings

  • Hands-on experience with Pandas for cleaning real-world datasets
  • Techniques to detect and handle common data issues
  • Understanding the importance of standardization and preprocessing before analysis

Task Completed

This task was submitted as part of the internship program.
The solution notebook and the cleaned dataset are listed under Files in this Repository.


About

Data cleaning and preprocessing task using Pandas
