Big Data Analysis using Apache Spark

Overview

This project analyzes large-scale datasets with Apache Spark. The objective was to understand how distributed data processing works and to perform transformations and aggregations efficiently using Spark.
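To make the idea of distributed processing concrete, here is a minimal sketch in plain Python (no Spark involved) of the pattern a grouped aggregation roughly follows: split the data into partitions, aggregate each partition independently, then merge the partial results. The function names and sample records are hypothetical and exist only for illustration. Spark's real execution also involves shuffling, scheduling, and fault tolerance, none of which is modeled here.

```python
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def partial_sum(partition):
    """Sum amounts per key using only the records in one partition."""
    totals = {}
    for key, amount in partition:
        totals[key] = totals.get(key, 0) + amount
    return totals


def merge(left, right):
    """Combine two partial results into a single per-key total."""
    merged = dict(left)
    for key, amount in right.items():
        merged[key] = merged.get(key, 0) + amount
    return merged


def distributed_sum(records, num_partitions=3):
    partitions = split_into_partitions(records, num_partitions)
    # In a real cluster each partition would be processed on a different
    # worker; here the partial sums are simply computed one after another.
    partials = [partial_sum(p) for p in partitions]
    return reduce(merge, partials, {})


# Hypothetical (region, amount) records standing in for a large dataset.
records = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]
totals = distributed_sum(records)
print(totals)
```

Because each partial sum depends only on its own partition, the final totals do not depend on how many partitions the data is split into, which is what lets this kind of work be spread across machines.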

Tools & Technologies

Apache Spark
PySpark
Python
Big Data Processing Concepts

What I Did

Loaded large datasets using Spark
Applied transformations such as filtering, grouping, and aggregations
Used Spark DataFrames for analysis
Tuned basic workflows to understand how Spark performance behaves

Learning Outcomes

Practical understanding of distributed data processing
Working with Spark DataFrames and transformations
Handling datasets too large to process on a single machine

Note

This project is part of my learning journey in Big Data and Spark.

