This project focuses on analyzing large-scale datasets with Apache Spark. The objective was to understand how distributed data processing works and to perform transformations and aggregations efficiently with Spark.

Tools and technologies:

- Apache Spark
- PySpark
- Python
- Big Data Processing Concepts

Key work:

- Loaded large datasets with Spark
- Applied transformations such as filtering, grouping, and aggregation
- Used Spark DataFrames for analysis
- Applied basic optimizations to Spark workflows to understand their effect on performance

Learning outcomes:

- Practical understanding of distributed data processing
- Working with Spark DataFrames and transformations
- Handling datasets too large to process on a single machine
This project is part of my learning journey in Big Data and Spark.