This project demonstrates a simple ETL pipeline built on AWS cloud services. The garments dataset was uploaded to Amazon S3, cataloged with an AWS Glue crawler, and queried with Amazon Athena. The query results were visualized in an Amazon QuickSight dashboard.
- Amazon S3 – Data storage
- AWS IAM – Role and permission management
- AWS Glue – Data catalog and crawler
- Amazon Athena – SQL query execution
- Amazon QuickSight – Data visualization dashboard
- Uploaded CSV dataset to S3 bucket.
- Created IAM role with required permissions.
- Configured Glue Crawler to scan S3 data.
- Generated table in AWS Glue Data Catalog.
- Queried data using Athena.
- Connected Athena to QuickSight for dashboard creation.
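
The steps above were presumably carried out in the AWS console; as a rough illustration, the same flow could be scripted with boto3. This is a minimal sketch, not the project's actual setup: the bucket name, role ARN, crawler name, and S3 prefix are hypothetical placeholders, and it assumes the crawler names the table after the S3 folder (`data/`), which would match the `"garments"."data"` table queried in this project. The two helper functions only build request parameters, so they can be inspected without AWS credentials.

```python
import time

# Hypothetical placeholders: replace with your own resources.
BUCKET = "example-garments-bucket"
PREFIX = "data/"               # assumed folder; the crawler typically names the table after it
DATABASE = "garments"          # database name used in this project's queries
CRAWLER = "garments-crawler"
ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole"


def crawler_params(name: str, role_arn: str, database: str, bucket: str, prefix: str) -> dict:
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/{prefix}"}]},
    }


def athena_params(sql: str, database: str, bucket: str) -> dict:
    """Build keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to S3; this output prefix is an assumption.
        "ResultConfiguration": {"OutputLocation": f"s3://{bucket}/athena-results/"},
    }


def run_pipeline(local_csv: str) -> str:
    """Upload the CSV, crawl it, and start one Athena query. Requires AWS credentials."""
    import boto3  # imported lazily so the helpers above work without the SDK installed

    s3 = boto3.client("s3")
    glue = boto3.client("glue")
    athena = boto3.client("athena")

    # 1. Upload the CSV dataset to S3.
    s3.upload_file(local_csv, BUCKET, f"{PREFIX}garments.csv")

    # 2. Create and start the crawler (the IAM role must already exist).
    glue.create_crawler(**crawler_params(CRAWLER, ROLE_ARN, DATABASE, BUCKET, PREFIX))
    glue.start_crawler(Name=CRAWLER)

    # 3. The crawler runs asynchronously; poll until it returns to READY.
    while glue.get_crawler(Name=CRAWLER)["Crawler"]["State"] != "READY":
        time.sleep(10)

    # 4. Start a query against the cataloged table and return its execution ID.
    sql = 'SELECT * FROM "garments"."data" LIMIT 5'
    response = athena.start_query_execution(**athena_params(sql, DATABASE, BUCKET))
    return response["QueryExecutionId"]
```

Calling `run_pipeline("garments.csv")` would return an Athena query execution ID; fetching the rows and connecting QuickSight are left out of this sketch.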
SELECT *
FROM "AwsDataCatalog"."garments"."data"
LIMIT 5;

SELECT category, SUM(sales) AS total_sales
FROM "AwsDataCatalog"."garments"."data"
GROUP BY category;

SELECT category, SUM(sales) AS total_sales
FROM "AwsDataCatalog"."garments"."data"
GROUP BY category
HAVING SUM(sales) > 5000;

CSV File → Amazon S3 → AWS Glue Crawler → Data Catalog → Amazon Athena → QuickSight Dashboard
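
To see what the `GROUP BY` and `HAVING` queries in this project compute without touching AWS, the same logic can be tried on a few rows locally. The sketch below uses Python's built-in `sqlite3` as a stand-in for Athena; the sample rows are invented for illustration and are not taken from the garments dataset, and only the `category` and `sales` columns referenced by the queries are modeled. An `ORDER BY` is added purely to make the output order predictable.

```python
import sqlite3

# Invented sample rows (category, sales); not real data from the garments dataset.
SAMPLE_ROWS = [
    ("Shirts", 3000.0),
    ("Shirts", 2500.0),
    ("Jeans", 4000.0),
    ("Jeans", 500.0),
    ("Jackets", 6000.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (category TEXT, sales REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?)", SAMPLE_ROWS)

# Total sales per category.
totals = conn.execute(
    """
    SELECT category, SUM(sales) AS total_sales
    FROM data
    GROUP BY category
    ORDER BY category
    """
).fetchall()

# Same aggregation, keeping only categories whose total exceeds 5000.
# HAVING filters after grouping, whereas WHERE would filter individual rows.
high_sellers = conn.execute(
    """
    SELECT category, SUM(sales) AS total_sales
    FROM data
    GROUP BY category
    HAVING SUM(sales) > 5000
    ORDER BY category
    """
).fetchall()

print(totals)
print(high_sellers)
conn.close()
```

With these sample rows, Jeans totals 4500 and is dropped by the `HAVING` clause even though one of its rows is 4000, because the filter applies to the group total rather than to single rows.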
- Built end-to-end AWS ETL pipeline
- Successfully queried cloud data using SQL
- Created interactive dashboard in QuickSight
AWS | ETL | SQL | Data Engineering | Cloud Analytics | Business Intelligence