PiyushGaidhani/Ecommerce_DBT


E-Commerce Analytics Pipeline with dbt, PostgreSQL, and Metabase

This project is an end-to-end analytics pipeline for a simulated e-commerce business. It generates synthetic sales data, loads it into PostgreSQL, transforms it through a layered dbt project, and makes the final warehouse tables available for reporting in Metabase.

It works as a starter project for practicing analytics engineering workflows: Docker, dbt, SQL modeling, data quality checks, and BI integration.

What This Project Does

  • Generates realistic e-commerce transaction data with Python and Faker
  • Stores raw data as a dbt seed
  • Transforms the raw dataset using a Bronze, Silver, and Gold model structure
  • Cleans invalid records and removes duplicates in the Silver layer
  • Builds analytics-ready dimension and fact tables in the Gold layer
  • Runs dbt tests to validate core data quality assumptions
  • Connects the warehouse to Metabase for dashboarding and exploration

Tech Stack

  • Python
  • PostgreSQL
  • dbt Core
  • Docker Compose
  • Metabase

Architecture

The project follows a simple medallion-style design:

  • Bronze: raw sales data from the seed file
  • Silver: cleaned, standardized, and deduplicated transactions
  • Gold: business-facing dimensional model

Gold layer outputs:

  • dim_customer
  • dim_product
  • fct_order
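To make the shape of the Gold layer concrete, here is a minimal Python sketch of splitting cleaned transactions into two dimensions and one fact table. It is an illustration only: in the project this work is done by dbt SQL models, and the README does not list the columns of dim_customer, dim_product, or fct_order, so the columns chosen here (and the build_gold helper itself) are assumptions.

```python
def build_gold(silver_rows):
    """Split cleaned transactions into (dim_customer, dim_product, fct_order).

    Column choices are hypothetical; the real Gold models may carry
    more attributes than shown here.
    """
    dim_customer = {}
    dim_product = {}
    fct_order = []
    for row in silver_rows:
        # One row per customer; the first country seen wins (an assumption).
        dim_customer.setdefault(
            row["customer_id"],
            {"customer_id": row["customer_id"], "country": row["country"]},
        )
        # One row per product.
        dim_product.setdefault(
            row["product_id"], {"product_id": row["product_id"]}
        )
        # One fact row per transaction, keyed to both dimensions.
        fct_order.append(
            {
                "transaction_id": row["transaction_id"],
                "customer_id": row["customer_id"],
                "product_id": row["product_id"],
                "order_date": row["order_date"],
                "quantity": row["quantity"],
                "price": row["price"],
            }
        )
    return list(dim_customer.values()), list(dim_product.values()), fct_order
```

The fact table keeps the measures (quantity, price) and the foreign keys, while descriptive attributes such as country live in the dimension.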

Project Structure

.
|-- data-source/
|   `-- generate-sales.py
|-- dbt_project/
|   |-- profiles.example.yml
|   `-- dbt_ecomm/
|       |-- dbt_project.yml
|       |-- models/
|       |   |-- bronze/
|       |   |-- silver/
|       |   `-- gold/
|       `-- seeds/
|-- Dockerfile
|-- docker-compose.yml
|-- requirements.txt
`-- README.md

Data Flow

  1. Python generates synthetic sales transactions.
  2. The CSV is stored in the dbt seeds/ folder.
  3. dbt loads the seed into PostgreSQL.
  4. Bronze models expose raw seeded data.
  5. Silver models fix invalid values, filter bad records, and deduplicate transactions.
  6. Gold models create analytics-ready fact and dimension tables.
  7. Metabase connects to PostgreSQL for reporting.
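Steps 1 and 2 can be sketched roughly as follows. This is not the contents of generate-sales.py: it uses only the standard library (random) in place of Faker so it runs without extra dependencies, and the column list, ID formats, and country codes are assumptions chosen to match the fields named elsewhere in this README.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical column layout; the real seed file may differ.
FIELDS = [
    "transaction_id", "customer_id", "product_id",
    "order_date", "country", "quantity", "price",
]


def generate_rows(n, seed=0, start=date(2024, 1, 1)):
    """Return n synthetic transactions; the same seed gives the same rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        order_date = start + timedelta(days=rng.randint(0, 364))
        rows.append(
            {
                "transaction_id": f"T{i:05d}",
                "customer_id": f"C{rng.randint(1, 50):03d}",
                "product_id": f"P{rng.randint(1, 20):03d}",
                "order_date": order_date.isoformat(),
                "country": rng.choice(["US", "GB", "DE", "IN"]),
                "quantity": rng.randint(1, 5),
                "price": round(rng.uniform(5, 200), 2),
            }
        )
    return rows


def write_seed(rows, handle):
    """Write rows as CSV with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage: write to an in-memory buffer and show the header line.
buffer = io.StringIO()
write_seed(generate_rows(3, seed=1), buffer)
print(buffer.getvalue().splitlines()[0])
```

To produce a seed for dbt, the same write_seed call would target a file opened with open(path, "w", newline="") inside the project's seeds/ folder.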

Key Transformations

In the Silver layer, the pipeline currently:

  • removes rows with missing customer_id
  • converts order_date to a valid date
  • filters out future dates
  • maps unsupported country codes to Unknown
  • replaces non-positive quantities with 1
  • replaces non-positive prices with 0
  • keeps only the latest row for duplicate transaction_id values
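The rules above can be expressed as a short Python sketch. The project implements them in dbt SQL, so this is only an illustration of the logic. Several details are assumptions not stated in the README: the set of supported country codes, dropping rows whose order_date cannot be parsed, and treating "latest" as the row with the most recent order_date (ties go to the later input row).

```python
from datetime import date, datetime

# Assumption: the README does not list the supported country codes.
SUPPORTED_COUNTRIES = {"US", "GB", "DE", "IN"}


def parse_date(value):
    """Parse an ISO 'YYYY-MM-DD' string; return None if it is not valid."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None


def clean_transactions(rows, today=None):
    """Apply the Silver-layer rules to a list of row dicts."""
    today = today or date.today()
    latest = {}
    for row in rows:
        # Remove rows with a missing customer_id.
        if not row.get("customer_id"):
            continue
        # Convert order_date; drop unparseable (assumed) and future dates.
        order_date = parse_date(row.get("order_date"))
        if order_date is None or order_date > today:
            continue
        cleaned = dict(row, order_date=order_date)
        # Map unsupported country codes to Unknown.
        if cleaned.get("country") not in SUPPORTED_COUNTRIES:
            cleaned["country"] = "Unknown"
        # Replace non-positive quantities with 1 and prices with 0.
        if cleaned["quantity"] <= 0:
            cleaned["quantity"] = 1
        if cleaned["price"] <= 0:
            cleaned["price"] = 0
        # Keep only the latest row per transaction_id.
        previous = latest.get(cleaned["transaction_id"])
        if previous is None or cleaned["order_date"] >= previous["order_date"]:
            latest[cleaned["transaction_id"]] = cleaned
    return list(latest.values())
```

Passing today explicitly keeps the future-date filter reproducible, which is useful when checking the rules against a fixed sample.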

Data Quality Checks

Basic dbt tests are included for:

  • unique and non-null transaction_id in curated models
  • non-null customer_id
  • non-null product_id
  • non-null order_date
  • non-null quantity
  • non-null price

How To Run

1. Install Python dependencies

pip install -r requirements.txt

2. Prepare local environment files

Create a local environment file if needed:

cp .env.example .env

Create a local dbt profile from the example:

mkdir -p dbt_project/.dbt
cp dbt_project/profiles.example.yml dbt_project/.dbt/profiles.yml

3. Start services

docker compose up --build -d

4. Generate seed data

python data-source/generate-sales.py

5. Run dbt

docker compose run --rm dbt seed
docker compose run --rm dbt run
docker compose run --rm dbt test

6. Open Metabase

Visit:

http://localhost:3000

Notes

  • dbt_project/.dbt/profiles.yml is intentionally kept out of git because it is machine-specific.
  • Generated folders like target/, logs/, .venv/, and Python cache files are ignored.
  • Sample credentials in .env.example are fine for local development but should be changed before the project is deployed or shared publicly.

Future Improvements

  • Add source freshness checks and more dbt tests
  • Add snapshots for slowly changing dimensions
  • Add model documentation and dbt docs site generation
  • Add a dashboard screenshot section
  • Add CI to run dbt tests automatically on push

Author

Built as an analytics engineering project to demonstrate dbt modeling, PostgreSQL integration, and dashboard-ready warehouse design.

About

End-to-end e-commerce analytics pipeline using dbt Core, PostgreSQL, and Metabase — generates synthetic sales data, transforms it through bronze/silver/gold medallion layers, builds a star schema warehouse, and connects to Metabase for dashboarding.
