This project is an end-to-end analytics pipeline for a simulated e-commerce business. It generates synthetic sales data, loads it into PostgreSQL, transforms it through a layered dbt project, and makes the final warehouse tables available for reporting in Metabase.
It is a strong starter project for practicing modern analytics engineering workflows with Docker, dbt, SQL modeling, data quality checks, and BI integration.
- Generates realistic e-commerce transaction data with Python and Faker
- Stores raw data as a dbt seed
- Transforms the raw dataset using a Bronze, Silver, and Gold model structure
- Cleans invalid records and removes duplicates in the Silver layer
- Builds analytics-ready dimension and fact tables in the Gold layer
- Runs dbt tests to validate core data quality assumptions
- Connects the warehouse to Metabase for dashboarding and exploration
- Python
- PostgreSQL
- dbt Core
- Docker Compose
- Metabase
The project follows a simple medallion-style design:
- Bronze: raw sales data from the seed file
- Silver: cleaned, standardized, and deduplicated transactions
- Gold: business-facing dimensional model
Gold layer outputs:
dim_customerdim_productfct_order
.
|-- data-source/
| `-- generate-sales.py
|-- dbt_project/
| |-- profiles.example.yml
| `-- dbt_ecomm/
| |-- dbt_project.yml
| |-- models/
| | |-- bronze/
| | |-- silver/
| | `-- gold/
| `-- seeds/
|-- Dockerfile
|-- docker-compose.yml
|-- requirements.txt
`-- README.md
- Python generates synthetic sales transactions.
- The CSV is stored in the dbt
seeds/folder. - dbt loads the seed into PostgreSQL.
- Bronze models expose raw seeded data.
- Silver models fix invalid values, filter bad records, and deduplicate transactions.
- Gold models create analytics-ready fact and dimension tables.
- Metabase connects to PostgreSQL for reporting.
In the Silver layer, the pipeline currently:
- removes rows with missing
customer_id - converts
order_dateto a valid date - filters out future dates
- maps unsupported country codes to
Unknown - replaces non-positive quantities with
1 - replaces non-positive prices with
0 - keeps only the latest row for duplicate
transaction_idvalues
Basic dbt tests are included for:
- unique and non-null
transaction_idin curated models - non-null
customer_id - non-null
product_id - non-null
order_date - non-null
quantity - non-null
price
pip install -r requirements.txtCreate a local environment file if needed:
cp .env.example .envCreate a local dbt profile from the example:
mkdir -p dbt_project/.dbt
cp dbt_project/profiles.example.yml dbt_project/.dbt/profiles.ymldocker compose up --build -dpython data-source/generate-sales.pydocker compose run --rm dbt seed
docker compose run --rm dbt run
docker compose run --rm dbt testVisit:
http://localhost:3000
dbt_project/.dbt/profiles.ymlis intentionally kept out of git because it is machine-specific.- Generated folders like
target/,logs/,.venv/, and Python cache files are ignored. - Sample credentials in
.env.exampleare fine for local development but should be changed before publishing publicly.
- Add source freshness checks and more dbt tests
- Add snapshots for slowly changing dimensions
- Add model documentation and dbt docs site generation
- Add a dashboard screenshot section
- Add CI to run dbt tests automatically on push
Built as an analytics engineering project to demonstrate dbt modeling, PostgreSQL integration, and dashboard-ready warehouse design.