A reproducible data‐engineering workflow that ingests, harmonizes, and merges
• bilateral trade flows (UN Comtrade)
• MFN tariff schedules (WITS)
• HS6 price‐elasticities (Kee et al.)
…into one joined dataset ready for economic modelling, dashboarding, or Foundry lineage.
- Idempotent ingestion: only re‐fetches Comtrade or WITS if no local file exists
- Multi‐source support: UN Comtrade SDMX, WITS SDMX & offline ZIPs, Kee‑Nicita‑Olarreaga CSV
- Data harmonization: aligns reporter ISO‑3, partner groups, HS2022 codes, calendar years
- Rapid QA: built‐in scripts for basic completeness checks and output summaries
- One‑line merge: joins flows × tariffs × elasticities for downstream analysis
.
├── data/
│ ├── raw/ # downloaded Comtrade & WITS ZIPs
│ └── processed/
│ ├── trade_flows.csv # UN Comtrade flows
│ ├── wits_mfn_h6_offline.csv
│ └── flows_with_mfn.csv # final joined output
├── scripts/
│ ├── fetch_trade.py # Comtrade SDMX fetch + cache
│ ├── wits_fetch.py # WITS online fetch (SDMX)
│ ├── ingest_wits.py # offline ZIP unzip + assemble
│ └── merge.py # combine flows, tariffs, elasticities
├── utils/
│ └── iso_lookup.csv # numeric ISO → ISO‑3
├── requirements.txt
├── README.md
└── LICENSE
-
Install dependencies
python3 -m venv venv && source venv/bin/activate pip install -r requirements.txt
-
Fetch & preprocess data
# only downloads if missing python scripts/fetch_trade.py python scripts/wits_fetch.py # or: python scripts/ingest_wits.py
-
Merge everything
python scripts/merge.py # outputs: data/processed/flows_with_mfn.csv -
Quick QA
python - <<'PY' import pandas as pd df = pd.read_csv('data/processed/flows_with_mfn.csv') print(df.head(), df.isna().sum().to_dict()) PY
- Add new data sources under
scripts/ - Improve QA in
scripts/qc_*.py - File issues or PRs for missing HS concordances
Distributed under MIT. See LICENSE for details.
Title
Trade Policy Data Pipeline
Trade policy analysts and economists routinely need to combine disparate international datasets—bilateral trade flows, national tariff schedules, and product‐level demand elasticities—into a single, model‐ready table. Manually wrangling these feeds is error‑prone, time‑consuming, and non‑reproducible.
I built a fully automated, idempotent pipeline in Python (and easily ported to Foundry/AIP) that:
- Fetches trade flows from UN Comtrade’s SDMX API only when missing
- Ingests MFN tariff schedules via the WITS SDMX endpoint or offline ZIP extracts
- Loads Kee et al. HS‑6 elasticities from a prepared CSV
- Harmonizes key identifiers (ISO‑3, HS‑2022, calendar years)
- Joins everything into one flat CSV ready for gravity models, dashboard charts, or policy simulations
Every step checks for existing files—so rerunning the workflow never re‑downloads unchanged data, ensuring both speed and auditability.
- Economists & policy analysts get a turnkey dataset, saving hours of scripting and QA
- Foundry teams can operationalize this as a chained Transform pipeline with full lineage
- Decision‑makers can pull up‐to‑date trade & tariff tables to model the effect of a tariff change or FTA shock in real time
By abstracting away the messy details—SDMX quirks, ZIP extraction, code‐list concordances—this tool lets users focus on analysis rather than plumbing. In a live Foundry demo, a 3‑click workflow can refresh all inputs, recompute aggregates (e.g. HS2‐level averages), and visualize trade‐weighted tariff changes on a world map.
- Python with
requests,pandas,tqdm - SDMX APIs (Comtrade + WITS) + offline ZIP support
- (Future) port transform logic to Foundry AIP for modular, GUI‑driven pipelines
- (Future) schedule daily refresh tasks and live lineage in Palantir
This demo shows not just what I built, but how I’d operationalize it in a mission‑critical environment: reproducible, efficient, and audit‑ready.
That’s my “build”: a foundational data‑engineering layer for trade‑policy insights powered by Palantir technology.