A free 1,000-row sample of the Valan Technologies global procurement intelligence feed: point-in-time, entity-resolved government contract awards linked to the awarded company's tradable security (ticker / LEI / ISIN).
Valan Technologies maintains one of the most comprehensive privately-held public procurement datasets in existence: 100M+ contract awards and tenders from 150+ government sources across 222 countries and territories, in 8 languages, updated daily, entity-resolved against a 72.4-million-row global company master.
This repository is the public evaluation slice. Feed snapshot: 2026-05-30.
| Resource | Where |
|---|---|
| 📖 Full data dictionary | DATA_DICTIONARY.md · valan.io/data-dictionary |
| 🧠 Machine-readable corpus | valan.io/llms-full.txt |
| 📚 Glossary / FAQ | valan.io/glossary |
| 📰 Research built on this data | valan.io/research |
| 💼 Full feed access | john@valan.io |
| File | Notes |
|---|---|
| `valan_sample_1k_fin_awards.parquet` / `.csv` | 1,000-row US-heavy investable slice. Identical content in both formats. |
| `valan_sample_loader.py` | Reference loader. Applies PIT and compliance rules; keeps PIT and current strictly separate. |
| `DATA_DICTIONARY.md` | Full column reference for all six feed tables. |
```bash
python valan_sample_loader.py valan_sample_1k_fin_awards.parquet
# requires: pandas, pyarrow (duckdb optional)
```

This is a curated investable-only slice (`investable_flag = True` on every row). In the full universe only ~17% of awards (12.3M of 71.9M) carry a tradable ticker, so do not extrapolate coverage from this file. It is built to show the schema, the identity resolution, and the point-in-time mechanics, not the live hit-rate.
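If you want to inspect the file directly instead of through the reference loader, a minimal pandas sketch follows. It assumes only that `investable_flag` is a boolean column as described above; `load_sample` and `check_investable_slice` are hypothetical helper names, not functions exported by `valan_sample_loader.py`.

```python
from pathlib import Path

import pandas as pd


def load_sample(path: str) -> pd.DataFrame:
    """Load the sample from either the .parquet or the .csv copy."""
    p = Path(path)
    if p.suffix == ".parquet":
        # pandas delegates parquet reading to pyarrow (or fastparquet).
        return pd.read_parquet(p)
    # pandas parses the literal strings True/False into booleans by default.
    return pd.read_csv(p)


def check_investable_slice(df: pd.DataFrame) -> None:
    """Raise if any row is not flagged investable, as this slice promises."""
    # eq(True) treats missing or non-boolean values as failures.
    if not df["investable_flag"].eq(True).all():
        raise ValueError("found rows where investable_flag is not True")
```

Usage: `check_investable_slice(load_sample("valan_sample_1k_fin_awards.parquet"))`. Passing this check says nothing about full-universe coverage; it only confirms the slice is investable-only.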
Two distinct tradability concepts; do not conflate them:
- `ticker_as_of` with `pit_confirmed = True`: the ticker as of the award date, sourced from a genuine dated listing window. Forward-bias-free. Use this for backtests.
- `ultimate_parent_ticker` (parent rollup): the supplier's current ownership link (who owns it today), not as of the award. Useful for screening; look-ahead present.
The reference loader surfaces this split directly rather than hiding it inside the rollup. The same discipline applies across the full feed: a ticker is either genuinely point-in-time or honestly NULL — never a current value dressed as as-of.
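One way to keep the two concepts apart in your own code is sketched below. It assumes the columns `ticker_as_of`, `pit_confirmed`, and `ultimate_parent_ticker` appear under those names in the sample; the helper names are hypothetical and not taken from `valan_sample_loader.py`. The key design choice is that the backtest ticker never falls back to the parent rollup: an unconfirmed row becomes missing rather than borrowing today's ownership.

```python
import pandas as pd


def backtest_ticker(df: pd.DataFrame) -> pd.Series:
    """Ticker usable in a backtest: ticker_as_of where pit_confirmed is True, else missing.

    ultimate_parent_ticker is deliberately never used as a fallback, because it
    reflects current ownership and would leak look-ahead information.
    """
    confirmed = df["pit_confirmed"].eq(True)
    # Series.where keeps values where the condition holds and sets the rest to NaN.
    return df["ticker_as_of"].where(confirmed)


def screening_ticker(df: pd.DataFrame) -> pd.Series:
    """Current parent-rollup ticker: acceptable for screening, not for backtests."""
    return df["ultimate_parent_ticker"]
```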
- Currency: `award_value` is in the local currency (`currency` column; this slice spans AUD, BRL, CZK, EUR, HUF, PLN, USD). Never sum across currencies.
- Real obligated value: `value_type = 'award'`, positive value, IDIQ/framework ceilings excluded (`value_is_ceiling`).
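The two rules above combine naturally into a per-currency aggregation. A minimal sketch, assuming the column names shown (`award_value`, `currency`, `value_type`, `value_is_ceiling`); it reapplies the obligated-value filter defensively even if the slice already satisfies it, and the function name is hypothetical.

```python
import pandas as pd


def obligated_value_by_currency(df: pd.DataFrame) -> pd.Series:
    """Sum real obligated award value per currency; never across currencies."""
    mask = (
        df["value_type"].eq("award")          # award rows only
        & (df["award_value"] > 0)             # positive values only
        & df["value_is_ceiling"].eq(False)    # drop IDIQ/framework ceilings; unknown is dropped too
    )
    # One total per currency code; converting to a common currency is a separate step.
    return df.loc[mask].groupby("currency")["award_value"].sum()
```

Treating a missing `value_is_ceiling` as "possibly a ceiling" is a conservative choice of this sketch, not a documented rule of the feed.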
PII-clean: email addresses are removed (verified: zero remaining), while business-contact phone numbers are retained as published procurement data. No PRC-sourced data (`ccgp_*` sources and `buyer_country = 'CN'` are excluded). RU/BY excluded on or after 2022-02-24. CUI/radioactive rows are excluded from the standard feed. No sanctioned buyer or supplier is present in this slice.
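Evaluators who want to re-check some of these exclusions on their own copy could use a sketch like the one below. Only `buyer_country` is named in this README; `source` and `award_date` are hypothetical column names standing in for whatever the data dictionary actually uses. The README also does not say whether the RU/BY rule keys on buyer or supplier country, so this sketch checks the buyer only, and it does not cover the CUI/radioactive or sanctions rules.

```python
import pandas as pd

# Cut-off stated in the README for RU/BY exclusion.
RU_BY_CUTOFF = pd.Timestamp("2022-02-24")


def compliance_violations(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows that break the stated exclusion rules (expected: none).

    Assumes hypothetical columns `source` and `award_date`; only
    `buyer_country` is named in the README.
    """
    # Unparseable dates become NaT, which never compares >= the cut-off.
    dates = pd.to_datetime(df["award_date"], errors="coerce")
    prc = df["source"].astype(str).str.startswith("ccgp_") | df["buyer_country"].eq("CN")
    ru_by = df["buyer_country"].isin(["RU", "BY"]) & (dates >= RU_BY_CUTOFF)
    return df.loc[prc | ru_by]
```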
The sample is one table. The full feed is six:
| Table | Rows | What it is |
|---|---|---|
| `fin_awards` | 71,888,513 | Financial-modelled awards (12,345,679 investable) |
| `master_awards` | 71,857,265 | Text-rich awards: titles, descriptions, source URLs |
| `fin_tenders` | 24,984,433 | Open solicitations: the forward pipeline |
| `master_tenders` | 24,274,179 | Descriptive tenders |
| `entity_dim` | 312,156 | LEI-bridged company dimension |
| `subcontract_graph` | 3,311,541 | Sub→prime contract linkage with tier depth |
Daily refresh · S3 parquet delivery · SHA256-checksummed manifests · by institutional arrangement: john@valan.io
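Checking a delivery against its SHA256 manifest amounts to hashing each file and comparing digests. The manifest layout is not described here, so this sketch assumes a hypothetical mapping of relative file path to lowercase hex digest; `sha256_of` and `verify_manifest` are illustrative names, not part of any Valan tooling. Files are streamed in chunks so large parquet files need not fit in memory.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the files whose SHA-256 does not match the manifest entry.

    A file listed in the manifest but missing on disk raises FileNotFoundError.
    """
    return [
        name
        for name, expected in manifest.items()
        if sha256_of(root / name) != expected.lower()
    ]
```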
- Code (`valan_sample_loader.py`): MIT.
- Sample data: free for evaluation, research, and benchmarking with attribution to Valan Technologies (valan.io). Underlying records originate from publicly available government procurement portals. The full feed is licensed separately. Not investment advice.
© 2026 Valan Technologies Limited, Wicklow, Ireland (CRO 802395).