IrishJohn1973/valan-procurement-sample

Valan Procurement Intelligence — Public Sample

A free 1,000-row sample of the Valan Technologies global procurement intelligence feed: point-in-time, entity-resolved government contract awards linked to the awarded company's tradable security (ticker / LEI / ISIN).

Valan Technologies maintains one of the most comprehensive privately-held public procurement datasets in existence: 100M+ contract awards and tenders from 150+ government sources across 222 countries and territories, in 8 languages, updated daily, entity-resolved against a 72.4-million-row global company master.

This repository is the public evaluation slice. Feed snapshot: 2026-05-30.

📖 Full data dictionary DATA_DICTIONARY.md · valan.io/data-dictionary
🧠 Machine-readable corpus valan.io/llms-full.txt
📚 Glossary / FAQ valan.io/glossary
📰 Research built on this data valan.io/research
💼 Full feed access john@valan.io

Files

| File | Notes |
| --- | --- |
| valan_sample_1k_fin_awards.parquet / .csv | 1,000-row US-heavy investable slice. Identical content in both formats. |
| valan_sample_loader.py | Reference loader. Applies PIT and compliance rules; keeps PIT and current strictly separate. |
| DATA_DICTIONARY.md | Full column reference for all six feed tables. |
python valan_sample_loader.py valan_sample_1k_fin_awards.parquet
# requires: pandas, pyarrow (duckdb optional)

Scope — read first

This is a curated investable-only slice (investable_flag = True on every row). In the full universe only ~17% of awards (12.3M of 71.9M) carry a tradable ticker, so do not extrapolate coverage from this file. It is built to show the schema, the identity resolution, and the point-in-time mechanics — not the live hit-rate.
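As a quick worked check of the coverage gap, the arithmetic below uses the exact fin_awards row counts quoted in the full-feed table of this README. It is only an illustration of why the sample's 100% investable rate says nothing about full-feed coverage.

```python
# Row counts as quoted in this README's full-feed table (fin_awards).
investable_awards = 12_345_679
total_awards = 71_888_513

# Share of the full universe that carries a tradable ticker.
full_feed_share = investable_awards / total_awards

# The sample is investable-only by construction: 1,000 of 1,000 rows.
sample_share = 1_000 / 1_000

print(f"full feed: {full_feed_share:.1%}, sample: {sample_share:.0%}")
```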

Point-in-time / forward-bias — read before backtesting

Two distinct tradability concepts; do not conflate them:

  • ticker_as_of with pit_confirmed = True — the ticker as of the award date, sourced from a genuine dated listing window. Forward-bias-free. Use this for backtests.
  • ultimate_parent_ticker (parent rollup) — the supplier's current ownership link (who owns it today), not as-of the award. Useful for screening; look-ahead present.

The reference loader surfaces this split directly rather than hiding it inside the rollup. The same discipline applies across the full feed: a ticker is either genuinely point-in-time or honestly NULL — never a current value dressed as as-of.
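The split described above might be sketched with pandas as follows. This is not the reference loader's code, just a minimal illustration under stated assumptions: the toy rows and the `award_id` column are invented, and only `ticker_as_of`, `pit_confirmed`, and `ultimate_parent_ticker` are column names taken from this README.

```python
import pandas as pd

# Toy rows standing in for the sample; all values are invented.
awards = pd.DataFrame(
    {
        "award_id": ["a1", "a2", "a3"],  # hypothetical identifier column
        "ticker_as_of": ["AAA", "BBB", None],
        "pit_confirmed": [True, False, False],
        "ultimate_parent_ticker": ["AAA", "CCC", "CCC"],
    }
)


def split_pit_and_current(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (backtest, screening) views so the two concepts never mix."""
    # Backtest view: confirmed as-of ticker only; missing flags count as False.
    pit_mask = df["pit_confirmed"].eq(True) & df["ticker_as_of"].notna()
    backtest = df.loc[pit_mask].drop(columns=["ultimate_parent_ticker"])

    # Screening view: current parent rollup; carries look-ahead, so it is
    # returned separately and without the as-of columns.
    has_parent = df["ultimate_parent_ticker"].notna()
    screening = df.loc[has_parent, ["award_id", "ultimate_parent_ticker"]]
    return backtest, screening


backtest, screening = split_pit_and_current(awards)
```

Dropping `ultimate_parent_ticker` from the backtest view is a deliberate choice in this sketch: it makes it harder to join a current-ownership ticker into a backtest by accident.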

Other usage rules

  • Currency: award_value is in the local currency (see the currency column; this slice spans AUD, BRL, CZK, EUR, HUF, PLN, USD). Never sum across currencies.
  • Real obligated value: filter to value_type = 'award' with a positive value, and exclude IDIQ/framework ceilings (value_is_ceiling).
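A minimal pandas sketch of the two rules above, assuming `value_is_ceiling` is a boolean column and that a missing flag means "not a ceiling" (both are assumptions, as is the `'estimate'` value used to show a non-award row). The toy values are invented.

```python
import pandas as pd

# Toy rows; values are invented for illustration.
awards = pd.DataFrame(
    {
        "award_value": [100.0, 250.0, -5.0, 1_000.0, 40.0, 70.0],
        "currency": ["USD", "USD", "USD", "EUR", "EUR", "EUR"],
        # 'estimate' is a hypothetical non-award value_type.
        "value_type": ["award", "award", "award", "award", "award", "estimate"],
        "value_is_ceiling": [False, False, False, True, False, False],
    }
)


def obligated_value_by_currency(df: pd.DataFrame) -> pd.Series:
    """Sum real obligated value per currency; never across currencies."""
    mask = (
        df["value_type"].eq("award")         # real awards only
        & df["award_value"].gt(0)            # positive values only
        & ~df["value_is_ceiling"].eq(True)   # drop IDIQ/framework ceilings
    )
    # Grouping by currency keeps each total in its own local currency.
    return df.loc[mask].groupby("currency")["award_value"].sum()


totals = obligated_value_by_currency(awards)
```

Any cross-currency comparison would need an FX conversion step first, which this sketch leaves out on purpose.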

Compliance (verified at export)

PII-clean (email PII removed, verified zero; business-contact phones retained as published procurement data). No PRC-sourced data (ccgp_* sources and buyer_country = 'CN' excluded). RU/BY excluded on/after 2022-02-24. CUI/radioactive rows excluded from the standard feed. No sanctioned buyer or supplier present in this slice.
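For readers who want to re-check some of these exclusions on their own copy, a rough pandas sketch follows. It covers only the PRC and RU/BY rules. The column names `source` and `award_date` are assumptions (only `buyer_country` and the `ccgp_` prefix come from this README), and applying the RU/BY rule to the buyer country is likewise an assumption; check DATA_DICTIONARY.md for the real names.

```python
import pandas as pd

RU_BY_CUTOFF = pd.Timestamp("2022-02-24")


def compliance_violations(
    df: pd.DataFrame,
    source_col: str = "source",    # hypothetical column name
    date_col: str = "award_date",  # hypothetical column name
) -> pd.DataFrame:
    """Return rows that would breach the PRC or RU/BY exclusion rules."""
    prc = df[source_col].str.startswith("ccgp_", na=False) | df["buyer_country"].eq("CN")
    dates = pd.to_datetime(df[date_col])
    ru_by = df["buyer_country"].isin(["RU", "BY"]) & (dates >= RU_BY_CUTOFF)
    return df.loc[prc | ru_by]


# Toy rows; source names and dates are invented.
rows = pd.DataFrame(
    {
        "source": ["example_portal", "ccgp_example", "ru_example", "ru_example", "by_example"],
        "buyer_country": ["US", "CN", "RU", "RU", "BY"],
        "award_date": ["2023-01-10", "2021-05-01", "2022-02-23", "2022-02-24", "2024-03-01"],
    }
)
flagged = compliance_violations(rows)
```

On a compliant export the returned frame should be empty; the toy rows here are built to trip the checks.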


The full feed

The sample is one table. The full feed is six:

| Table | Rows | What it is |
| --- | --- | --- |
| fin_awards | 71,888,513 | Financial-modelled awards (12,345,679 investable) |
| master_awards | 71,857,265 | Text-rich awards: titles, descriptions, source URLs |
| fin_tenders | 24,984,433 | Open solicitations: the forward pipeline |
| master_tenders | 24,274,179 | Descriptive tenders |
| entity_dim | 312,156 | LEI-bridged company dimension |
| subcontract_graph | 3,311,541 | Sub→prime contract linkage with tier depth |

Daily refresh · S3 parquet delivery · SHA256-checksummed manifests · by institutional arrangement: john@valan.io
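Checking a delivered file against a SHA256 manifest could look roughly like the sketch below. The manifest layout is not documented in this README, so the `{filename: hex_digest}` mapping used here is purely an assumption, and both function names are hypothetical; only the standard-library `hashlib` calls are real.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large parquet files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest, base_dir):
    """Return the names of files whose digest differs from the manifest entry.

    `manifest` is assumed to map file name -> expected hex digest.
    """
    base = Path(base_dir)
    return [
        name
        for name, expected in manifest.items()
        if sha256_of_file(base / name) != expected.lower()
    ]
```

An empty list from `verify_against_manifest` means every listed file matched its expected digest.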


License

  • Code (valan_sample_loader.py): MIT.
  • Sample data: free for evaluation, research, and benchmarking with attribution to Valan Technologies (valan.io). Underlying records originate from publicly available government procurement portals. The full feed is licensed separately. Not investment advice.

© 2026 Valan Technologies Limited, Wicklow, Ireland (CRO 802395).
