Bidirectional converter between the Beneficial Ownership Data Standard (BODS) v0.4 and Neo4j graph database, with built-in graph analysis queries for UBO detection, corporate group mapping, and circular ownership detection.
Part of the BODS Interoperability Toolkit.
BODS provides a universal, standardised format for beneficial ownership data — enabling interoperability across countries, registers, and data sources. Neo4j provides powerful graph traversal and analysis capabilities that are essential for understanding complex ownership structures.
This tool bridges the gap:
- BODS → Neo4j: Import any BODS v0.4 dataset into Neo4j for graph analysis, UBO detection, and visualisation
- Neo4j → BODS: Export graph data back to BODS format for standardised data exchange and sharing
- Graph Queries: Ready-made Cypher queries for common beneficial ownership analysis tasks
The Beneficial Ownership Data Standard was created by Open Ownership as the world's leading open standard for beneficial ownership information on legal persons and legal arrangements.
BODS data is published by multiple sources including:
- BODS Data Explorer — GLEIF and other datasets in BODS format
- UK PSC Pipeline — UK Companies House data
- GLEIF Pipeline — Legal Entity Identifiers
- OpenCorporates Pipeline — OpenCorporates relationship data
- Kyckr Pipeline — Kyckr UBO verification data
- ICIJ Offshore Leaks Pipeline — Offshore Leaks Database
pip install -e .For development:
pip install -e ".[dev]"docker compose up -dThis starts Neo4j Community Edition on bolt://localhost:7687 with APOC plugin.
Generate Neo4j-importable CSV files and import scripts:
bods-neo4j to-csv examples/sample_data/sample_bods.json -o ./neo4j_exportThis produces:
- One CSV per node label (
entity.csv,person.csv,identifier.csv,address.csv,country.csv,unspecified_party.csv) - One CSV per relationship type (
owns.csv,controls.csv,manages.csv,is_party_to.csv,has_other_interest.csv,has_identifier.csv,has_address.csv,located_in.csv,registered_in.csv,born_in.csv) import.cypher—LOAD CSVscript for a running Neo4j instance (includes constraints + entity subtype-label fixups)import.sh— convenience wrapper that pipesimport.cypherthroughcypher-shell
Load BODS data directly into Neo4j:
bods-neo4j to-neo4j examples/sample_data/sample_bods.json \
--uri bolt://localhost:7687 \
--username neo4j \
--password bodspasswordExport the graph back to BODS format:
bods-neo4j to-bods output.jsonl \
--uri bolt://localhost:7687 \
--username neo4j \
--password bodspassword \
--publisher-name "My Organisation"# BODS file statistics
bods-neo4j info examples/sample_data/sample_bods.json
# Neo4j graph statistics
bods-neo4j graph-info --uri bolt://localhost:7687 --username neo4j --password bodspasswordThe model is graph-native: things BODS treats as identities (identifiers, addresses, jurisdictions) are first-class nodes, and the generic interest edge splits into a 5-family taxonomy. Each BODS interest in a relationship statement becomes one typed edge carrying the interest payload directly — no intermediate :Interest node.
| Label | BODS source | Key | Notes |
|---|---|---|---|
:Entity (+ subtype labels: RegisteredEntity / LegalEntity / Arrangement / Trust / Nomination / StateBody / …) |
entity statement | recordId |
Inline statement metadata (statementId, publicationDate, publisherName, source*, annotation*). |
:Person |
person statement | recordId |
Nationalities / tax residencies as inline parallel-list properties (nationalityCodes, nationalityNames). |
:Identifier |
identifiers[] (entity / person) |
uid = scheme|id |
Deduplicated globally — the same Companies-House ID is one node across all statements that reference it. |
:Address |
addresses[] (entity / person) |
uid = sha256(normalised content) |
The BODS address type lives on the [:HAS_ADDRESS {type, ordinal}] edge so the same address can play multiple roles. |
:Country |
entity jurisdiction + address country leg | code (ISO 3166-1 alpha-2) |
Shared dedup point. Nationality / tax residency stay inline on :Person. |
:UnspecifiedParty |
inline unspecifiedReason party object (subject or interestedParty) |
uid = statementId:side |
Sentinel — preserves BODS's inline-unspecified shape when there is no recordId to anchor the typed edge to. |
(party:Person|Entity)-[:OWNS|CONTROLS|MANAGES|IS_PARTY_TO|HAS_OTHER_INTEREST {…interest payload + statement metadata}]->(subject:Entity|Person)
(:Entity|:Person)-[:HAS_IDENTIFIER {ordinal, isPrimary}]->(:Identifier)
(:Entity|:Person)-[:HAS_ADDRESS {type, ordinal}]->(:Address)-[:LOCATED_IN]->(:Country)
(:Entity)-[:REGISTERED_IN]->(:Country)
(:Person)-[:BORN_IN]->(:Address)
(:Entity|:Person)-[:REPLACES]->(same-label)
A BODS relationship statement with N interests in its interests[] array becomes N parallel typed edges between the same (party, subject) pair. Every edge carries the statement-level metadata (statementId, recordId, publicationDate, publisherName, sourceTypes, …) duplicated, plus the per-interest payload (bodsInterestType, family, shareExact/Min/Max, directOrIndirect, beneficialOwnershipOrControl, startDate, endDate, details). The reverse mapper groups edges by statementId and sorts by interestIndex to rebuild a single BODS statement.
The 23 BODS interest types collapse into five Cypher relationship-type families for fast traversal selectivity; the original interest type is preserved verbatim as bodsInterestType on each edge so the round-trip is lossless.
| Family | BODS interest types covered |
|---|---|
OWNS |
shareholding, rightsToProfitOrIncome, rightsToSurplusAssetsOnDissolution, rightToProfitOrIncomeFromAssets, enjoymentAndUseOfAssets, rightsGrantedByContract, conditionalRightsGrantedByContract |
CONTROLS |
votingRights, controlViaCompanyRulesOrArticles, controlByLegalFramework, otherInfluenceOrControl, appointmentOfBoard |
MANAGES |
seniorManagingOfficial, boardMember, boardChair |
IS_PARTY_TO |
settlor, trustee, protector, beneficiaryOfLegalArrangement, nominee, nominator |
HAS_OTHER_INTEREST |
unknownInterest, unpublishedInterest, plus any forward-compat values |
The reverse mapper rebuilds identifiers / addresses by aggregating their related graph nodes, and rebuilds interests[] by grouping typed edges that share a statementId and sorting by interestIndex. No *_json blobs on nodes or edges. Any publisher-specific or forward-compat fields the structured extraction does not recognise are preserved verbatim in a single extrasJson catch-all per node / first edge. Data can be converted back to valid BODS v0.4 format without loss.
from bods_neo4j.config import Neo4jConfig
from bods_neo4j.graph_queries.ubo_detection import find_owners, find_all_ubos
from bods_neo4j.graph_queries.corporate_groups import find_corporate_group, find_top_level_parents
from bods_neo4j.graph_queries.circular_ownership import find_circular_ownership
config = Neo4jConfig(uri="bolt://localhost:7687", username="neo4j", password="bodspassword")
# Find all owners of an entity
owners = find_owners("rec-entity-alpha", config)
# Find all UBOs with >= 25% effective ownership
ubos = find_all_ubos(config, threshold=25.0)
# Map a corporate group
group = find_corporate_group("rec-entity-alpha", config)
# Find top-level parent entities
parents = find_top_level_parents(config, limit=50)
# Detect circular ownership
cycles = find_circular_ownership(config)With BODS interests carried directly on the typed edge, ownership chains are single-hop variable-length traversals.
Find all owners of an entity:
MATCH path = (owner)-[:OWNS|CONTROLS*1..10]->(target:Entity {recordId: "rec-entity-alpha"})
WHERE owner:Person OR
(owner:Entity AND NOT EXISTS { ()-[:OWNS|CONTROLS]->(owner) })
RETURN owner.name, length(path) AS depth,
[r IN relationships(path) | r.bodsInterestType] AS interestTypesCalculate effective ownership through chains:
MATCH path = (person:Person)-[:OWNS*1..10]->(entity:Entity)
WITH person, entity, path,
reduce(pct = 1.0, r IN relationships(path) |
CASE WHEN r.shareMinimum IS NOT NULL
THEN pct * (toFloat(r.shareMinimum) / 100.0)
ELSE pct END) * 100.0 AS effectivePct
WHERE effectivePct >= 25
RETURN person.name, entity.name, effectivePctDetect circular ownership:
MATCH path = (e:Entity)-[:OWNS|CONTROLS*2..10]->(e)
RETURN e.name, length(path) AS cycleLength,
[n IN nodes(path) | n.name] AS cycleNamesFind entities without identified UBOs:
MATCH (e:Entity)
WHERE EXISTS { ()-[:OWNS|CONTROLS]->(e) }
AND NOT EXISTS { (:Person)-[:OWNS|CONTROLS*1..20]->(e) }
RETURN e.name, head([(e)-[:REGISTERED_IN]->(c:Country) | c.code]) AS jurisdictionCodeThis tool was designed by comparing two approaches to modelling UK beneficial ownership data:
| Approach | Open Ownership (bods-uk-psc-pipeline) | Neo4j Team (neo4j-company-house-demo) |
|---|---|---|
| Output | Standardised BODS v0.4 JSON | Custom Neo4j property graph |
| Strength | Interoperability across data sources | Graph traversal and UBO analysis |
| Weakness | No graph analysis built in | Source-specific, not interoperable |
BODS-Neo4j bridges this gap — standardised data format with graph analysis capabilities.
pytestTests cover:
- BODS file reading (JSON and JSONL)
- Statement mapping (entities, persons, relationships)
- CSV export with correct structure
- Round-trip fidelity (BODS → Neo4j → BODS)
- BODS schema utilities
tests/test_bods_fixtures_conformance.py runs the mapper against every case in the canonical bods-v04-fixtures pack via the pytest-bods-v04-fixtures plugin. Tests are parametrized by fixture name so a failure like [edge-cases/10-circular-ownership] points straight at the offending case.
Graph-specific conformance checks include: every statement maps to a node or relationship (no silent None returns from shape divergence); circular ownership emits two distinct mirrored HAS_INTEREST edges; and declared-unknown UBOs (inline unspecifiedReason) don't crash the mapper and still leave the known subject entity as a usable node.
src/bods_neo4j/
├── cli.py # Click CLI commands
├── config.py # Configuration dataclasses
├── bods_to_neo4j/
│ ├── reader.py # BODS JSON/JSONL streaming reader
│ ├── mapper.py # BODS statements → Neo4j nodes/relationships
│ ├── csv_exporter.py # CSV export + Cypher/admin import scripts
│ └── driver_loader.py # Direct Neo4j loading via Python driver
├── neo4j_to_bods/
│ ├── extractor.py # Query Neo4j graph
│ ├── mapper.py # Neo4j nodes/rels → BODS statements
│ └── writer.py # Output BODS JSON/JSONL
├── graph_queries/
│ ├── ubo_detection.py # Ultimate beneficial owner traversal
│ ├── corporate_groups.py # Corporate group mapping and metrics
│ └── circular_ownership.py # Cycle detection
└── utils/
├── bods_schema.py # BODS v0.4 constants and helpers
└── neo4j_helpers.py # Neo4j connection and batch operations
MIT