Skip to content

StephenAbbott/bods-neo4j

Repository files navigation

BODS-Neo4j

Bidirectional converter between the Beneficial Ownership Data Standard (BODS) v0.4 and Neo4j graph database, with built-in graph analysis queries for UBO detection, corporate group mapping, and circular ownership detection.

Part of the BODS Interoperability Toolkit.

Why?

BODS provides a universal, standardised format for beneficial ownership data — enabling interoperability across countries, registers, and data sources. Neo4j provides powerful graph traversal and analysis capabilities that are essential for understanding complex ownership structures.

This tool bridges the gap:

  • BODS → Neo4j: Import any BODS v0.4 dataset into Neo4j for graph analysis, UBO detection, and visualisation
  • Neo4j → BODS: Export graph data back to BODS format for standardised data exchange and sharing
  • Graph Queries: Ready-made Cypher queries for common beneficial ownership analysis tasks

Background

The Beneficial Ownership Data Standard was created by Open Ownership as the world's leading open standard for beneficial ownership information on legal persons and legal arrangements.

BODS data is published by multiple sources including:

Installation

pip install -e .

For development:

pip install -e ".[dev]"

Quick Start

1. Start Neo4j

docker compose up -d

This starts Neo4j Community Edition on bolt://localhost:7687 with APOC plugin.

2. BODS → Neo4j (CSV export)

Generate Neo4j-importable CSV files and import scripts:

bods-neo4j to-csv examples/sample_data/sample_bods.json -o ./neo4j_export

This produces:

  • One CSV per node label (entity.csv, person.csv, identifier.csv, address.csv, country.csv, unspecified_party.csv)
  • One CSV per relationship type (owns.csv, controls.csv, manages.csv, is_party_to.csv, has_other_interest.csv, has_identifier.csv, has_address.csv, located_in.csv, registered_in.csv, born_in.csv)
  • import.cypherLOAD CSV script for a running Neo4j instance (includes constraints + entity subtype-label fixups)
  • import.sh — convenience wrapper that pipes import.cypher through cypher-shell

3. BODS → Neo4j (direct driver load)

Load BODS data directly into Neo4j:

bods-neo4j to-neo4j examples/sample_data/sample_bods.json \
    --uri bolt://localhost:7687 \
    --username neo4j \
    --password bodspassword

4. Neo4j → BODS

Export the graph back to BODS format:

bods-neo4j to-bods output.jsonl \
    --uri bolt://localhost:7687 \
    --username neo4j \
    --password bodspassword \
    --publisher-name "My Organisation"

5. Inspect data

# BODS file statistics
bods-neo4j info examples/sample_data/sample_bods.json

# Neo4j graph statistics
bods-neo4j graph-info --uri bolt://localhost:7687 --username neo4j --password bodspassword

Neo4j Graph Schema

The model is graph-native: things BODS treats as identities (identifiers, addresses, jurisdictions) are first-class nodes, and the generic interest edge splits into a 5-family taxonomy. Each BODS interest in a relationship statement becomes one typed edge carrying the interest payload directly — no intermediate :Interest node.

Nodes

Label BODS source Key Notes
:Entity (+ subtype labels: RegisteredEntity / LegalEntity / Arrangement / Trust / Nomination / StateBody / …) entity statement recordId Inline statement metadata (statementId, publicationDate, publisherName, source*, annotation*).
:Person person statement recordId Nationalities / tax residencies as inline parallel-list properties (nationalityCodes, nationalityNames).
:Identifier identifiers[] (entity / person) uid = scheme|id Deduplicated globally — the same Companies-House ID is one node across all statements that reference it.
:Address addresses[] (entity / person) uid = sha256(normalised content) The BODS address type lives on the [:HAS_ADDRESS {type, ordinal}] edge so the same address can play multiple roles.
:Country entity jurisdiction + address country leg code (ISO 3166-1 alpha-2) Shared dedup point. Nationality / tax residency stay inline on :Person.
:UnspecifiedParty inline unspecifiedReason party object (subject or interestedParty) uid = statementId:side Sentinel — preserves BODS's inline-unspecified shape when there is no recordId to anchor the typed edge to.

Relationships

(party:Person|Entity)-[:OWNS|CONTROLS|MANAGES|IS_PARTY_TO|HAS_OTHER_INTEREST {…interest payload + statement metadata}]->(subject:Entity|Person)
(:Entity|:Person)-[:HAS_IDENTIFIER {ordinal, isPrimary}]->(:Identifier)
(:Entity|:Person)-[:HAS_ADDRESS {type, ordinal}]->(:Address)-[:LOCATED_IN]->(:Country)
(:Entity)-[:REGISTERED_IN]->(:Country)
(:Person)-[:BORN_IN]->(:Address)
(:Entity|:Person)-[:REPLACES]->(same-label)

A BODS relationship statement with N interests in its interests[] array becomes N parallel typed edges between the same (party, subject) pair. Every edge carries the statement-level metadata (statementId, recordId, publicationDate, publisherName, sourceTypes, …) duplicated, plus the per-interest payload (bodsInterestType, family, shareExact/Min/Max, directOrIndirect, beneficialOwnershipOrControl, startDate, endDate, details). The reverse mapper groups edges by statementId and sorts by interestIndex to rebuild a single BODS statement.

The 23 BODS interest types collapse into five Cypher relationship-type families for fast traversal selectivity; the original interest type is preserved verbatim as bodsInterestType on each edge so the round-trip is lossless.

Family BODS interest types covered
OWNS shareholding, rightsToProfitOrIncome, rightsToSurplusAssetsOnDissolution, rightToProfitOrIncomeFromAssets, enjoymentAndUseOfAssets, rightsGrantedByContract, conditionalRightsGrantedByContract
CONTROLS votingRights, controlViaCompanyRulesOrArticles, controlByLegalFramework, otherInfluenceOrControl, appointmentOfBoard
MANAGES seniorManagingOfficial, boardMember, boardChair
IS_PARTY_TO settlor, trustee, protector, beneficiaryOfLegalArrangement, nominee, nominator
HAS_OTHER_INTEREST unknownInterest, unpublishedInterest, plus any forward-compat values

Round-Trip Fidelity

The reverse mapper rebuilds identifiers / addresses by aggregating their related graph nodes, and rebuilds interests[] by grouping typed edges that share a statementId and sorting by interestIndex. No *_json blobs on nodes or edges. Any publisher-specific or forward-compat fields the structured extraction does not recognise are preserved verbatim in a single extrasJson catch-all per node / first edge. Data can be converted back to valid BODS v0.4 format without loss.

Graph Analysis Queries

Python API

from bods_neo4j.config import Neo4jConfig
from bods_neo4j.graph_queries.ubo_detection import find_owners, find_all_ubos
from bods_neo4j.graph_queries.corporate_groups import find_corporate_group, find_top_level_parents
from bods_neo4j.graph_queries.circular_ownership import find_circular_ownership

config = Neo4jConfig(uri="bolt://localhost:7687", username="neo4j", password="bodspassword")

# Find all owners of an entity
owners = find_owners("rec-entity-alpha", config)

# Find all UBOs with >= 25% effective ownership
ubos = find_all_ubos(config, threshold=25.0)

# Map a corporate group
group = find_corporate_group("rec-entity-alpha", config)

# Find top-level parent entities
parents = find_top_level_parents(config, limit=50)

# Detect circular ownership
cycles = find_circular_ownership(config)

Key Cypher Queries

With BODS interests carried directly on the typed edge, ownership chains are single-hop variable-length traversals.

Find all owners of an entity:

MATCH path = (owner)-[:OWNS|CONTROLS*1..10]->(target:Entity {recordId: "rec-entity-alpha"})
WHERE owner:Person OR
      (owner:Entity AND NOT EXISTS { ()-[:OWNS|CONTROLS]->(owner) })
RETURN owner.name, length(path) AS depth,
       [r IN relationships(path) | r.bodsInterestType] AS interestTypes

Calculate effective ownership through chains:

MATCH path = (person:Person)-[:OWNS*1..10]->(entity:Entity)
WITH person, entity, path,
     reduce(pct = 1.0, r IN relationships(path) |
         CASE WHEN r.shareMinimum IS NOT NULL
              THEN pct * (toFloat(r.shareMinimum) / 100.0)
              ELSE pct END) * 100.0 AS effectivePct
WHERE effectivePct >= 25
RETURN person.name, entity.name, effectivePct

Detect circular ownership:

MATCH path = (e:Entity)-[:OWNS|CONTROLS*2..10]->(e)
RETURN e.name, length(path) AS cycleLength,
       [n IN nodes(path) | n.name] AS cycleNames

Find entities without identified UBOs:

MATCH (e:Entity)
WHERE EXISTS { ()-[:OWNS|CONTROLS]->(e) }
  AND NOT EXISTS { (:Person)-[:OWNS|CONTROLS*1..20]->(e) }
RETURN e.name, head([(e)-[:REGISTERED_IN]->(c:Country) | c.code]) AS jurisdictionCode

Comparison with Other Approaches

This tool was designed by comparing two approaches to modelling UK beneficial ownership data:

Approach Open Ownership (bods-uk-psc-pipeline) Neo4j Team (neo4j-company-house-demo)
Output Standardised BODS v0.4 JSON Custom Neo4j property graph
Strength Interoperability across data sources Graph traversal and UBO analysis
Weakness No graph analysis built in Source-specific, not interoperable

BODS-Neo4j bridges this gap — standardised data format with graph analysis capabilities.

Testing

pytest

Tests cover:

  • BODS file reading (JSON and JSONL)
  • Statement mapping (entities, persons, relationships)
  • CSV export with correct structure
  • Round-trip fidelity (BODS → Neo4j → BODS)
  • BODS schema utilities

Conformance against the shared BODS v0.4 fixture pack

tests/test_bods_fixtures_conformance.py runs the mapper against every case in the canonical bods-v04-fixtures pack via the pytest-bods-v04-fixtures plugin. Tests are parametrized by fixture name so a failure like [edge-cases/10-circular-ownership] points straight at the offending case.

Graph-specific conformance checks include: every statement maps to a node or relationship (no silent None returns from shape divergence); circular ownership emits two distinct mirrored HAS_INTEREST edges; and declared-unknown UBOs (inline unspecifiedReason) don't crash the mapper and still leave the known subject entity as a usable node.

Project Structure

src/bods_neo4j/
├── cli.py                          # Click CLI commands
├── config.py                       # Configuration dataclasses
├── bods_to_neo4j/
│   ├── reader.py                   # BODS JSON/JSONL streaming reader
│   ├── mapper.py                   # BODS statements → Neo4j nodes/relationships
│   ├── csv_exporter.py             # CSV export + Cypher/admin import scripts
│   └── driver_loader.py            # Direct Neo4j loading via Python driver
├── neo4j_to_bods/
│   ├── extractor.py                # Query Neo4j graph
│   ├── mapper.py                   # Neo4j nodes/rels → BODS statements
│   └── writer.py                   # Output BODS JSON/JSONL
├── graph_queries/
│   ├── ubo_detection.py            # Ultimate beneficial owner traversal
│   ├── corporate_groups.py         # Corporate group mapping and metrics
│   └── circular_ownership.py       # Cycle detection
└── utils/
    ├── bods_schema.py              # BODS v0.4 constants and helpers
    └── neo4j_helpers.py            # Neo4j connection and batch operations

License

MIT

About

Bidirectional converter between BODS v0.4 (Beneficial Ownership Data Standard) and Neo4j graph database, with UBO detection, corporate group mapping, and circular ownership analysis

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages