---
title: DataNode & the graph
description: Everything is a DataNode with typed ports. Declare inputs and outputs; the dependency graph builds itself.
---

# DataNode & the graph

The core insight: **a model, a rule, an ETL job, and an alert queue are the same
shape.** Each consumes some things and produces others. So they're all one type,
`DataNode`, and the dependency graph falls out of matching what each node produces
to what other nodes consume.

## A node is what it reads and writes

```python
from model_ledger import DataNode

DataNode(
    name="fraud_scorer",
    platform="ml",
    inputs=["customer_features"],   # what it consumes
    outputs=["risk_scores"],        # what it produces
    metadata={"framework": "xgboost", "owner": "risk-team"},
)
```

`inputs` and `outputs` are **ports**: the names of the data flowing in and out. A
plain string becomes a [`DataPort`](#dataport-precision) automatically.

## The graph builds itself

You never draw edges by hand. Call `connect()`, and wherever one node's output port
name matches another node's input port name, that pair becomes a dependency:

```python
from model_ledger import Ledger, DataNode

ledger = Ledger()
ledger.add([
    DataNode("segmentation", platform="etl", outputs=["customer_segments"]),
    DataNode("fraud_scorer", platform="ml", inputs=["customer_segments"], outputs=["risk_scores"]),
    DataNode("fraud_alerts", platform="alerting", inputs=["risk_scores"]),
])
ledger.connect()

ledger.trace("fraud_alerts")       # ['segmentation', 'fraud_scorer', 'fraud_alerts']
ledger.upstream("fraud_alerts")    # everything that feeds it
ledger.downstream("segmentation")  # everything that depends on it
```

```mermaid
graph LR
    A["segmentation"] -->|customer_segments| B["fraud_scorer"] -->|risk_scores| C["fraud_alerts"]
    classDef n fill:#efe8da,stroke:#7a1a1a,color:#1c1a17;
    class A,B,C n;
```

This is why discovery scales: a connector only has to emit `DataNode`s with their
ports, and the cross-platform graph assembles itself. An ETL job in your warehouse
links to a model in MLflow, which links to a queue in your alerting system, with no
shared ID scheme between them.
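The next example is a minimal pure-Python sketch of how `connect()`, `upstream()`, `downstream()`, and `trace()` *could* work. It is not `model_ledger`'s implementation. The `Node` class and the free functions are hypothetical stand-ins, and ports are bare strings matched case-insensitively. `trace` relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to put producers before consumers. The matching step never looks at `platform` or any ID. Port names alone carry the edges.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Node:
    """Hypothetical stand-in for DataNode: a name plus the port names it reads and writes."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def connect(nodes):
    """Map each producer name to the set of consumer names whose input matches one of its outputs."""
    producers = defaultdict(list)  # lowercased port name -> names of nodes that write it
    for node in nodes:
        for port in node.outputs:
            producers[port.lower()].append(node.name)
    edges = defaultdict(set)
    for node in nodes:
        for port in node.inputs:
            for producer in producers.get(port.lower(), []):
                edges[producer].add(node.name)
    return edges


def _reachable(start, adjacency):
    """Breadth-first search: every node reachable from `start` (excluding `start` unless on a cycle)."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def downstream(edges, name):
    return _reachable(name, edges)


def upstream(edges, name):
    reverse = defaultdict(set)  # consumer -> producers
    for producer, consumers in edges.items():
        for consumer in consumers:
            reverse[consumer].add(producer)
    return _reachable(name, reverse)


def trace(edges, name):
    """`name` plus everything upstream of it, ordered so producers come before consumers."""
    keep = upstream(edges, name) | {name}
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {n: {p for p in keep if n in edges.get(p, ())} for n in keep}
    return list(TopologicalSorter(graph).static_order())


nodes = [
    Node("segmentation", outputs=["customer_segments"]),
    Node("fraud_scorer", inputs=["customer_segments"], outputs=["risk_scores"]),
    Node("fraud_alerts", inputs=["risk_scores"]),
]
edges = connect(nodes)
print(trace(edges, "fraud_alerts"))               # ['segmentation', 'fraud_scorer', 'fraud_alerts']
print(sorted(downstream(edges, "segmentation")))  # ['fraud_alerts', 'fraud_scorer']
```

`connect` first indexes producers by port name. This keeps the cost proportional to the number of ports plus edges, instead of comparing every pair of nodes.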

## DataPort precision

When two models legitimately write a table with the same name, a bare port name would
collide. `DataPort` carries optional schema fields to disambiguate, and an edge only
forms when the schema matches too:

```python
from model_ledger import DataNode, DataPort

DataNode("check_rules", outputs=[DataPort("alerts", model_name="checks")])
DataNode("card_rules", outputs=[DataPort("alerts", model_name="cards")])
DataNode("check_queue", inputs=[DataPort("alerts", model_name="checks")])
# check_queue connects to check_rules only; model_name must match.
```

Port matching is case-insensitive, and schema values support `%` wildcards.
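The matching rules above fit in a few lines of code. The sketch below is hypothetical and is not `model_ledger`'s matcher. It depends on assumptions the text leaves open:

- A bare string is coerced to a port with no schema, mirroring the automatic `DataPort` conversion.
- `%` works like SQL `LIKE`, matching any run of characters (including none), and only in the consumer's schema values.
- Case is ignored in names and schema values alike.
- A schema key that only one side declares does not block the edge.

`Port`, `as_port`, `like`, and `ports_match` are illustrative names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Port:
    """Hypothetical stand-in for DataPort: a port name plus optional schema fields."""
    name: str
    schema: dict = field(default_factory=dict)


def as_port(p):
    """Coerce a bare string into a Port with no schema; pass Port objects through unchanged."""
    if isinstance(p, Port):
        return p
    if isinstance(p, str):
        return Port(p)
    raise TypeError(f"expected str or Port, got {type(p).__name__}")


def like(pattern, value):
    """Case-insensitive match where '%' stands for any run of characters (assumed SQL LIKE style)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.IGNORECASE) is not None


def ports_match(output, input_):
    """True if a producer's output port should feed a consumer's input port."""
    out, inp = as_port(output), as_port(input_)
    if out.name.lower() != inp.name.lower():
        return False
    # Only keys both sides declare are compared; the consumer's value may contain '%'.
    shared = out.schema.keys() & inp.schema.keys()
    return all(like(inp.schema[k], out.schema[k]) for k in shared)


check_out = Port("alerts", {"model_name": "checks"})
card_out = Port("alerts", {"model_name": "cards"})
check_in = Port("ALERTS", {"model_name": "checks"})
print(ports_match(check_out, check_in))  # True: same name (ignoring case), same model_name
print(ports_match(card_out, check_in))   # False: model_name differs
```

SQL's single-character `_` wildcard is left literal here, since the text only mentions `%`.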

## From node to governed model

A `DataNode` gives you structure. To give a node an **identity and history** (owner,
risk tier, purpose, and an audit trail), you
[`register()`](../reference/index.md) it as a [`ModelRef`](snapshot.md) and
[`record()`](snapshot.md) events against it. Discovery and registration are two views
of the same inventory: the graph (what connects to what) and the ledger (what each
thing *is* and how it changed).
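To picture the two views side by side, here is a hypothetical in-memory ledger keyed by the same node name the graph uses. It is not the library's `Ledger`, `register()`, or `record()`; see the reference for the real API. `Identity` (loosely playing the role of a `ModelRef`), `Event`, and `MiniLedger` are illustrative names. The identity fields come from the prose above. The one design choice it demonstrates is that history is append-only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Identity:
    """Who a model is: hypothetical fields mirroring the prose (owner, risk tier, purpose)."""
    name: str
    owner: str
    risk_tier: str
    purpose: str


@dataclass(frozen=True)
class Event:
    """One entry in the audit trail."""
    model: str
    kind: str
    details: dict
    at: datetime


class MiniLedger:
    """Hypothetical sketch: identities keyed by node name plus an append-only event list."""

    def __init__(self):
        self._identities = {}
        self._events = []  # only ever appended to, never edited or deleted

    def register(self, identity):
        self._identities[identity.name] = identity

    def record(self, model, kind, **details):
        if model not in self._identities:
            raise KeyError(f"{model!r} has not been registered")
        self._events.append(Event(model, kind, dict(details), datetime.now(timezone.utc)))

    def history(self, model):
        """Events for one model, in the order they were recorded."""
        return [e for e in self._events if e.model == model]


ledger = MiniLedger()
ledger.register(Identity("fraud_scorer", owner="risk-team", risk_tier="high", purpose="score transactions"))
ledger.record("fraud_scorer", "retrained", version="2")
ledger.record("fraud_scorer", "validated", reviewer="model-risk")
print([e.kind for e in ledger.history("fraud_scorer")])  # ['retrained', 'validated']
```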

[Next: Snapshots & the event log :octicons-arrow-right-24:](snapshot.md)