From 583462d1241e8900f541d12f21d54e6bbe61786b Mon Sep 17 00:00:00 2001
From: Vignesh Narayanaswamy <Vigneshn@squareup.com>
Date: Sun, 7 Jun 2026 11:09:17 -0700
Subject: [PATCH] docs: add Architecture page + ADRs (the design reasoning)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Senior docs explain how; this adds the *why* — the layer that makes the design
judgment legible.

- concepts/architecture.md: the four load-bearing choices (event-log not
  registry, one-DataNode graph, agents-first/tool-shaped, framework-agnostic
  pluggable core), each with the alternative rejected and the cost accepted on
  purpose — plus an explicit "what model-ledger is NOT" section.
- adr/: five Architecture Decision Records (event-log, DataNode, agents-first,
  framework-agnostic profiles, storage-agnostic backends) with context,
  decision, consequences (+/-), and alternatives considered.
- nav: Architecture under Concepts; Design decisions (the ADRs) under Reference.

OSS-safe: generic reasoning only, no org-specific references.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/adr/0001-event-log-not-a-registry.md |  51 ++++++++
 docs/adr/0002-everything-is-a-datanode.md |  50 ++++++++
 docs/adr/0003-agents-first.md             |  49 ++++++++
 docs/adr/0004-framework-agnostic.md       |  49 ++++++++
 docs/adr/0005-storage-agnostic.md         |  47 ++++++++
 docs/adr/index.md                         |  21 ++++
 docs/concepts/architecture.md             | 134 ++++++++++++++++++++++
 mkdocs.yml                                |   8 ++
 8 files changed, 409 insertions(+)
 create mode 100644 docs/adr/0001-event-log-not-a-registry.md
 create mode 100644 docs/adr/0002-everything-is-a-datanode.md
 create mode 100644 docs/adr/0003-agents-first.md
 create mode 100644 docs/adr/0004-framework-agnostic.md
 create mode 100644 docs/adr/0005-storage-agnostic.md
 create mode 100644 docs/adr/index.md
 create mode 100644 docs/concepts/architecture.md

diff --git a/docs/adr/0001-event-log-not-a-registry.md b/docs/adr/0001-event-log-not-a-registry.md
new file mode 100644
index 0000000..2815f6d
--- /dev/null
+++ b/docs/adr/0001-event-log-not-a-registry.md
@@ -0,0 +1,51 @@
+---
+title: "ADR 0001 — Event log, not a registry"
+description: Model the inventory as an append-only log of immutable snapshots rather than mutable current-state rows.
+---
+
+# ADR 0001 — Model the inventory as an event log, not a registry
+
+**Status:** Accepted
+
+## Context
+
+A model inventory has to answer two kinds of question. Operators ask *"what is the current
+state?"* Auditors and regulators ask *"show me the complete history of every change,
+approval, and validation"* and *"what did the inventory look like on this past date?"*
+
+A conventional registry stores current state and overwrites it on each change. It answers
+the first question well and the second not at all — once a row is updated, the prior state
+is gone, and there is no tamper-evident record that it ever existed.
+
+## Decision
+
+The inventory is an **append-only event log**. A model is a stable identity (`ModelRef`);
+everything that happens to it is an immutable, content-addressed `Snapshot`. Current state
+is a *projection* of the log; point-in-time state (`inventory_at`) is a replay of the log
+up to a timestamp.
+
+Content addressing (each snapshot's hash derives from its content) makes the chain
+tamper-evident: you cannot alter history without the hashes diverging.
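+
+A minimal sketch of the idea, assuming each snapshot's SHA-256 hash covers its own fields
+plus its predecessor's hash (a per-model hash chain), and that point-in-time state is a
+replay of payloads up to a timestamp. `Snapshot`, `make_snapshot`, `verify`, and `state_at`
+are hypothetical illustrations, not model-ledger's classes or its `inventory_at` code.
+
+```python
+# Hypothetical sketch of a hash-chained, append-only log. Not model-ledger's Snapshot API.
+from __future__ import annotations
+
+import hashlib
+import json
+from dataclasses import dataclass
+from datetime import datetime, timezone
+
+
+@dataclass(frozen=True)
+class Snapshot:
+    model: str           # stable identity (stands in for ModelRef)
+    ts: datetime         # when the event happened (timezone-aware)
+    event: str           # e.g. "registered", "validated"
+    payload: dict        # schema-free event data
+    parent: str | None   # hash of this model's previous snapshot (assumed chaining)
+    hash: str            # SHA-256 over all of the fields above
+
+
+def make_snapshot(model, ts, event, payload, parent=None):
+    """Build a snapshot whose hash is derived from its own content."""
+    body = json.dumps(
+        {"model": model, "ts": ts.isoformat(), "event": event,
+         "payload": payload, "parent": parent},
+        sort_keys=True,
+    )
+    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
+    return Snapshot(model, ts, event, payload, parent, digest)
+
+
+def verify(chain):
+    """True only if every hash matches its content and links to its predecessor."""
+    prev = None
+    for s in chain:
+        recomputed = make_snapshot(s.model, s.ts, s.event, s.payload, s.parent).hash
+        if s.parent != prev or s.hash != recomputed:
+            return False
+        prev = s.hash
+    return True
+
+
+def state_at(chain, when):
+    """Replay events up to `when`, folding their payloads into one state dict."""
+    state = {}
+    for s in chain:
+        if s.ts <= when:
+            state.update(s.payload)
+    return state
+
+
+if __name__ == "__main__":
+    t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
+    a = make_snapshot("fraud-score", t0, "registered", {"risk": "high"})
+    b = make_snapshot("fraud-score", t0.replace(month=3), "validated",
+                      {"validated": True}, parent=a.hash)
+    print(verify([a, b]), state_at([a, b], t0.replace(month=2)))
+```
+
+Editing a stored payload without recomputing its hash makes `verify` fail; recomputing the
+hash instead breaks the next snapshot's `parent` link.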
+
+## Consequences
+
+**Positive**
+
+- History and point-in-time reconstruction are free — they're inherent to the structure,
+  not a bolted-on audit table that can drift from the real data.
+- The log *is* the audit trail; there is no separate logging system to keep in sync.
+- Tamper-evidence comes from content addressing, which regulated use cases need.
+
+**Negative (accepted)**
+
+- More storage than last-write-wins, and reconstruction is a replay rather than a row read.
+- Callers think in events, not in-place edits — a small conceptual shift.
+
+## Alternatives considered
+
+- **Mutable registry (rejected):** simplest writes, but structurally cannot answer the
+  historical questions that are the entire point for governance.
+- **Registry + a separate audit table (rejected):** two sources of truth that drift; the
+  audit table is exactly the thing an examiner distrusts.
+
+See [Snapshots & the event log](../concepts/snapshot.md) and [Architecture](../concepts/architecture.md).
diff --git a/docs/adr/0002-everything-is-a-datanode.md b/docs/adr/0002-everything-is-a-datanode.md
new file mode 100644
index 0000000..407439f
--- /dev/null
+++ b/docs/adr/0002-everything-is-a-datanode.md
@@ -0,0 +1,50 @@
+---
+title: "ADR 0002 — Everything is a DataNode"
+description: Represent models, rules, ETL, and queues with one typed-port node, and let the dependency graph assemble itself from port matching.
+---
+
+# ADR 0002 — Everything is a DataNode; the graph builds itself
+
+**Status:** Accepted
+
+## Context
+
+A real model estate spans ML models, heuristic rules, ETL jobs, and alert queues, across
+many platforms with no shared identifier scheme. To map dependencies, most tools require
+either a central registry of IDs or per-platform adapters that understand each other.
+Both are brittle and don't scale across platforms.
+
+## Decision
+
+Every entity is a single type — `DataNode` — with typed input and output **ports**. A
+node declares only what it consumes and produces. `connect()` then creates a dependency
+edge wherever an output port name matches an input port name. Connectors emit nodes and
+know nothing about the rest of the graph.
+
+`DataPort` carries optional schema discriminators (e.g. `model_name`) so that two nodes
+writing to tables with the same name are not falsely linked.
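+
+A minimal sketch of port matching, assuming one plausible rule: an edge forms when an output
+and an input share a name and agree on every discriminator key that both sides declare (a
+port with no discriminators matches any same-named port). `Port`, `Node`, and this
+`connect()` are hypothetical stand-ins, not model-ledger's `DataPort`/`DataNode` API.
+
+```python
+# Hypothetical sketch of port matching. Not model-ledger's DataNode/DataPort/connect() API.
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Port:
+    name: str                                              # e.g. a table or queue name
+    schema: dict[str, str] = field(default_factory=dict)   # discriminators, e.g. model_name
+
+    def matches(self, other: Port) -> bool:
+        if self.name != other.name:
+            return False
+        # Assumed rule: discriminator keys declared on both sides must agree.
+        shared = self.schema.keys() & other.schema.keys()
+        return all(self.schema[k] == other.schema[k] for k in shared)
+
+
+@dataclass
+class Node:
+    name: str
+    inputs: list[Port] = field(default_factory=list)
+    outputs: list[Port] = field(default_factory=list)
+
+
+def connect(nodes: list[Node]) -> list[tuple[str, str, str]]:
+    """Return (producer, consumer, port) edges wherever an output matches an input."""
+    edges = []
+    for producer in nodes:
+        for consumer in nodes:
+            if producer is consumer:
+                continue
+            for out in producer.outputs:
+                for inp in consumer.inputs:
+                    if out.matches(inp):
+                        edges.append((producer.name, consumer.name, out.name))
+    return edges
+
+
+if __name__ == "__main__":
+    etl = Node("etl", outputs=[Port("scores", {"model_name": "fraud"})])
+    fraud = Node("fraud_model", inputs=[Port("scores", {"model_name": "fraud"})],
+                 outputs=[Port("alerts")])
+    churn = Node("churn_model", inputs=[Port("scores", {"model_name": "churn"})])
+    queue = Node("alert_queue", inputs=[Port("alerts")])
+    for edge in connect([etl, fraud, churn, queue]):
+        print(edge)
+```
+
+Run as a script, this yields two edges: (`etl`, `fraud_model`, `scores`) and
+(`fraud_model`, `alert_queue`, `alerts`). The churn model reads a same-named `scores` table
+but declares a different `model_name`, so it is not linked. The nested loops are quadratic
+for readability; indexing inputs by port name would avoid that.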
+
+## Consequences
+
+**Positive**
+
+- Cross-platform edges (warehouse ETL → MLflow model → alerting queue) form with no shared
+  ID scheme and no inter-connector coupling.
+- Adding a platform is "emit `DataNode`s" — connectors stay dumb and independent, which is
+  what makes discovery scale.
+- One abstraction to learn; rules and ETL are first-class, not second-class to ML models.
+
+**Negative (accepted)**
+
+- Port-name collisions are possible; resolving them precisely requires `DataPort` schema
+  discriminators rather than bare strings.
+- Port naming becomes a modeling concern the connector author must get right.
+
+## Alternatives considered
+
+- **Per-platform model types (rejected):** too rigid; every new platform is a new type and
+  new cross-type wiring.
+- **A fixed, central metadata schema (rejected):** cannot span heterogeneous platforms;
+  forces lossy normalization at discovery time.
+
+See [DataNode & the graph](../concepts/datanode.md).
diff --git a/docs/adr/0003-agents-first.md b/docs/adr/0003-agents-first.md
new file mode 100644
index 0000000..e267ef5
--- /dev/null
+++ b/docs/adr/0003-agents-first.md
@@ -0,0 +1,49 @@
+---
+title: "ADR 0003 — Agents are the primary interface"
+description: Design a small, consolidated, tool-shaped API for agents first; expose it identically over MCP, REST, and the SDK.
+---
+
+# ADR 0003 — Agents are the primary interface; the SDK is tool-shaped
+
+**Status:** Accepted
+
+## Context
+
+Governance questions are conversational by nature — *"which high-risk models changed this
+week and haven't been validated?"* The cheapest way to answer them is to let an agent
+traverse the inventory directly. Most libraries treat an agent/MCP layer as an
+afterthought wrapped around a human-shaped API, which produces awkward, chatty tools.
+
+## Decision
+
+Design the API for the agent first. The SDK is **tool-shaped**: each capability is one
+consolidated verb — `discover`, `record`, `investigate`, `query`, `trace`, `changelog`,
+`tag` — and the same verbs are exposed identically over MCP and REST. Tools follow
+[Anthropic's tool-writing guidance](https://www.anthropic.com/engineering/writing-tools-for-agents):
+few, broad, orthogonal, with agent-readable descriptions and error messages that name the
+next action.
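+
+To make "error messages that name the next action" concrete, here is a hypothetical
+`trace`-like tool over a toy in-memory graph (`GRAPH`). It returns a plain dict either way:
+results on success, and on a miss an `error` plus a `next_action` hint instead of raising.
+The return shape is an assumption for illustration, not model-ledger's tool schema.
+
+```python
+# Hypothetical sketch of a tool-shaped verb. Not model-ledger's actual `trace` tool.
+from typing import Any
+
+# Toy stand-in for the dependency graph: node -> direct downstream nodes.
+GRAPH: dict[str, list[str]] = {
+    "etl": ["fraud_model"],
+    "fraud_model": ["alert_queue"],
+    "alert_queue": [],
+}
+
+
+def trace(name: str) -> dict[str, Any]:
+    """Return everything downstream of `name`, or an error that names the next call."""
+    if name not in GRAPH:
+        similar = [n for n in GRAPH if name.lower() in n.lower()]
+        return {
+            "error": f"No model named {name!r}.",
+            "next_action": f"Call query with {name!r} to search by name; close matches: {similar}",
+        }
+    seen: list[str] = []
+    stack = list(GRAPH[name])
+    while stack:  # depth-first walk; `seen` guards against revisiting shared nodes
+        node = stack.pop()
+        if node not in seen:
+            seen.append(node)
+            stack.extend(GRAPH[node])
+    return {"model": name, "downstream": seen}
+```
+
+An agent that calls `trace("fraud")` gets back a hint pointing at `fraud_model` and the
+`query` verb, so its next call can be correct without a human reading a stack trace.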
+
+## Consequences
+
+**Positive**
+
+- One mental model across MCP, REST, SDK, and CLI; they can't drift because they share the
+  tool functions.
+- The consolidated surface is easier for a human to learn too — designing for the agent
+  made the SDK cleaner as a side effect.
+- Errors are actionable (they suggest the next call) rather than surfacing as raw exceptions.
+
+**Negative (accepted)**
+
+- Broad verbs do more per call, which fits fine-grained REST conventions (one resource per
+  endpoint) less neatly.
+- A small, opinionated verb set means some niche operations live only in the SDK.
+
+## Alternatives considered
+
+- **Human-first SDK with a thin MCP wrapper (rejected):** yields chatty, leaky tools and
+  two surfaces that drift.
+- **Granular REST endpoints mirrored to many tools (rejected):** overflows an agent's
+  working memory and multiplies the maintenance surface.
+
+See [Agents (MCP)](../guides/agents.md).
diff --git a/docs/adr/0004-framework-agnostic.md b/docs/adr/0004-framework-agnostic.md
new file mode 100644
index 0000000..a1fc224
--- /dev/null
+++ b/docs/adr/0004-framework-agnostic.md
@@ -0,0 +1,49 @@
+---
+title: "ADR 0004 — Framework-agnostic core, pluggable profiles"
+description: Keep regulations out of the core; express them as a pluggable compliance-profile layer over a generic inventory.
+---
+
+# ADR 0004 — Framework-agnostic core; regulations are pluggable profiles
+
+**Status:** Accepted
+
+## Context
+
+Demand for model-ledger is driven by regulation (SR 26‑2, EU AI Act Annex IV, NIST AI RMF,
+ISO 42001). The tempting move is to build "an SR 11‑7 tool." But specific regulations get
+renumbered and superseded (SR 11‑7 → SR 26‑2 in 2026), differ by jurisdiction, and would
+narrow a tool that is genuinely general.
+
+## Decision
+
+The core is a generic model inventory with **no regulation baked in**. Specific frameworks
+are expressed as **compliance profiles**: a plugin layer (`sr_11_7`, `eu_ai_act`,
+`nist_ai_rmf`) discovered via entry points, where each profile checks a model's completeness
+against a framework's expectations. The documentation leads with the durable capability
+(a complete, auditable, point-in-time inventory) and treats named regimes as a thin, current layer.
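+
+A sketch of how entry-point discovery of profiles can work with the standard library's
+`importlib.metadata.entry_points(group=...)` (Python 3.10+). The group name
+`model_ledger.profiles`, the `ComplianceProfile` protocol, and `ExampleProfile` are
+hypothetical, chosen only to show the mechanism.
+
+```python
+# Hypothetical sketch: the group name and profile interface are assumptions.
+from importlib.metadata import entry_points
+from typing import Protocol, runtime_checkable
+
+
+@runtime_checkable
+class ComplianceProfile(Protocol):
+    name: str
+
+    def missing(self, model: dict) -> list[str]:
+        """Return the fields this framework expects that `model` lacks."""
+        ...
+
+
+class ExampleProfile:
+    """A toy profile; a real one would encode a framework's documentation expectations."""
+
+    name = "example"
+    required = ("owner", "intended_use", "last_validation")
+
+    def missing(self, model: dict) -> list[str]:
+        return [f for f in self.required if not model.get(f)]
+
+
+def load_profiles(group: str = "model_ledger.profiles") -> dict[str, ComplianceProfile]:
+    """Instantiate every profile registered under `group` (Python 3.10+ API)."""
+    profiles: dict[str, ComplianceProfile] = {}
+    for ep in entry_points(group=group):
+        profile = ep.load()()  # assumes each entry point names a zero-argument class
+        if isinstance(profile, ComplianceProfile):  # checks members exist, not signatures
+            profiles[profile.name] = profile
+    return profiles
+```
+
+A third-party package would register its class under that group in its packaging metadata
+(in `pyproject.toml`, a `[project.entry-points."model_ledger.profiles"]` table), so adding a
+framework never touches the core.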
+
+## Consequences
+
+**Positive**
+
+- A renumbered or new regulation is a profile change, not a core change, so the inventory
+  does not go stale when a regulator issues a new letter.
+- The tool serves any organization with deployed models, not one jurisdiction's banks.
+- The core stays tiny (`httpx` + `pydantic`), which is what lets downstream packages add
+  org-specific connectors, auth, and profiles without forking it.
+
+**Negative (accepted)**
+
+- `record()` takes a schema-free `payload`; envelope validation is the caller's or a
+  profile's responsibility, not the core's.
+- "Does it support regulation X?" is answered by "is there a profile?", which requires the
+  profile ecosystem to keep pace.
+
+## Alternatives considered
+
+- **Bake in SR 11‑7 / a single framework (rejected):** dates instantly and narrows the
+  audience; we watched SR 11‑7 get superseded mid-project.
+- **A rigid, regulation-shaped schema (rejected):** forces every platform's metadata into
+  one regulator's vocabulary at discovery time.
+
+See [Governance](../governance.md).
diff --git a/docs/adr/0005-storage-agnostic.md b/docs/adr/0005-storage-agnostic.md
new file mode 100644
index 0000000..09e5afb
--- /dev/null
+++ b/docs/adr/0005-storage-agnostic.md
@@ -0,0 +1,47 @@
+---
+title: "ADR 0005 — Storage-agnostic backends"
+description: Put all persistence behind one LedgerBackend protocol so the same code runs from in-memory to Snowflake.
+---
+
+# ADR 0005 — Storage-agnostic via the LedgerBackend protocol
+
+**Status:** Accepted
+
+## Context
+
+The same inventory needs to run as a throwaway in-memory object in a test, a single
+SQLite file on a laptop, git-friendly JSON in a repo, a Snowflake schema in production,
+and a thin client against a remote HTTP service. Coupling the SDK to any one of these
+would force a rewrite to change storage and make testing slow.
+
+## Decision
+
+All persistence sits behind a single `@runtime_checkable` `LedgerBackend` protocol. The
+`Ledger` SDK is written against the protocol only; the backend is a constructor argument
+(`Ledger.from_sqlite(...)`, `Ledger.from_snowflake(...)`, `Ledger(JsonFileLedgerBackend(...))`,
+`Ledger(HttpLedgerBackend(...))`). Third parties can add backends (e.g. Postgres) by
+implementing the protocol and registering an entry point — no core change.
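+
+A minimal sketch of the pattern with a deliberately tiny two-method protocol.
+`MiniBackend`, `InMemoryBackend`, and `MiniLedger` are hypothetical; the real
+`LedgerBackend` protocol and `Ledger` constructors have their own, larger surface. Note that
+`isinstance` against a `@runtime_checkable` protocol only checks that the methods exist, not
+their signatures.
+
+```python
+# Hypothetical sketch of a storage protocol; the real LedgerBackend has more methods.
+from typing import Protocol, runtime_checkable
+
+
+@runtime_checkable
+class MiniBackend(Protocol):
+    def append(self, model: str, event: dict) -> None: ...
+    def events(self, model: str) -> list[dict]: ...
+
+
+class InMemoryBackend:
+    """Satisfies MiniBackend structurally; no inheritance required."""
+
+    def __init__(self) -> None:
+        self._log: dict[str, list[dict]] = {}
+
+    def append(self, model: str, event: dict) -> None:
+        # Shallow copy: later top-level edits to the caller's dict don't alter the log.
+        self._log.setdefault(model, []).append(dict(event))
+
+    def events(self, model: str) -> list[dict]:
+        return list(self._log.get(model, []))
+
+
+class MiniLedger:
+    """Application-facing API written against the protocol only."""
+
+    def __init__(self, backend: MiniBackend) -> None:
+        if not isinstance(backend, MiniBackend):
+            raise TypeError(f"{type(backend).__name__} does not implement MiniBackend")
+        self.backend = backend
+
+    def record(self, model: str, **payload) -> None:
+        self.backend.append(model, payload)
+
+    def history(self, model: str) -> list[dict]:
+        return self.backend.events(model)
+```
+
+Swapping storage means passing a different object with the same two methods; `MiniLedger`
+itself does not change.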
+
+## Consequences
+
+**Positive**
+
+- Choosing storage is a one-line decision that never leaks into application code.
+- Tests run in-memory and fast; the same code path is exercised against every backend.
+- Backends are an open extension point, not a closed enum.
+
+**Negative (accepted)**
+
+- The protocol is a contract: adding a method means implementing it across every backend
+  (including third-party ones), so the surface must evolve deliberately.
+- The HTTP backend can't always reconstruct server-side state locally and falls back to caches.
+- The lowest-common-denominator protocol can't expose every backend's native superpowers.
+
+## Alternatives considered
+
+- **Hard-code one backend (rejected):** forces a rewrite to change storage and makes tests
+  depend on infrastructure.
+- **An ORM abstraction (rejected):** heavier, leakier, and a poor fit for the append-only
+  event-log and the non-SQL backends (JSON files, HTTP).
+
+See [Choosing a backend](../guides/backends.md).
diff --git a/docs/adr/index.md b/docs/adr/index.md
new file mode 100644
index 0000000..5dd40d5
--- /dev/null
+++ b/docs/adr/index.md
@@ -0,0 +1,21 @@
+---
+title: Design decisions
+description: Architecture Decision Records — the load-bearing choices behind model-ledger, the alternatives weighed, and the costs accepted.
+---
+
+# Design decisions
+
+Architecture Decision Records (ADRs) capture the choices that shape model-ledger: the
+context, the decision, the alternatives considered, and the consequences — including the
+costs accepted on purpose. They are short, dated, and immutable; a reversed decision gets
+a new ADR that supersedes the old one rather than an edit.
+
+| # | Decision | Status |
+|---|---|---|
+| [0001](0001-event-log-not-a-registry.md) | Model the inventory as an event log, not a registry | Accepted |
+| [0002](0002-everything-is-a-datanode.md) | Everything is a DataNode; the graph builds itself | Accepted |
+| [0003](0003-agents-first.md) | Agents are the primary interface; the SDK is tool-shaped | Accepted |
+| [0004](0004-framework-agnostic.md) | Framework-agnostic core; regulations are pluggable profiles | Accepted |
+| [0005](0005-storage-agnostic.md) | Storage-agnostic via the LedgerBackend protocol | Accepted |
+
+The narrative that ties these together is the [Architecture](../concepts/architecture.md) page.
diff --git a/docs/concepts/architecture.md b/docs/concepts/architecture.md
new file mode 100644
index 0000000..c9f3757
--- /dev/null
+++ b/docs/concepts/architecture.md
@@ -0,0 +1,134 @@
+---
+title: Architecture
+description: How model-ledger is designed and why — the event-log thesis, the one-abstraction graph, the agent-first surface, and the trade-offs behind each.
+---
+
+# Architecture
+
+This page is the *why*. For the API, see the [Reference](../reference/index.md); for the
+record of specific decisions, the [Design decisions](../adr/index.md).
+
+model-ledger is built on four load-bearing choices. Each was made against a real
+alternative, and each carries a cost we accepted on purpose.
+
+## The shape
+
+```mermaid
+graph TB
+    subgraph consumers ["Consumers"]
+        direction LR
+        A["Agents<br/><small>MCP</small>"] ~~~ R["Frontends<br/><small>REST</small>"] ~~~ S["Scripts<br/><small>SDK</small>"] ~~~ C["CLI"]
+    end
+    subgraph protocol ["Agent protocol — consolidated tools"]
+        direction LR
+        T["discover · record · investigate · query · trace · changelog · tag"]
+    end
+    subgraph sdk ["Ledger SDK (tool-shaped)"]
+        L["register · record · add · connect · trace · history · inventory_at · composites"]
+    end
+    subgraph sources ["Discovery"]
+        direction LR
+        CO["SourceConnector protocol<br/><small>sql · rest · github · yours</small>"]
+    end
+    subgraph storage ["Storage"]
+        direction LR
+        B["LedgerBackend protocol<br/><small>memory · sqlite · json · snowflake · http</small>"]
+    end
+    consumers --> protocol --> sdk
+    sdk --> sources
+    sdk --> storage
+    classDef ink fill:#1c1a17,color:#f7f3ec,stroke:#000;
+    classDef ox fill:#efe8da,stroke:#7a1a1a,color:#1c1a17;
+    class protocol ink;
+```
+
+The consumers are interchangeable because they all bottom out in the same tool-shaped
+SDK. Discovery and storage are both *protocols*, so the core stays tiny and the ecosystem
+extends it without forking.
+
+## 1. The inventory is an event log, not a registry
+
+A registry stores *current state* and overwrites it. model-ledger stores *what happened*
+and never overwrites anything: a model is an identity ([`ModelRef`](snapshot.md)), and
+every change is an immutable, content-addressed [`Snapshot`](snapshot.md).
+
+**Why.** The questions a governance regime actually asks are *"show me the complete history
+of every change, approval, and validation"* and *"what was true on this past date?"* A
+mutable registry structurally cannot answer either question; an append-only log
+answers both for free, and content-addressing makes the chain tamper-evident.
+
+**The cost we accepted.** More storage, and reconstruction (`inventory_at`) is a replay
+rather than a row read. We trade write-time simplicity for an audit trail that can't be
+quietly edited — the right trade for a system of record. → [ADR 0001](../adr/0001-event-log-not-a-registry.md)
+
+## 2. Everything is a DataNode
+
+An ML model, a heuristic rule, an ETL job, and an alert queue are the same shape: each
+consumes some inputs and produces some outputs. So they're one type —
+[`DataNode`](datanode.md) with typed ports — and the dependency graph assembles itself
+when an output port name matches an input port name.
+
+**Why.** Discovery scales only if connectors stay dumb. A connector emits nodes with
+their ports and knows nothing about the rest of the graph; the cross-platform edges
+(an ETL job in your warehouse → a model in MLflow → a queue in your alerting system)
+fall out of port matching, with no shared ID scheme to maintain.
+
+**The cost we accepted.** Two models can legitimately write a table with the same name.
+Bare names would over-link, so `DataPort` carries optional schema discriminators to keep
+edges precise. We rejected per-platform model *types* and a fixed metadata schema — both
+too rigid to span platforms. → [ADR 0002](../adr/0002-everything-is-a-datanode.md)
+
+## 3. Agents are the primary interface
+
+The SDK is *tool-shaped*: each method maps to one consolidated agent tool, exposed
+identically over [MCP](../guides/agents.md) and [REST](../guides/backends.md). The verb
+set is deliberately small (`discover`, `record`, `investigate`, `query`, `trace`,
+`changelog`, `tag`) rather than a sprawl of endpoints.
+
+**Why.** The most natural way to answer *"which high-risk models changed this week and
+haven't been validated?"* is to ask an agent that can traverse the inventory. Designing
+for the agent first (per [Anthropic's tool-writing guidance](https://www.anthropic.com/engineering/writing-tools-for-agents))
+makes the SDK and REST surfaces cleaner as a side effect: consolidated, orthogonal, hard
+to misuse.
+
+**The cost we accepted.** Fewer, broader tools mean a single call does more, which is a
+worse fit for fine-grained REST conventions. We optimize for the agent's working memory
+over endpoint granularity. → [ADR 0003](../adr/0003-agents-first.md)
+
+## 4. Framework-agnostic core, pluggable everything
+
+Storage, discovery, introspection, and compliance are all `@runtime_checkable` Protocols
+discovered via entry points. Regulations live in **profiles** — a plugin layer — not in
+the core. The core depends only on `httpx` + `pydantic`.
+
+**Why.** model-ledger is an inventory for *any* organization with deployed models, not a
+single-regulation tool. Keeping regulations as a thin, swappable layer means a renumbered
+rule (SR 11‑7 → SR 26‑2) is a profile change, not a core change (see
+[Governance](../governance.md)). The tiny core is also what lets a downstream package add
+org-specific connectors and auth without touching it.
+
+**The cost we accepted.** `record()` takes a schema-free `payload`; envelope validation is
+the caller's (or a profile's) responsibility. We trade a rigid schema for the freedom to
+record whatever a platform actually has. → [ADR 0004](../adr/0004-framework-agnostic.md) · [ADR 0005](../adr/0005-storage-agnostic.md)
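+
+Because the core accepts any payload, a caller that wants a stricter envelope can validate
+before recording. A stdlib-only sketch; the `ValidationEnvelope` fields and the
+`validation_payload` helper are hypothetical, and a real caller might use a `pydantic` model
+for the same job.
+
+```python
+# Hypothetical caller-side envelope check; the field names are illustrative.
+from dataclasses import asdict, dataclass
+from datetime import date
+
+
+@dataclass(frozen=True)
+class ValidationEnvelope:
+    validator: str       # who performed the validation
+    outcome: str         # "pass" or "fail"
+    performed_on: date
+
+    def __post_init__(self) -> None:
+        if not self.validator:
+            raise ValueError("validator must be non-empty")
+        if self.outcome not in {"pass", "fail"}:
+            raise ValueError(f"outcome must be 'pass' or 'fail', got {self.outcome!r}")
+
+
+def validation_payload(**fields) -> dict:
+    """Check the envelope, then return a plain, JSON-friendly dict to use as the payload."""
+    envelope = ValidationEnvelope(**fields)  # unknown or missing fields raise TypeError
+    payload = asdict(envelope)
+    payload["performed_on"] = envelope.performed_on.isoformat()
+    return payload
+```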
+
+## What model-ledger is *not*
+
+Stating the boundary is part of the design:
+
+- **Not a feature store or a serving layer.** It inventories and relates models; it does
+  not store features or serve predictions.
+- **Not a monitoring/metrics system.** It records *that* a validation or retrain happened
+  (as an event); it doesn't compute drift or accuracy.
+- **Discovery is point-in-time, not streaming.** Connectors run on a schedule and snapshot
+  what they find; `last_seen` lets you detect models that have gone silent, but the graph
+  is as fresh as the last sync.
+- **Connectors that need live credentials run from the SDK, not the agent.** `rest` and
+  `prefect` are pure-config and run through the `discover` tool; `sql`/`github` need a live
+  connection or a callable and are driven from the [SDK](../guides/connectors.md). The
+  agent gets an actionable error, never a crash.
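+
+A sketch of the `last_seen` check, assuming the caller has already gathered a mapping from
+model name to a timezone-aware `last_seen` timestamp (reading those from the inventory is
+not shown). `gone_silent` is a hypothetical helper.
+
+```python
+# Hypothetical helper; assumes the caller already has model -> last_seen timestamps.
+from datetime import datetime, timedelta, timezone
+
+
+def gone_silent(last_seen: dict[str, datetime], now: datetime,
+                max_age: timedelta) -> list[str]:
+    """Models no sync has seen within `max_age`, oldest sighting first."""
+    stale = [(seen, name) for name, seen in last_seen.items() if now - seen > max_age]
+    return [name for _, name in sorted(stale)]
+
+
+if __name__ == "__main__":
+    now = datetime.now(timezone.utc)
+    sightings = {"fraud": now - timedelta(days=1), "churn": now - timedelta(days=40)}
+    print(gone_silent(sightings, now, max_age=timedelta(days=30)))
+```
+
+Pick `max_age` comfortably longer than the sync interval; otherwise a model that merely
+falls between two syncs gets flagged.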
+
+## Where to go next
+
+- The primitives, in three ideas → [Concepts](index.md)
+- The guarantees the event log provides → [Snapshots & the event log](snapshot.md)
+- The record of each decision and its alternatives → [Design decisions](../adr/index.md)
diff --git a/mkdocs.yml b/mkdocs.yml
index a6593c0..d28a15e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -145,6 +145,7 @@ nav:
       - Installation: installation.md
   - Concepts:
       - concepts/index.md
+      - Architecture: concepts/architecture.md
       - DataNode & the graph: concepts/datanode.md
       - Snapshots & the event log: concepts/snapshot.md
       - Composites: concepts/composite.md
@@ -162,3 +163,10 @@ nav:
   - Reference:
       - reference/index.md
       - Glossary: glossary.md
+      - Design decisions:
+          - adr/index.md
+          - "ADR 0001 — Event log, not a registry": adr/0001-event-log-not-a-registry.md
+          - "ADR 0002 — Everything is a DataNode": adr/0002-everything-is-a-datanode.md
+          - "ADR 0003 — Agents are the primary interface": adr/0003-agents-first.md
+          - "ADR 0004 — Framework-agnostic, pluggable profiles": adr/0004-framework-agnostic.md
+          - "ADR 0005 — Storage-agnostic backends": adr/0005-storage-agnostic.md