perf(snowflake): composite_summary as one self-contained SQL statement#21
Conversation
The Snowflake backend previously delegated composite_summary() to a V_COMPOSITES view it never creates — an undocumented deployment dependency — and that view computed derived fields with per-row correlated subqueries. Deployments without the view fell back to the SDK's sequential path: list_models() over the full inventory plus per-composite snapshot replays (measured: 92.6s and 568 round trips for a 41-composite production ledger). composite_summary() now computes everything in ONE CTE statement over the backend's own MODELS and SNAPSHOTS tables, replicating the SDK fallback semantics exactly: - composites: MODEL_TYPE IN (...) pushed down to SQL - member_count: replay of Ledger.members() — baseline from depends_on snapshots with payload.relationship='member_of' resolved against MODELS, overlaid by member_added/member_removed in timestamp order, latest op wins per (composite, member); unresolvable adds are no-ops - last_validated: MAX(timestamp) of 'validated' snapshots - open_observation_count: distinct observation_id issued and never resolved (set semantics, ignoring events without an observation_id) Parity proven against a production ledger (28.8k models / 212k snapshots): row-for-row identical with both the live sequential fallback (41/41 rows) and an in-memory replay oracle over bulk-loaded data (2334/2334 rows). Warm runtime 0.3-0.6s. InMemoryLedgerBackend intentionally keeps no composite_summary — the SDK fallback remains the reference implementation and the dispatch seam is covered by a new SDK test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8fee072d93
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| HAVING COUNT_IF(EVENT_TYPE = 'observation_issued') > 0 | ||
| AND COUNT_IF(EVENT_TYPE = 'observation_resolved') = 0 |
There was a problem hiding this comment.
Count unresolved observations despite COUNT_IF NULLs
For an observation ID that has only observation_issued rows, Snowflake documents COUNT_IF(EVENT_TYPE = 'observation_resolved') as returning NULL when no rows satisfy the condition, so the = 0 predicate is not true. That drops every genuinely open observation from open_obs, causing open_observation_count to come back as 0 for the common case of issued-but-not-resolved observations; coalesce the count or use a SUM/IFF expression before comparing to zero.
Useful? React with 👍 / 👎.
What
SnowflakeLedgerBackend.composite_summary()now computes the full composite inventory in one self-contained SQL statement (CTE-based event replay over the backend's ownMODELS+SNAPSHOTStables).Previously the method delegated to a
V_COMPOSITESview that the backend never creates — an undocumented deployment dependency. Deployments without that view (i.e., any stock install) hit the SDK's sequential fallback:list_models()over the full inventory plus 2+ snapshot queries and per-edge model lookups for every composite.Semantics — exact replication of the SDK fallback
The statement replicates
Ledger.composite_summary()'s fallback behavior field by field:MODELSrows withMODEL_TYPE IN (...)— pushed down to SQL, never filtered client-sideLedger.members()— baseline =depends_onsnapshots withpayload.relationship = 'member_of', resolved againstMODELSviapayload.upstream_hash(name match first, then hash, mirroringLedger.get); overlaid bymember_added/member_removedevents in timestamp order, latest op wins per (composite, member).member_addedevents whose member resolves to no registered model are no-ops in the replay and are dropped in SQL too.MAX(timestamp)ofvalidatedsnapshots on the compositeobservation_idvalues issued and never resolved — set semantics matchingLedger.open_observation_count(events without anobservation_idpayload key are ignored; an id issued then resolved is closed; a resolution without a matching issue is ignored)name, owner, tier, status, model_type, member_count, last_validated (datetime | None), open_observation_count, metadataInMemoryLedgerBackendintentionally keeps nocomposite_summary— the SDK fallback stays the reference implementation, and thehasattrdispatch seam is now covered by an SDK test (test_dispatches_to_backend_composite_summary_when_available).Parity proof (production-scale ledger: 28,848 models / 211,989 snapshots)
Run read-only against a live production deployment:
composite_summary) and diffed every field of every row against the new statement's output → 0 mismatching rows (41/41).InMemoryLedgerBackendand ran the actual SDK fallback code locally, then diffed against the new SQL → 0 mismatching rows (2,334/2,334).Performance (same production ledger)
Behavior note for downstream wrappers
For deployments that previously relied on the view's extra filtering: the view applied a
metadata.source IS NOT NULLguard that the SDK fallback never had. This method now returns exactly what the SDK contract promises — every model whosemodel_typematches. Wrappers that want a narrower inventory should filter onmetadatain their own layer (it's in every returned row).Follow-up (diagnosed, not in this PR):
investigatetool round tripstools.investigatemeasured 10.0s / 39 round trips for a composite and 32.7s / 109 round trips for a model belonging to 3 groups. Breakdown: the dominant cost is per-edgeLedger.get()resolution insidedependencies()→members()/groups()(each edge costs aget_model_by_name+get_modelpair), pluslist_snapshotsre-fetched for the same model up to 4 times (history(),dependencies(),members(),groups()each refetch). Fix plan:IN (...)lookup (thebatch_dependenciespattern that already exists) instead of per-edgeget()history()intomembers()/groups()Estimated ~39 → ~6 round trips without semantic change. Left out of this PR because it touches the
Ledgercall graph rather than adding a backend method.🤖 Generated with Claude Code