Skip to content

perf(snowflake): composite_summary as one self-contained SQL statement#21

Merged
vigneshnarayanaswamy merged 1 commit into
mainfrom
vigneshn/fast-composite-summary
Jun 10, 2026
Merged

perf(snowflake): composite_summary as one self-contained SQL statement#21
vigneshnarayanaswamy merged 1 commit into
mainfrom
vigneshn/fast-composite-summary

Conversation

@vigneshnarayanaswamy

Copy link
Copy Markdown
Collaborator

What

SnowflakeLedgerBackend.composite_summary() now computes the full composite inventory in one self-contained SQL statement (CTE-based event replay over the backend's own MODELS + SNAPSHOTS tables).

Previously the method delegated to a V_COMPOSITES view that the backend never creates — an undocumented deployment dependency. Deployments without that view (i.e., any stock install) hit the SDK's sequential fallback: list_models() over the full inventory plus 2+ snapshot queries and per-edge model lookups for every composite.

Semantics — exact replication of the SDK fallback

The statement replicates Ledger.composite_summary()'s fallback behavior field by field:

  • composites: MODELS rows with MODEL_TYPE IN (...) — pushed down to SQL, never filtered client-side
  • member_count: replay of Ledger.members() — baseline = depends_on snapshots with payload.relationship = 'member_of', resolved against MODELS via payload.upstream_hash (name match first, then hash, mirroring Ledger.get); overlaid by member_added / member_removed events in timestamp order, latest op wins per (composite, member). member_added events whose member resolves to no registered model are no-ops in the replay and are dropped in SQL too.
  • last_validated: MAX(timestamp) of validated snapshots on the composite
  • open_observation_count: distinct observation_id values issued and never resolved — set semantics matching Ledger.open_observation_count (events without an observation_id payload key are ignored; an id issued then resolved is closed; a resolution without a matching issue is ignored)
  • returned dict keys/types identical to the fallback: name, owner, tier, status, model_type, member_count, last_validated (datetime | None), open_observation_count, metadata

InMemoryLedgerBackend intentionally keeps no composite_summary — the SDK fallback stays the reference implementation, and the hasattr dispatch seam is now covered by an SDK test (test_dispatches_to_backend_composite_summary_when_available).

Parity proof (production-scale ledger: 28,848 models / 211,989 snapshots)

Run read-only against a live production deployment:

  1. Old sequential fallback vs new SQL, 41 composites: forced the genuine fallback path (backend proxy hiding composite_summary) and diffed every field of every row against the new statement's output → 0 mismatching rows (41/41).
  2. In-memory replay oracle vs new SQL, full type filter (2,334 rows): bulk-loaded all models + relevant snapshots into InMemoryLedgerBackend and ran the actual SDK fallback code locally, then diffed against the new SQL → 0 mismatching rows (2,334/2,334).

Performance (same production ledger)

path time round trips
old SDK fallback (41 composites) 92.6s 568
old view-based query (correlated subqueries per row) 2.5–23.5s depending on warehouse/cache 1
new single statement (2,334 rows) 0.3–0.6s warm; ~6s first-ever compile on a cold plan 1

Behavior note for downstream wrappers

For deployments that previously relied on the view's extra filtering: the view applied a metadata.source IS NOT NULL guard that the SDK fallback never had. This method now returns exactly what the SDK contract promises — every model whose model_type matches. Wrappers that want a narrower inventory should filter on metadata in their own layer (it's in every returned row).

Follow-up (diagnosed, not in this PR): investigate tool round trips

tools.investigate measured 10.0s / 39 round trips for a composite and 32.7s / 109 round trips for a model belonging to 3 groups. Breakdown: the dominant cost is per-edge Ledger.get() resolution inside dependencies()members() / groups() (each edge costs a get_model_by_name + get_model pair), plus list_snapshots re-fetched for the same model up to 4 times (history(), dependencies(), members(), groups() each refetch). Fix plan:

  1. resolve dependency edges with one batched IN (...) lookup (the batch_dependencies pattern that already exists) instead of per-edge get()
  2. thread the already-fetched snapshot list from history() into members() / groups()

Estimated ~39 → ~6 round trips without semantic change. Left out of this PR because it touches the Ledger call graph rather than adding a backend method.

🤖 Generated with Claude Code

The Snowflake backend previously delegated composite_summary() to a
V_COMPOSITES view it never creates — an undocumented deployment dependency —
and that view computed derived fields with per-row correlated subqueries.
Deployments without the view fell back to the SDK's sequential path:
list_models() over the full inventory plus per-composite snapshot replays
(measured: 92.6s and 568 round trips for a 41-composite production ledger).

composite_summary() now computes everything in ONE CTE statement over the
backend's own MODELS and SNAPSHOTS tables, replicating the SDK fallback
semantics exactly:

- composites: MODEL_TYPE IN (...) pushed down to SQL
- member_count: replay of Ledger.members() — baseline from depends_on
  snapshots with payload.relationship='member_of' resolved against MODELS,
  overlaid by member_added/member_removed in timestamp order, latest op
  wins per (composite, member); unresolvable adds are no-ops
- last_validated: MAX(timestamp) of 'validated' snapshots
- open_observation_count: distinct observation_id issued and never
  resolved (set semantics, ignoring events without an observation_id)

Parity proven against a production ledger (28.8k models / 212k snapshots):
row-for-row identical with both the live sequential fallback (41/41 rows)
and an in-memory replay oracle over bulk-loaded data (2334/2334 rows).
Warm runtime 0.3-0.6s.

InMemoryLedgerBackend intentionally keeps no composite_summary — the SDK
fallback remains the reference implementation and the dispatch seam is
covered by a new SDK test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8fee072d93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +594 to +595
HAVING COUNT_IF(EVENT_TYPE = 'observation_issued') > 0
AND COUNT_IF(EVENT_TYPE = 'observation_resolved') = 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Count unresolved observations despite COUNT_IF NULLs

For an observation ID that has only observation_issued rows, Snowflake documents COUNT_IF(EVENT_TYPE = 'observation_resolved') as returning NULL when no rows satisfy the condition, so the = 0 predicate is not true. That drops every genuinely open observation from open_obs, causing open_observation_count to come back as 0 for the common case of issued-but-not-resolved observations; coalesce the count or use a SUM/IFF expression before comparing to zero.

Useful? React with 👍 / 👎.

@vigneshnarayanaswamy vigneshnarayanaswamy merged commit dbf26d5 into main Jun 10, 2026
8 checks passed
@vigneshnarayanaswamy vigneshnarayanaswamy deleted the vigneshn/fast-composite-summary branch June 10, 2026 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant