fix(snowflake): dedup model buffer by hash before flush #11

Merged
vigneshnarayanaswamy merged 1 commit into main from vigneshn/flush-dedup-model-buffer
Jun 1, 2026
@vigneshnarayanaswamy (Collaborator) commented Jun 1, 2026

Summary

Ledger.add() buffers the same new model twice in one pass — register() saves it, then update_model() (which is just save_model again) saves it a second time. The flush MERGE is idempotent only once the target row exists; for a brand-new model the empty-target WHEN NOT MATCHED → INSERT fires per source row, so an undeduped buffer writes duplicate rows.
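
To illustrate the failure mode described above, here is a toy Python model of the MERGE behavior. It is not the real Snowflake statement or the model-ledger code: `toy_merge`, the `(model_hash, payload)` row shape, and the sample values are all hypothetical. The one assumption it encodes is that matching is decided against the target as it stood before the statement ran, so two source rows with the same new hash both take the insert branch.

```python
from typing import List, Tuple

# Hypothetical row shape for illustration: (model_hash, payload).
Row = Tuple[str, str]


def toy_merge(target: List[Row], source: List[Row]) -> List[Row]:
    """Toy MERGE: each source row is matched against the pre-statement target."""
    existing = {model_hash for model_hash, _ in target}  # keys present before the MERGE
    result = list(target)
    for model_hash, payload in source:
        if model_hash in existing:
            # WHEN MATCHED: update the existing row in place.
            result = [(h, payload if h == model_hash else p) for h, p in result]
        else:
            # WHEN NOT MATCHED: insert, once per source row.
            result.append((model_hash, payload))
    return result


# Brand-new model buffered twice (register() then update_model()).
new_model_rows = toy_merge(target=[], source=[("abc123", "registered"), ("abc123", "updated")])

# Existing model buffered once (only update_model() buffers it).
existing_model_rows = toy_merge(target=[("abc123", "old")], source=[("abc123", "updated")])
```

In this sketch the brand-new model ends up with two rows, while the existing model is updated in place and stays at one row.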

Fix: dedup _model_buffer by model_hash (last write wins) in _flush_models() before the MERGE. Existing models were never affected (register() early-returns from cache, so only update_model buffers them and the MERGE updates in place).
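
A minimal sketch of the last-write-wins dedup, assuming buffered entries are dicts with a `model_hash` key. The helper name `dedup_by_model_hash` and the row shape are hypothetical and chosen for illustration; the actual code in `_flush_models()` may differ.

```python
from typing import Any, Dict, List


def dedup_by_model_hash(buffer: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one entry per model_hash; a later entry replaces an earlier one."""
    latest: Dict[str, Dict[str, Any]] = {}
    for row in buffer:
        # Assigning to an existing key overwrites the value (last write wins)
        # while the key keeps its original position in the dict.
        latest[row["model_hash"]] = row
    return list(latest.values())


buffer = [
    {"model_hash": "a", "version": 1},  # saved by register()
    {"model_hash": "b", "version": 1},
    {"model_hash": "a", "version": 2},  # saved again by update_model()
]
deduped = dedup_by_model_hash(buffer)
```

Because the result has at most one row per hash, the MERGE source can no longer trigger more than one insert for the same new model.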

Linear

  • Surfaced by MRM-324 (Risk Arbiter compliance-name connector) — its first bulk insert of ~500 brand-new nodes produced exactly 2× rows.
  • Also gates MRM-302 (ORM connector) — its initial bulk insert would otherwise double every new node.

Consumer PR

  • squareup/forge-block-mrm#134 bumps the pinned model-ledger SHA to include this fix.

Test Plan

  • New test test_flush_dedups_model_buffer_by_hash — saves the same model twice, asserts it reaches the MERGE source exactly once (fails before fix: 2×)
  • Full suite: 708 passed, 4 skipped
  • ruff + mypy clean
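
The shape of the new test can be sketched roughly as follows. `FakeLedger` is a hypothetical stand-in that records what the MERGE would receive instead of talking to Snowflake; the real test and the real ledger class are not shown in this PR description and may look different.

```python
from typing import Any, Dict, List


class FakeLedger:
    """Hypothetical stand-in for illustration; not the real model-ledger class."""

    def __init__(self) -> None:
        self._model_buffer: List[Dict[str, Any]] = []
        # One entry per flush: the rows that would be used as the MERGE source.
        self.merge_sources: List[List[Dict[str, Any]]] = []

    def save_model(self, model: Dict[str, Any]) -> None:
        self._model_buffer.append(model)

    def _flush_models(self) -> None:
        # Dedup by model_hash (last write wins) before handing rows to the MERGE.
        latest = {m["model_hash"]: m for m in self._model_buffer}
        self.merge_sources.append(list(latest.values()))
        self._model_buffer.clear()


def test_flush_dedups_model_buffer_by_hash() -> None:
    ledger = FakeLedger()
    model = {"model_hash": "abc123", "name": "example"}
    ledger.save_model(model)  # as register() would
    ledger.save_model(model)  # as update_model() would
    ledger._flush_models()
    assert len(ledger.merge_sources) == 1
    assert [m["model_hash"] for m in ledger.merge_sources[0]] == ["abc123"]
```

Without the dedup line in `_flush_models`, the final assertion would see the hash twice, which mirrors the 2× failure noted above.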

🤖 Generated with Claude Code

A single Ledger.add() pass buffers the same new model twice (register() saves
it, then update_model() saves it again). The flush MERGE is idempotent only
once the target row exists; for a brand-new model the empty-target INSERT
fires per source row, so an undeduped buffer writes duplicate rows. Dedup the
buffer by model_hash (last write wins) in _flush_models before the MERGE.

Surfaced by a first-time bulk insert of ~500 new nodes, which produced exactly
2x rows. Existing nodes were unaffected (register() early-returns from cache,
so only update_model() buffers them, and MERGE updates in place).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vigneshnarayanaswamy marked this pull request as ready for review June 1, 2026 22:28
@vigneshnarayanaswamy merged commit eabf0fc into main Jun 1, 2026
5 checks passed