What
enqueue_sync (crates/gitlawb-node/src/db/mod.rs:1159) inserts into sync_queue with a fresh Uuid for id and a bare ON CONFLICT DO NOTHING:
INSERT INTO sync_queue (id, repo, node_did, ref_name, new_sha, cid, status, enqueued_at)
VALUES ($1, $2, $3, $4, $5, $6, 'pending', $7)
ON CONFLICT DO NOTHING
sync_queue (db/mod.rs:503) has exactly one unique constraint — id TEXT PRIMARY KEY — and id is a fresh Uuid::new_v4() on every call. A bare ON CONFLICT DO NOTHING only fires on a unique violation, so the clause can never fire: it does nothing. There is no UNIQUE(repo, node_did, ref_name, new_sha), so identical pending rows are never collapsed.
Consequence
The 30s worker process_batch (crates/gitlawb-node/src/sync.rs:148) dequeues up to 10 pending rows and processes each independently, running a full refspec-less git fetch --prune origin (sync.rs:528) per row — no per-repo coalescing.
So duplicate pending rows for the same repo each trigger a full-repo fetch, even though the first fetch already pulls every ref.
This was latent before but is amplified by the per-ref sync-notify fan-out (#72): a push of N refs to P peers now sends N notifies per peer, each calling enqueue_sync, so a receiver gets up to N pending rows per push and runs N full fetches where one would do. With the 10-rows-per-30s batch, a large multi-ref push can also starve the queue and delay other repos' syncs. Owner-gated (only the repo owner can push), so the amplification ceiling is bounded, but a crafted high-ref-count push is the worst case.
Not a correctness bug — sync stays eventually consistent — but real redundant network/CPU and queue pressure.
Fix options
- Coalesce in the worker: dequeue distinct repos (or dedupe pending rows by
(repo, node_did)) and fetch once per repo per batch, regardless of how many per-ref rows are pending. Cleanest, since even distinct refs of one repo only need one whole-repo fetch.
- Make the dedup real: add
UNIQUE(repo, node_did, ref_name, new_sha) to sync_queue and change the insert to ON CONFLICT (repo, node_did, ref_name, new_sha) DO NOTHING. Collapses identical pending rows but still fetches once per distinct ref.
Option 1 is preferred — it bounds work to one fetch per repo regardless of ref count.
Evidence
crates/gitlawb-node/src/db/mod.rs:1159 — inert ON CONFLICT DO NOTHING on fresh-UUID insert
crates/gitlawb-node/src/db/mod.rs:503 — sync_queue schema, only id is unique
crates/gitlawb-node/src/sync.rs:148 — per-row processing, no coalescing
crates/gitlawb-node/src/sync.rs:528 — refspec-less full fetch per row
What
enqueue_sync(crates/gitlawb-node/src/db/mod.rs:1159) inserts intosync_queuewith a freshUuidforidand a bareON CONFLICT DO NOTHING:sync_queue(db/mod.rs:503) has exactly one unique constraint —id TEXT PRIMARY KEY— andidis a freshUuid::new_v4()on every call. A bareON CONFLICT DO NOTHINGonly fires on a unique violation, so the clause can never fire: it does nothing. There is noUNIQUE(repo, node_did, ref_name, new_sha), so identical pending rows are never collapsed.Consequence
The 30s worker
process_batch(crates/gitlawb-node/src/sync.rs:148) dequeues up to 10 pending rows and processes each independently, running a full refspec-lessgit fetch --prune origin(sync.rs:528) per row — no per-repo coalescing.So duplicate pending rows for the same repo each trigger a full-repo fetch, even though the first fetch already pulls every ref.
This was latent before but is amplified by the per-ref sync-notify fan-out (#72): a push of N refs to P peers now sends N notifies per peer, each calling
enqueue_sync, so a receiver gets up to N pending rows per push and runs N full fetches where one would do. With the 10-rows-per-30s batch, a large multi-ref push can also starve the queue and delay other repos' syncs. Owner-gated (only the repo owner can push), so the amplification ceiling is bounded, but a crafted high-ref-count push is the worst case.Not a correctness bug — sync stays eventually consistent — but real redundant network/CPU and queue pressure.
Fix options
(repo, node_did)) and fetch once per repo per batch, regardless of how many per-ref rows are pending. Cleanest, since even distinct refs of one repo only need one whole-repo fetch.UNIQUE(repo, node_did, ref_name, new_sha)tosync_queueand change the insert toON CONFLICT (repo, node_did, ref_name, new_sha) DO NOTHING. Collapses identical pending rows but still fetches once per distinct ref.Option 1 is preferred — it bounds work to one fetch per repo regardless of ref count.
Evidence
crates/gitlawb-node/src/db/mod.rs:1159— inertON CONFLICT DO NOTHINGon fresh-UUID insertcrates/gitlawb-node/src/db/mod.rs:503—sync_queueschema, onlyidis uniquecrates/gitlawb-node/src/sync.rs:148— per-row processing, no coalescingcrates/gitlawb-node/src/sync.rs:528— refspec-less full fetch per row