fix(cdc): emit fallback notification for oversized payloads + visible errors#1410

Merged
jakebromberg merged 2 commits into main from fix/1120-cdc-oversized-notify
Jun 14, 2026

@jakebromberg

Member

Closes #1120

Summary

Migration 0094 replaces the cdc_notify() trigger function so that it no longer silently drops CDC events when a row's JSON payload exceeds Postgres's 8000-byte pg_notify cap, and no longer leaves other trigger exceptions visible only in the PG log.

  • Detects the oversized case up-front (octet_length(payload::text) > 7800) and emits a minimal cdc_oversized fallback notification carrying (table, schema, action, primary_key, payload_bytes, reason='payload_too_large') so downstream consumers can refetch the row from the source of truth instead of going dark.
  • Routes any unexpected trigger exception through a dedicated cdc_error channel (reason='trigger_exception') in addition to the pre-existing RAISE WARNING, restoring listener-side visibility into trigger failures.
  • Returns NEW/OLD instead of bare NULL so the trigger contract is unambiguous if a future maintainer rewires it as BEFORE.
  • Documents the two new notification channels in the migration's header comment.
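
The branching in the first bullet can be modelled outside the database. The sketch below is illustrative only: the real logic lives in the PL/pgSQL cdc_notify() function, and buildNotification, octetLength, and the shape of the full cdc payload are assumptions made for this example. Only the 7800-byte threshold and the fallback fields come from the summary above.

```typescript
// Illustrative model only; the real check runs inside the PL/pgSQL trigger.
type CdcAction = "INSERT" | "UPDATE" | "DELETE";

interface Notification {
  channel: "cdc" | "cdc_oversized";
  payload: string;
}

// Threshold from the PR text; it leaves headroom under pg_notify's 8000-byte cap.
const MAX_PAYLOAD_BYTES = 7800;

// UTF-8 byte length, the analogue of octet_length(payload::text).
function octetLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Hypothetical helper: choose the channel and payload for one row change.
function buildNotification(
  table: string,
  schema: string,
  action: CdcAction,
  primaryKey: number | string,
  row: Record<string, unknown>
): Notification {
  // Assumed shape of the normal payload; the migration may serialize it differently.
  const full = JSON.stringify({ table, schema, action, data: row });
  const bytes = octetLength(full);
  if (bytes <= MAX_PAYLOAD_BYTES) {
    return { channel: "cdc", payload: full };
  }
  // Minimal fallback: enough for a consumer to refetch the row by primary key.
  const fallback = JSON.stringify({
    table,
    schema,
    action,
    primary_key: primaryKey,
    payload_bytes: bytes,
    reason: "payload_too_large",
  });
  return { channel: "cdc_oversized", payload: fallback };
}
```

The fallback payload stays small regardless of row size because it carries only identifying fields, which is what keeps it under the pg_notify cap.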

Tests in tests/integration/cdc-oversized-fallback.spec.js exercise both paths against real PG: normal INSERTs still fire cdc, oversized INSERTs fire cdc_oversized (and NOT cdc), and the row commits regardless (visibility-only failure mode). A function-body pin guards against an accidental rollback of 0094.

Test plan

  • new oversized-payload integration test added (cdc / cdc_oversized split + commit-regardless + function-body pin) — runs against PG once the integration DB is up
  • full unit suite green (npm run test:unit — 3151/3151)
  • lint clean (npm run lint — 0 errors)
  • format clean (npm run format:check)
  • typecheck clean (npm run typecheck)
  • migration validator clean (npm run lint:migrations)
  • precondition-guard check clean (scripts/check-precondition-guards.sh)
  • bulk-update/ANALYZE check clean (scripts/check-bulk-update-analyze.mjs)
  • integration tests not run locally — Docker daemon unavailable in this worktree; will be exercised by the CI integration-tests job (touches shared/database/src/migrations/** + tests/**)

@github-actions

Schema constraint shape report

data-shape report errored (exit 0): node:internal/modules/run_main:107 triggerUncaughtException( ^ Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'postgres' imported from /home/runner/work/Backend-Service/Backend-Service/scripts/schema-shape-report.mjs Did you mean to import "postgres/cjs/src/index.js"? at Object.getPackageJ; manual check required

@jakebromberg

jakebromberg commented Jun 14, 2026

Member Author

Code review

Found 1 issue:

  1. New cdc_oversized / cdc_error channels have no production consumer — the migration emits to channels nothing subscribes to.

    The migration's own header comment (0094_cdc_oversized_fallback.sql L33-L34) says "New channel — see cdc-listener.ts subscription update", but cdc-listener.ts is unchanged in this PR and still only subscribes to the cdc channel:

    await listenConnection.listen(
      CDC_CHANNEL,
      (payload: string) => {
        try {
          const event = JSON.parse(payload) as CdcEvent;
          for (const cb of callbacks) {
            try {
              cb(event);
            } catch (err) {
              console.error('[cdc-listener] Callback error:', err);
            }
          }
        } catch (err) {
          console.error('[cdc-listener] Failed to parse CDC payload:', err);
        }
      },
      () => {
        // Fires on initial subscribe AND on every postgres-js auto-reconnect's
        // re-LISTEN. Either way the LISTEN is live again, so flip back to true.
        dispatchState(true);
      }
    );

    pg_notify is fire-and-forget (docs/cdc.md: "Postgres does not durably queue notifications for absent listeners"). With no consumer, the cdc_oversized fallback NOTIFY is dropped on the floor — the downstream cdc-websocket, metadata-broadcast, and reconciliation monitor all consume through onCdcEvent(), which is wired exclusively to cdc. The only retained signal for the oversized branch is the RAISE WARNING in the PG log, which the issue (#1120) describes as the inadequate status quo this PR exists to fix.

    This also leaves #1120 AC #3 ("emit a metric Sentry can alert on") functionally unmet: routing to another pg_notify channel that no Node-side path subscribes to produces no Sentry signal.

    Recommend either extending cdc-listener.ts to subscribe to both new channels (with a separate onCdcOversizedEvent / onCdcErrorEvent dispatch and a Sentry hook), or shipping that listener change as a fast-follow PR before this trigger change reaches prod.
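
A minimal sketch of what the recommended listener-side dispatch could look like, mirroring the parse-then-fan-out pattern of the existing cdc handler quoted above. Everything here is an assumption for illustration: createFallbackChannel is a hypothetical helper, the event shapes are inferred from the payload fields listed in the PR summary, and the actual LISTEN wiring (passing each handle function to listenConnection.listen for its channel) is left out so the block stays self-contained.

```typescript
// Payload fields follow the PR summary; the TypeScript shapes are assumptions.
interface CdcOversizedEvent {
  table: string;
  schema: string;
  action: string;
  primary_key: unknown;
  payload_bytes: number;
  reason: "payload_too_large";
}

interface CdcErrorEvent {
  reason: "trigger_exception";
  [field: string]: unknown; // other fields are not specified in the PR text
}

// Hypothetical factory: one callback registry plus payload handler per channel.
function createFallbackChannel<T>(channel: string) {
  const callbacks: Array<(event: T) => void> = [];
  return {
    channel,
    // Register a consumer; the returned function unsubscribes it.
    subscribe(cb: (event: T) => void): () => void {
      callbacks.push(cb);
      return () => {
        const index = callbacks.indexOf(cb);
        if (index !== -1) callbacks.splice(index, 1);
      };
    },
    // Intended as the notification handler for a LISTEN on `channel`.
    handle(payload: string): void {
      let event: T;
      try {
        event = JSON.parse(payload) as T;
      } catch (err) {
        console.error(`[cdc-listener] Failed to parse ${channel} payload:`, err);
        return;
      }
      // Iterate over a copy so unsubscribing mid-dispatch skips no callback.
      for (const cb of callbacks.slice()) {
        try {
          cb(event);
        } catch (err) {
          console.error(`[cdc-listener] ${channel} callback error:`, err);
        }
      }
    },
  };
}

const oversizedChannel = createFallbackChannel<CdcOversizedEvent>("cdc_oversized");
const errorChannel = createFallbackChannel<CdcErrorEvent>("cdc_error");

const onCdcOversizedEvent = oversizedChannel.subscribe;
const onCdcErrorEvent = errorChannel.subscribe;
```

Keeping a separate registry per channel means a malformed or failing fallback consumer cannot interfere with the primary cdc callbacks.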

… errors (#1120)

Postgres `pg_notify` enforces an 8000-byte payload cap. The previous `cdc_notify()` trigger function wrapped the call in a broad `EXCEPTION WHEN OTHERS ... RAISE WARNING ... RETURN NULL`, so when a row's `to_jsonb(NEW)` overflowed the cap (e.g. an enrichment-worker `flowsheet` UPDATE with `artist_bio` + seven streaming URLs) the notification was silently dropped — the originating mutation committed, but no consumer ever saw the event. The only signal was a buried PG-log WARNING line.

Migration 0094 replaces the trigger function so it:

- Detects the oversized case up-front (`octet_length(payload_text) > 7800`) and emits a minimal `cdc_oversized` fallback notification carrying `(table, schema, action, primary_key, payload_bytes, reason='payload_too_large')` so downstream consumers can refetch the row from the source of truth.
- Routes any other unexpected exception through a dedicated `cdc_error` channel (`reason='trigger_exception'`) in addition to the existing `RAISE WARNING`, restoring listener visibility into trigger failures.
- Returns `NEW`/`OLD` instead of bare `NULL` so the trigger contract is unambiguous if a future maintainer rewires it as `BEFORE`.

Tests in `tests/integration/cdc-oversized-fallback.spec.js` verify both paths against real PG: normal INSERTs still fire `cdc`, oversized INSERTs fire `cdc_oversized` (and NOT `cdc`), and the row commits regardless. Also pins the function body so an accidental rollback of 0094 fails CI before redeploy.

The migration header comment documents the new notification channels for listener-side consumers.
…for #1120 AC #3

Migration 0094 emits to two new pg_notify channels (cdc_oversized,
cdc_error) when the primary cdc payload would have been silently
dropped. Without consumers, those notifications go to /dev/null and
AC #3 stays unmet.

Wire the consumer side end-to-end:

- shared/database/src/cdc-listener.ts: define CdcOversizedEvent and
  CdcErrorEvent shapes matching the SQL payloads; add onCdcOversizedEvent
  and onCdcErrorEvent registrations; LISTEN on both new channels
  alongside the existing cdc subscription in startCdcListener; clear
  the new callback arrays in stopCdcListener.

- apps/backend/services/cdc/dispatcher.ts: wire the dispatcher's fallback
  sinks to Sentry.captureMessage with stable fingerprints
  ('cdc-oversized-payload' / 'cdc-trigger-exception'). Module-level latch
  keeps the registration idempotent across stray startCdcDispatcher
  calls; shutdown drops the latch.

- apps/enrichment-worker/worker.ts: mirror the Sentry wiring in the
  worker process so its independent LISTEN connection also surfaces the
  fallback channels (consumer='enrichment-worker' tag for disambiguation).

Tests:
- BS#1120 describe block in cdc-listener.test.ts pins channel
  subscription, callback dispatch, multi-callback fan-out, callback-error
  isolation, malformed-payload tolerance, and stopCdcListener teardown
  of the new callback arrays.
- BS#1120 describe block in cdc-websocket.test.ts (alongside the
  existing dispatcher tests) pins the dispatcher's Sentry wiring,
  captureMessage tag/extra/fingerprint shape, double-start idempotency,
  and shutdown-resets-latch behavior.
- Updated the BS#1014 enableLivenessProbe channel-order assertion to
  cover the new cdc_oversized + cdc_error subscriptions.
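
The idempotent Sentry wiring described in the commit message above might be sketched as follows. This is not the code from dispatcher.ts: registerFallbackSinks and resetFallbackSinks are hypothetical names, the capture function is injected rather than imported so the example runs without the Sentry SDK, and the message strings and severity levels are assumptions. The fingerprints and the consumer tag are taken from the commit message.

```typescript
// Local stand-in for the subset of Sentry's capture context used here (assumption).
interface CaptureContext {
  level: "warning" | "error";
  tags: Record<string, string>;
  extra: Record<string, unknown>;
  fingerprint: string[];
}

type CaptureMessage = (message: string, context: CaptureContext) => void;
type FallbackEvent = Record<string, unknown>;
type Subscribe = (cb: (event: FallbackEvent) => void) => void;

// Module-level latch: true once the sinks have been registered.
let sinksRegistered = false;

// Hypothetical helper: returns true if it registered, false if already latched.
function registerFallbackSinks(
  capture: CaptureMessage,
  onOversized: Subscribe,
  onError: Subscribe,
  consumer: string
): boolean {
  if (sinksRegistered) return false; // stray second start is a no-op
  sinksRegistered = true;

  onOversized((event) =>
    capture("CDC payload exceeded pg_notify limit", {
      level: "warning",
      tags: { consumer },
      extra: event,
      fingerprint: ["cdc-oversized-payload"],
    })
  );

  onError((event) =>
    capture("CDC trigger raised an exception", {
      level: "error",
      tags: { consumer },
      extra: event,
      fingerprint: ["cdc-trigger-exception"],
    })
  );

  return true;
}

// Called on shutdown so a later start can register again.
function resetFallbackSinks(): void {
  sinksRegistered = false;
}
```

A stable fingerprint groups every occurrence into one Sentry issue per failure mode, so alert rules can key on the issue rather than on message text.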
@jakebromberg jakebromberg force-pushed the fix/1120-cdc-oversized-notify branch from 93a824e to d4f8ca8 Compare June 14, 2026 20:51
@jakebromberg jakebromberg merged commit 65643f2 into main Jun 14, 2026
6 checks passed
@jakebromberg jakebromberg deleted the fix/1120-cdc-oversized-notify branch June 14, 2026 21:01
Development

Successfully merging this pull request may close these issues.

CDC trigger silently drops events for rows exceeding 8000 bytes
