fix(enrichment-worker): drain in-flight candidates on SIGTERM by jakebromberg · Pull Request #1408 · WXYC/Backend-Service

jakebromberg · 2026-06-14T04:35:01Z

Summary

The CDC dispatcher invokes handleCandidate fire-and-forget (void handleCandidate(...)), so SIGTERM tore down the PG pool while LML lookups were still pending — the subsequent claim/finalize write would throw on a torn-down connection, leaving the row stranded in metadata_status='enriching' until C6 swept it.
Adds an inFlightCandidates registry in handler.ts (mirrors the backend's inFlightEnrichments in apps/backend/services/metadata/enrichment.service.ts): each tick auto-registers and unregisters via .finally so the registry never slow-leaks past a settle.
Adds drainInFlightCandidates(deadlineMs) that races Promise.allSettled(snapshot) against setTimeout(deadlineMs) and returns the current registry size — never throws, returns 0 immediately on empty.
Wires the drain into worker.ts shutdown between stopCdcListener and closeDatabaseConnection. Bounded by WORKER_DRAIN_DEADLINE_MS (default 30s, env-overridable via ENRICHMENT_WORKER_DRAIN_DEADLINE_MS) so a hung LML call can't block deploy indefinitely.
Sentry captureMessage on non-zero remaining count uses the same metric: 'in_flight_dropped' tag shape as the backend's BS#905 path so a single alert can fan over both surfaces.

Reduces but does not eliminate the BS#895 sweep traffic referenced in the issue body.

Test plan

new in-flight registry tests in tests/unit/apps/enrichment-worker/drain.test.ts pin: empty-registry fast-return, register-on-dispatch, unregister-on-resolve, unregister-on-reject, skip-on-filter-reject, drain-awaits-pending, drain-bounds-deadline, drain-survives-rejection
all 84 enrichment-worker unit tests pass; full unit suite 3160/3160 green
npm run lint && npm run format:check && npm run typecheck clean

The CDC dispatcher invokes `handleCandidate` as `void handleCandidate(...)` — fire-and-forget — so on SIGTERM the worker's shutdown path tore down the PG pool while LML lookups were still pending. The subsequent claim or finalize write would throw on a torn-down connection, leaving the row in `metadata_status='enriching'` until C6 (#895) swept it past `enriching_since + 60s` and triggered a second Discogs lookup whose answer was already retrieved and discarded. Mirrors the backend's `drainInFlightEnrichments` shape (apps/backend/services/metadata/enrichment.service.ts): - `inFlightCandidates` Set in handler.ts; the CDC dispatcher registers each invocation and unregisters via .finally on settle (resolve OR reject) so the registry can never slow-leak past a tick. - `drainInFlightCandidates(deadlineMs)` races `Promise.allSettled(snapshot)` against `setTimeout(deadlineMs)` and returns the current registry size — never throws. - worker.ts shutdown calls stopCdcListener → drain → closeDatabaseConnection so pending lookups can finalize their writes before the pool closes. Bounded by WORKER_DRAIN_DEADLINE_MS (default 30s, env-overridable) so a hung LML call can't block deploy indefinitely. - Sentry captureMessage on non-zero remaining count uses the same `metric: 'in_flight_dropped'` tag shape as the backend so a single alert can fan over both surfaces. Reduces but does not eliminate the BS#895 sweep traffic referenced in the issue body.

…e, unhandled rejection guard, env clamp - Drop default WORKER_DRAIN_DEADLINE_MS from 30s to 2s and clamp env overrides at the per-step shutdown bound (5s). The deploy action runs `docker stop` without `-t`, so the SIGTERM-to-SIGKILL grace is the Docker default of 10s; with `stop_cdc` and `drain_sweep` already consuming part of that window, anything above ~5s is dead budget. Mirrors apps/backend's `ENRICHMENT_DRAIN_DEADLINE_MS=2_000` rationale (BS#905). - Chain `.catch(() => {})` before `.finally` on the dispatcher's in-flight promise. handleCandidate wraps its body in try/catch but the outer `await Sentry.startSpan(...)` is unguarded; a Sentry-internal throw (exporter error, disposed hub during shutdown) would otherwise surface as an `unhandledRejection`. Mirrors backend's pattern in `apps/backend/services/metadata/enrichment.service.ts`. - Add `jest.resetAllMocks()` to drain.test.ts beforeEach so a future test using sticky mocks can't silently inherit a previous case's stub. - New regression test pins that a Sentry.startSpan throw is swallowed and the registry still drains.

jakebromberg added 2 commits June 13, 2026 21:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(enrichment-worker): drain in-flight candidates on SIGTERM#1408

fix(enrichment-worker): drain in-flight candidates on SIGTERM#1408
jakebromberg wants to merge 2 commits into
mainfrom
fix/1108-enrichment-drain

jakebromberg commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jakebromberg commented Jun 14, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant