feat(schema): add rotation.lml_identity_id; mint LML identities on dj-site paste #1380

@jakebromberg

Problem

rotation.discogs_release_id (added in BS#1029) holds raw Discogs release IDs as the only LML-handle BS records for rotation rows. Three issues with that shape, surfaced during the code-review loop on PR #1376:

  1. Discogs IDs aren't stable. Discogs admins reorganize the catalog periodically — the 2026-04-28 reshuffle is documented prior art. A column we populate from a paste URL today can point at a different release (or 404) months later.
  2. Layer violation. BS shouldn't know about Discogs's ID space. The reason BS records any ID is to give LML a stable handle for follow-up calls (avoiding text-search on artist + album). LML is the right layer to own the stable handle.
  3. No path to multi-source. Once LML#216 / #217 populate the MusicBrainz/Wikidata legs of the reconciliation log, a rotation row's "what release is this" answer should follow whichever source has data. With discogs_release_id as the only handle, MB-only releases are unreachable from this column.

LML has the release-identity layer ready: LML#526 shipped 2026-06-10 (PR #530, prod DDL applied via discogs-etl#278). POST /api/v1/identity/resolve accepts {kind: 'release', source, external_id} and returns/mints a stable identity_id. This issue consumes that surface.

Desired end state

A new column rotation.lml_identity_id integer (FK semantics live on LML's side; PG-level FK is not enforceable across the network). All new rotation rows from addToRotation (dj-site path) record lml_identity_id synchronously at INSERT time. Tubafrenzy-driven rows (rotation-etl path) record lml_identity_id = NULL at write time; the daily backfill cron populates them with up to ~24h lag. Once resolvable-coverage reaches ≥99% steady-state and consumers have moved over, discogs_release_id is dropped in a follow-up migration.

"Active rotation" throughout this issue means kill_date IS NULL OR kill_date > CURRENT_DATE — the definition used by apps/backend/services/library.service.ts:276 and jobs/rotation-artist-backfill/query.ts.

Write-paths to update

The paste URL itself is entered in tubafrenzy (ROTATION_RELEASE.DISCOGS_RELEASE_ID), not dj-site. Tubafrenzy is on life support per the 2026-05-10 turndown plan with decommission targeted for ~September 2026 — don't add new code there. The WXYC/dj-site#648 MD-editing surface (inline rotation marking from catalog search results, Redux-backed optimistic sync) is cannibalising tubafrenzy's new-paste volume now. Two BS-side write paths consume rotation data and need handling — but with different shapes:

  1. jobs/rotation-etl/job.ts (the only active writer of rotation.discogs_release_id today — see line 102, discogs_release_id: release.discogsReleaseId). rotation-etl does NOT call LML. It writes only discogs_release_id from the tubafrenzy paste. The UPSERT's SET clause for lml_identity_id uses the drift-prevention CASE (see "Drift prevention" below): NULL when discogs_release_id changes, preserve the existing value otherwise. The conflict target is rotation.legacy_rotation_id (line 110). The setWhere gate at lines 145–155 is unchanged — its existing discogs_release_id IS DISTINCT FROM term already fires when the CASE produces a real Y_X → NULL transition, so no extension is needed (and lml_identity_id isn't in the INSERT VALUES tuple, which makes any excluded.lml_identity_id-based gate term TRUE on every tick for any row with a populated identity — see "Drift prevention" for the trace).

    Why no LML in rotation-etl: the useful life of any rotation-etl LML integration is bounded by the tubafrenzy decommission window (~3 months). Investing in lml-client wiring + dual-write logic in code that's about to be retired isn't worth the diff. The rotation-etl lml_identity_id population happens via the daily backfill cron — max ~24h lag for new tubafrenzy-driven rows — acceptable trade-off given that BS#1381's cron (the consumer of lml_identity_id) doesn't ship until coverage gates anyway. The drift CASE is the only rotation-etl change required.

  2. apps/backend/services/library.service.ts:343 (addToRotation), called by POST /library/rotation from dj-site. dj-site posts {album_id, rotation_bin} only — no URL. The function today is a bare db.insert(rotation).values(newRotation).returning() with no ON CONFLICT clause; dj-site-created rows leave legacy_rotation_id NULL so the rotation-etl UPSERT path never collides with them. When album_id is provided, look up library_identity.discogs_release_id WHERE library_id = albumId LIMIT 1 (library_identity.library_id is the PRIMARY KEY per shared/database/src/schema.ts:1391-1393; LIMIT 1 is defensive) and, if non-NULL, synchronously resolve to lml_identity_id via LML before INSERT. The resolve hop blocks the INSERT — the response's row already has both columns populated (or lml_identity_id = NULL on timeout/error).

    When the library_identity lookup returns a non-NULL discogs_release_id, addToRotation also explicitly sets discogs_release_id_source = 'library_identity' — a new enum value introduced by this issue (see "Provenance: new discogs_release_id_source value" below). Without that, the column's NOT NULL DEFAULT 'tubafrenzy_paste' would tag every dj-site row as if it came from tubafrenzy, silently corrupting the provenance invariant.

    This is the per-MD high-volume path post-PR #648: inline rotation marking from catalog rows uses optimistic UI (Redux updates immediately) over a synchronous backend that may take up to 2 s on LML slowness. The 2 s tail produces a slow-confirm path, not a UI rollback (see "Resolve-failure fallback" below).

The tubafrenzy /internal/rotation-webhook handler (apps/backend/routes/internal.route.ts:384) does NOT carry discogs_release_id today — no change needed there.

Duplicate-row semantics (unchanged)

Multiple active rows for the same (album_id, rotation_bin) are allowed by design — shared/database/src/schema.ts:534-538 explicitly forbids a partial-unique index there, per PR #696 and the 2026-04-30 incident, and migration 0072_drop-rotation-active-album-bin-uniq.sql dropped a prior attempt. The librarian invariant permits re-bins, re-adds, and label-driven re-promotes within an album's lifecycle. The new write path doesn't change that: two concurrent addToRotation calls for the same album resolve to the same lml_identity_id (LML's /identity/resolve is idempotent on (source, external_id)) and write two rows, which is the expected behavior.

Resolve-failure fallback (addToRotation only)

Only addToRotation calls LML at write time (per "Write-paths to update" item 1 — rotation-etl is CASE-only, never calls LML). When POST /api/v1/identity/resolve is unavailable or times out at write time inside addToRotation, the INSERT proceeds with lml_identity_id = NULL and only discogs_release_id from the library_identity lookup is populated. The daily backfill cron catches up later. Rationale: blocking the music director on addToRotation (called interactively from dj-site, including the inline rotation-marking surfaces in WXYC/dj-site#648) on an LML outage is worse than a temporarily-incomplete row.

LML-call timeout is bounded by LML_RESOLVE_TIMEOUT_MS env var (default 2000 ms). The hop runs synchronously inside the addToRotation handler — await resolveIdentity(...) before the Drizzle INSERT, in the same Express request — so the response's row already has lml_identity_id populated (or NULL on timeout/error) without needing a follow-up patch. Optimistic UI in WXYC/dj-site#648 covers the latency tail: the row appears in-rotation immediately on click; the slow-confirm path doesn't produce a UI rollback unless the server actually errors.
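A minimal sketch of what such a timeout-bounded wrapper could look like, for illustration only. The request shape ({kind, source, external_id}) and the identity_id response field come from this issue; the base URL handling, the injected fetchImpl parameter, the external_id type, the helper name buildResolveCall, and the error shape on non-2xx responses are all assumptions, not the existing lml-client API.

```typescript
// Request shape taken from the issue text; the string type for external_id is assumed.
interface ResolveRequest {
  kind: 'release';
  source: string;
  external_id: string;
}

// Minimal structural type so the global fetch or a test double can be injected.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical pure helper: builds the URL and request init so the request
// shape can be unit-tested without a network.
function buildResolveCall(baseUrl: string, req: ResolveRequest, signal: AbortSignal) {
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/api/v1/identity/resolve`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(req),
      signal,
    },
  };
}

async function resolveIdentity(
  req: ResolveRequest,
  opts: { baseUrl: string; fetchImpl: FetchLike; timeoutMs?: number },
): Promise<number> {
  const controller = new AbortController();
  // 2000 ms mirrors the documented LML_RESOLVE_TIMEOUT_MS default.
  const timer = setTimeout(() => controller.abort(), opts.timeoutMs ?? 2000);
  try {
    const { url, init } = buildResolveCall(opts.baseUrl, req, controller.signal);
    const res = await opts.fetchImpl(url, init);
    if (!res.ok) {
      // Attach the status so a caller can bucket the failure as 4xx or 5xx.
      throw Object.assign(new Error(`LML resolve failed: ${res.status}`), {
        response: { status: res.status },
      });
    }
    const body = (await res.json()) as { identity_id?: unknown };
    if (typeof body.identity_id !== 'number') {
      throw new Error('LML resolve response missing numeric identity_id');
    }
    return body.identity_id;
  } finally {
    clearTimeout(timer); // do not leave the timer pending once the call settles
  }
}
```

With the platform fetch injected as fetchImpl, an abort triggered by the timer rejects with an error whose name is 'AbortError', which is the condition the fallback classification keys on.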

Sentry counter lml.resolve.fallback_to_null carries two attributes:

  • caller: currently add_to_rotation only; reserved for future writers that might add LML calls

  • reason: timeout | 5xx | 4xx | network | other — classified once at fallback time from the catch block:

    catch (err) {
      const reason =
        err.name === 'AbortError' ? 'timeout' :
        err.response?.status >= 500 ? '5xx' :
        err.response?.status >= 400 ? '4xx' :
        err.code === 'ECONNRESET' || err.code === 'ENOTFOUND' ? 'network' :
        'other';
      Sentry.metrics.count('lml.resolve.fallback_to_null', 1, {
        attributes: { caller: 'add_to_rotation', reason }
      });
      // lml_identity_id stays NULL; INSERT proceeds
    }

    Note the v10 SDK shape: Sentry.metrics.count(name, value, { attributes }). The older Sentry.metrics.increment(..., { tags }) surface (v7/v8 era) was removed in v9; the codebase is on @sentry/node@^10.53.1 (see apps/backend/package.json:20).

Each reason signals a different operational response: 5xx → LML deploy regression; timeout → LML latency/capacity; 4xx → BS-side caller bug; network → infra noise. Revisit (split the knob or tune the value) if Sentry shows p95 user-facing add_to_rotation > 2.5 s OR fallback rate > 5% in any reason bucket.
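Since the acceptance criteria ask for a unit test exercising each reason branch, the classification could be pulled out of the catch block into a pure function. The sketch below assumes that refactor; classifyResolveFailure and FallbackReason are hypothetical names, and the branch order is copied from the catch block above.

```typescript
type FallbackReason = 'timeout' | '5xx' | '4xx' | 'network' | 'other';

function classifyResolveFailure(err: unknown): FallbackReason {
  // Narrow `unknown` once so the property reads below type-check under strict mode.
  const e = (typeof err === 'object' && err !== null ? err : {}) as {
    name?: unknown;
    code?: unknown;
    response?: { status?: unknown };
  };
  const rawStatus = e.response?.status;
  const status = typeof rawStatus === 'number' ? rawStatus : undefined;

  if (e.name === 'AbortError') return 'timeout';
  if (status !== undefined && status >= 500) return '5xx';
  if (status !== undefined && status >= 400) return '4xx';
  if (e.code === 'ECONNRESET' || e.code === 'ENOTFOUND') return 'network';
  return 'other';
}
```

The catch block would then reduce to computing reason = classifyResolveFailure(err) and emitting the counter, and the per-branch unit test needs no Sentry or HTTP mocks.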

Provenance: new discogs_release_id_source value

discogs_release_id_source (per shared/database/src/schema.ts:524-528, 583-585) is a NOT NULL enum with three values today — all tubafrenzy/operator-side — and a default of 'tubafrenzy_paste':

export const discogsReleaseIdSourceEnum = wxyc_schema.enum('discogs_release_id_source_enum', [
  'tubafrenzy_paste',          // mirrored from tubafrenzy ROTATION_RELEASE by jobs/rotation-etl
  'lml_offline_backfill',      // jobs/rotation-release-id-backfill (BS#1029)
  'discogs_direct_backfill',   // 2026-05-29 bypass-LML operator rescue
]);

addToRotation is a new writer that doesn't fit any of those. If we ship it without extending the enum, the NOT NULL DEFAULT 'tubafrenzy_paste' falls through and every dj-site row gets mislabeled — kill_date forensics, BS#1029 backfill scoping, and the future discogs_release_id retirement migration all rely on this column being honest. Tagging dj-site INSERTs as if they came from tubafrenzy poisons all three.

Fix: add a fourth enum value 'library_identity' and have addToRotation set it explicitly when the library_identity lookup contributes the discogs_release_id.

-- migration
ALTER TYPE wxyc_schema.discogs_release_id_source_enum
  ADD VALUE 'library_identity' AFTER 'discogs_direct_backfill';

// schema.ts — extend the enum + update the doc comment
export const discogsReleaseIdSourceEnum = wxyc_schema.enum('discogs_release_id_source_enum', [
  'tubafrenzy_paste',
  'lml_offline_backfill',
  'discogs_direct_backfill',
  'library_identity',          // BS#1380: addToRotation populated discogs_release_id
                                // from library_identity, then synchronously resolved
                                // to lml_identity_id at INSERT time.
]);

// library.service.ts — addToRotation
if (libraryIdentityDiscogsReleaseId != null) {
  values.discogs_release_id = libraryIdentityDiscogsReleaseId;
  values.discogs_release_id_source = 'library_identity';
  values.lml_identity_id = identityId;  // null on resolve failure (see "Resolve-failure fallback")
}
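// If the library_identity lookup returns no row (or a NULL discogs_release_id), the
// branch above is skipped: discogs_release_id and lml_identity_id stay NULL, and
// discogs_release_id_source is left unset, so the column's NOT NULL DEFAULT
// ('tubafrenzy_paste') applies rather than the constraint rejecting the INSERT.
// The controller's required-field check at library.controller.ts:311 only gates
// album_id and rotation_bin, so this case is reachable.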

Note that the resolve-failure fallback case (LML times out or 5xx's) still writes discogs_release_id_source = 'library_identity': the source of the Discogs ID is library_identity regardless of whether we successfully minted the lml_identity_id. Only lml_identity_id goes NULL on resolve failure.
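To make the three outcomes concrete (resolve succeeded, resolve failed, no Discogs ID available), here is a sketch of the value-building step as a pure function. buildRotationInsertValues, the RotationInsertValues shape, and the string type for rotation_bin are illustrative assumptions; only the branch body mirrors the addToRotation snippet above.

```typescript
type DiscogsReleaseIdSource =
  | 'tubafrenzy_paste'
  | 'lml_offline_backfill'
  | 'discogs_direct_backfill'
  | 'library_identity';

// Hypothetical subset of the rotation insert shape.
interface RotationInsertValues {
  album_id: number;
  rotation_bin: string;
  discogs_release_id?: number;
  discogs_release_id_source?: DiscogsReleaseIdSource;
  lml_identity_id?: number | null;
}

function buildRotationInsertValues(
  picked: { album_id: number; rotation_bin: string },
  libraryIdentityDiscogsReleaseId: number | null,
  identityId: number | null, // null when the LML resolve timed out or errored
): RotationInsertValues {
  const values: RotationInsertValues = { ...picked };
  if (libraryIdentityDiscogsReleaseId != null) {
    values.discogs_release_id = libraryIdentityDiscogsReleaseId;
    // Provenance describes where the Discogs ID came from, so it is set
    // even when the identity resolve failed.
    values.discogs_release_id_source = 'library_identity';
    values.lml_identity_id = identityId;
  }
  // Otherwise the three fields are left unset in this sketch.
  return values;
}
```

Keeping this step free of I/O would let the integration tests in the acceptance criteria focus on the LML hop, while the provenance rule is covered by a plain unit test.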

ALTER TYPE ... ADD VALUE runs fine in Drizzle's transaction-per-migration shape on PG12+ as long as the new value isn't referenced in the same transaction. The follow-on addToRotation writes the new value at runtime, not migration time, so there's no ordering hazard.

Drift prevention: clear lml_identity_id when discogs_release_id changes

The COALESCE pattern from rotation-etl (jobs/rotation-etl/job.ts:124-156) does the right thing on a single column but can leave the cross-column pair inconsistent when discogs_release_id changes mid-life. Concrete trace: row holds (discogs=X, lml=Y_X) where Y_X was minted against X; a tubafrenzy paste-correction contributes a new discogs_release_id = X'. The naive UPSERT without intervention:

  • discogs_release_id ← COALESCE(X', persisted_X) = X' — new value wins, correctly.
  • lml_identity_id ← persisted Y_X — preserved by default.

Row lands at (discogs=X', lml=Y_X), but Y_X was minted against X, not X'. The pair is inconsistent.

Fix: clear lml_identity_id only when the effective discogs_release_id changes; let backfill re-resolve. The "effective" qualifier matters because the existing discogs_release_id SET uses COALESCE(excluded.discogs_release_id, rotation.discogs_release_id): when tubafrenzy presents NULL (the common case for rotation entries without a paste URL), the persisted value wins and discogs_release_id doesn't change. A CASE that fired on the raw excluded.discogs_release_id IS DISTINCT FROM rotation.discogs_release_id would evaluate TRUE whenever excluded.discogs_release_id IS NULL and the persisted value is non-NULL, because IS DISTINCT FROM treats NULL as a comparable value instead of propagating it (NULL IS DISTINCT FROM X is TRUE for any non-NULL X). So on any tick where some other field changed (album_id, kill_date, etc.) and the setWhere gate therefore fired for a row without an upstream URL, the unguarded CASE would null out a perfectly good lml_identity_id. The guard below mirrors the symmetric guard the existing discogs_release_id term already carries at job.ts:153-154:

lml_identity_id = CASE
  WHEN excluded.discogs_release_id IS NOT NULL
    AND excluded.discogs_release_id IS DISTINCT FROM rotation.discogs_release_id
    THEN NULL   -- tubafrenzy supplied a different URL; clear the lml pin
  ELSE rotation.lml_identity_id   -- effective discogs_release_id unchanged; preserve whatever backfill or addToRotation wrote
END

(Equivalent formulation: WHEN COALESCE(excluded.discogs_release_id, rotation.discogs_release_id) IS DISTINCT FROM rotation.discogs_release_id.)

With this, the paste-correction case lands at (X', NULL). The daily backfill cron's predicate WHERE lml_identity_id IS NULL AND discogs_release_id IS NOT NULL picks it up within 24h. No widened predicate, no per-row LML calls for already-correct rows, no drift_corrected counter shape divergence from jobs/rotation-release-id-backfill/orchestrate.ts.
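The guarded and unguarded CASE variants can be compared with a small in-memory model. This is a sketch for reasoning about the traces, not code from jobs/rotation-etl: PersistedPair, applyUpsert, and applyUpsertUnguarded are hypothetical names, and IDs are modelled as plain numbers.

```typescript
interface PersistedPair {
  discogs_release_id: number | null;
  lml_identity_id: number | null;
}

// SQL `a IS DISTINCT FROM b` for nullable numbers: NULL compares as a value,
// so (NULL, NULL) is not distinct and (NULL, 5) is distinct.
function isDistinctFrom(a: number | null, b: number | null): boolean {
  return a !== b;
}

// Guarded CASE: clear the pin only when upstream supplied a different, non-NULL ID.
function applyUpsert(row: PersistedPair, excludedDiscogs: number | null): PersistedPair {
  const clear =
    excludedDiscogs !== null && isDistinctFrom(excludedDiscogs, row.discogs_release_id);
  return {
    discogs_release_id: excludedDiscogs ?? row.discogs_release_id, // COALESCE(excluded, rotation)
    lml_identity_id: clear ? null : row.lml_identity_id,
  };
}

// Unguarded CASE, shown only to demonstrate the regression it would cause.
function applyUpsertUnguarded(row: PersistedPair, excludedDiscogs: number | null): PersistedPair {
  const clear = isDistinctFrom(excludedDiscogs, row.discogs_release_id);
  return {
    discogs_release_id: excludedDiscogs ?? row.discogs_release_id,
    lml_identity_id: clear ? null : row.lml_identity_id,
  };
}

const persisted: PersistedPair = { discogs_release_id: 100, lml_identity_id: 7 };
const pasteCorrection = applyUpsert(persisted, 200);        // (200, null)
const nullUpstream = applyUpsert(persisted, null);          // (100, 7)
const unguarded = applyUpsertUnguarded(persisted, null);    // (100, null): pin lost
```

The two guarded calls correspond to integration-test scenarios (a) and (b) in the acceptance criteria; the unguarded call is the failure that scenario (b) is meant to catch.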

No setWhere extension needed. The legitimate Y_X → NULL transition only happens when excluded.discogs_release_id IS NOT NULL AND IS DISTINCT FROM rotation.discogs_release_id — exactly the condition the existing discogs_release_id term at lines 153-154 already detects, so the gate fires via that term and the CASE clears in the same tick. An excluded.lml_identity_id IS DISTINCT FROM rotation.lml_identity_id term would break the gate: since lml_identity_id isn't in the INSERT VALUES tuple, excluded.lml_identity_id is always NULL, so the term would be TRUE on every tick for any row with a populated identity — turning the gate into a no-op for the rows BS#1059's xmin/CDC discipline most cares about.

The addToRotation INSERT path doesn't face this problem — there's no prior persisted lml_identity_id to drift against; the row is born from the current resolve attempt.

Hardening: addRotation controller allowlist

apps/backend/controllers/library.controller.ts:309-317 validates only that album_id and rotation_bin are present, then passes the entire req.body (typed NewRotationRelease) to libraryService.addToRotation, which executes db.insert(rotation).values(newRotation) (library.service.ts:343-346). A client can include discogs_release_id or (post this issue) lml_identity_id directly in the request body, and Drizzle inserts the verbatim values — bypassing both the tubafrenzy-paste provenance invariant and the server-side LML resolve this issue introduces.

The dual-write ships safely without this fix: an authenticated DJ/music-director can already corrupt their own rotation row's metadata via tubafrenzy mis-paste, and direct-body injection is the same harm class. But adding lml_identity_id to the list of corruptable fields is a natural moment to close the loophole, and the codebase already has the right pattern: apps/backend/controllers/flowsheet.controller.ts:299-313 defines pickUpdateEntryFields() as a signature-typed allowlist (builds a fresh object containing only known-good keys; everything else is silently dropped).

Add pickAddRotationFields(body): Pick<NewRotationRelease, 'album_id'|'rotation_bin'> in controllers/library.controller.ts, mirroring the flowsheet pattern. The handler at library.controller.ts:309-317 calls it before libraryService.addToRotation(picked). All other rotation columns are either server-derived (legacy_rotation_id, legacy_library_release_id, discogs_release_id, discogs_release_id_source, lml_identity_id, tracklist_lookup_attempted_at, kill_date) or — for the free-text snapshot columns artist_name, album_title, record_label and the add_date default — only set by the tubafrenzy ETL path, which doesn't touch this controller. They must never be client-supplied through this endpoint.

Allowlist scope matches the only known caller. dj-site's RotationParams (lib/features/rotation/types.ts:14-17) is exactly { album_id, rotation_bin }; the existing handler's required-fields check at library.controller.ts:311 rejects requests missing either. Permitting only those two through pickAddRotationFields reflects today's call shape. If a future caller legitimately needs add_date (operator backfill, etc.), widen the signature then — the typecheck-rejected-by-default property of the allowlist is the whole point.

Phrasing this as an allowlist (signature-typed accept list) rather than a denylist is deliberate: future column additions to rotation are implicitly rejected by typecheck until explicitly added to the signature. Lands in the same PR as the dual-write changes — defense-in-depth alongside the new column, not a blocker.
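A sketch of the allowlist, assuming it follows the same "build a fresh object" approach described for pickUpdateEntryFields(). The NewRotationRelease interface below is a trimmed, hypothetical stand-in for the real Drizzle insert type, with rotation_bin typed as string for illustration.

```typescript
// Hypothetical stand-in for the Drizzle-inferred insert type.
interface NewRotationRelease {
  album_id: number;
  rotation_bin: string;
  discogs_release_id?: number | null;
  discogs_release_id_source?: string;
  lml_identity_id?: number | null;
  kill_date?: string | null;
  add_date?: string;
}

type AddRotationFields = Pick<NewRotationRelease, 'album_id' | 'rotation_bin'>;

// Builds a fresh object with only the allowlisted keys; anything else the
// client sent is silently dropped rather than rejected.
function pickAddRotationFields(body: NewRotationRelease): AddRotationFields {
  return {
    album_id: body.album_id,
    rotation_bin: body.rotation_bin,
  };
}

// A body that tries to inject server-derived fields.
const picked = pickAddRotationFields({
  album_id: 12,
  rotation_bin: 'H',
  discogs_release_id: 999,
  lml_identity_id: 31337,
  kill_date: '2099-01-01',
});
```

Because the return type is a Pick, adding a key to the returned object without widening the signature fails typecheck, which is the reject-by-default property described above.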

Where

  • shared/database/src/schema.ts — (a) add lml_identity_id to the rotation table; column comment references entity.release_identity.id. (b) extend discogsReleaseIdSourceEnum with 'library_identity'; update the enum doc comment at schema.ts:494-528 to document the fourth value's writer (addToRotation, BS#1380) alongside the existing three.
  • shared/database/src/migrations/ — new migration that (a) ALTER TABLE rotation ADD COLUMN lml_identity_id integer NULL and (b) ALTER TYPE wxyc_schema.discogs_release_id_source_enum ADD VALUE 'library_identity' AFTER 'discogs_direct_backfill'. No new index. Rotation is hundreds of active rows; the cron's predicate (jobs/rotation-artist-backfill/query.ts:23-32: (kill_date IS NULL OR kill_date > CURRENT_DATE) AND discogs_release_id IS NOT NULL) is seqscan-fine today, and the swap to lml_identity_id IS NOT NULL has the same selectivity. Schema comment at schema.ts:546-548 also reminds that constraints must accept the full upstream shape from tubafrenzy.
  • apps/backend/services/library.service.ts: addToRotation does the library_identity.discogs_release_id lookup, synchronously awaits resolveIdentity(...) (up to LML_RESOLVE_TIMEOUT_MS), and triple-writes (discogs_release_id, discogs_release_id_source = 'library_identity', lml_identity_id) at INSERT (the discogs_release_id_source write is the new provenance value; see "Provenance: new discogs_release_id_source value" above).
  • apps/backend/controllers/library.controller.ts:309-317: pickAddRotationFields() allowlist.
  • shared/lml-client/ (or wherever the existing LML wrappers live) — surface a resolveIdentity({kind, source, external_id}) wrapper for POST /api/v1/identity/resolve with an AbortController honoring LML_RESOLVE_TIMEOUT_MS. Consumed by addToRotation and the backfill cron. Not consumed by jobs/rotation-etl/.
  • jobs/rotation-etl/job.ts — drift-prevention CASE for lml_identity_id in the UPSERT SET clause (see "Drift prevention"). setWhere gate at lines 145–155 is unchanged. No lml-client integration in this job.
  • jobs/rotation-lml-identity-backfill/ (new job) — populate lml_identity_id from discogs_release_id per row via POST /api/v1/identity/resolve for rows where lml_identity_id IS NULL AND discogs_release_id IS NOT NULL. Idempotent; safe to re-run. Reuse the counter shape from jobs/rotation-release-id-backfill/orchestrate.ts:30-91 (scanned/resolved/unresolved/lml_error/raced).
    • Daily cron: default schedule 0 6 * * * UTC (02:00 EDT / 01:00 EST) in package.json cron-schedule, overridable per-deploy via the BACKFILL_CRON_SCHEDULE GHA repo variable. Mirrors flowsheet-metadata-backfill's recurring-drift-repair shape (the right precedent for "catch outage-induced NULLs automatically" vs. BS#1029's one-shot "rerun after Discogs catalog improves" shape). Orchestrator cooperative-pause pattern from BS#735 defers ticks when DJs are active.
    • --report mode: when invoked with --report, emits the resolvable-coverage SQL result (see "Coverage gate" below) and exits without running the resolve loop. For ops queries against BS#1381's unblock condition.
    • One-shot mode supported for initial migration deploy and ad-hoc operator runs (via docker run --rm --env-file .env $AWS_ECR_URI/rotation-lml-identity-backfill:latest per the BS#1029 invocation precedent).
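The resolve loop of the new backfill job might be shaped roughly as follows. This is a sketch under stated assumptions: the counter names come from this issue, but their exact semantics in jobs/rotation-release-id-backfill/orchestrate.ts are not reproduced here. In particular, "raced" is assumed to mean that a guarded UPDATE matched no row because the row changed between scan and write. The Resolve and Write signatures are hypothetical injection points, not existing APIs.

```typescript
interface BackfillRow {
  id: number;
  discogs_release_id: number | null;
  lml_identity_id: number | null;
}

interface BackfillCounters {
  scanned: number;
  resolved: number;
  unresolved: number;
  lml_error: number;
  raced: number;
}

// Mirrors the cron predicate: lml_identity_id IS NULL AND discogs_release_id IS NOT NULL.
function needsBackfill(row: BackfillRow): boolean {
  return row.lml_identity_id === null && row.discogs_release_id !== null;
}

// Hypothetical injected dependencies:
//   resolve: returns an identity id, or null when LML cannot resolve the release
//   write:   guarded UPDATE; returns false when the row no longer matches the predicate
type Resolve = (discogsReleaseId: number) => Promise<number | null>;
type Write = (rowId: number, discogsReleaseId: number, identityId: number) => Promise<boolean>;

async function runBackfill(
  rows: BackfillRow[],
  resolve: Resolve,
  write: Write,
): Promise<BackfillCounters> {
  const counters: BackfillCounters = { scanned: 0, resolved: 0, unresolved: 0, lml_error: 0, raced: 0 };
  for (const row of rows) {
    // The explicit null check repeats part of needsBackfill so the compiler narrows the type.
    if (!needsBackfill(row) || row.discogs_release_id === null) continue;
    counters.scanned += 1;

    let identityId: number | null;
    try {
      identityId = await resolve(row.discogs_release_id);
    } catch {
      counters.lml_error += 1; // row stays NULL; the next tick retries it
      continue;
    }
    if (identityId === null) {
      counters.unresolved += 1;
      continue;
    }
    const written = await write(row.id, row.discogs_release_id, identityId);
    if (written) counters.resolved += 1;
    else counters.raced += 1;
  }
  return counters;
}
```

Because rows that already carry lml_identity_id never pass needsBackfill, re-running the loop after a partial failure only revisits rows that are still NULL, which is what makes the job safe to re-run.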

Coverage gate (BS#1381 unblock)

The gate that unblocks BS#1381 is resolvable coverage — the fraction of active rotation rows where backfill could populate lml_identity_id that actually have it. Denominator excludes rows with NULL discogs_release_id (no backfill source).

SELECT
  COUNT(*) FILTER (WHERE kill_date IS NULL OR kill_date > CURRENT_DATE) AS active,
  COUNT(*) FILTER (WHERE (kill_date IS NULL OR kill_date > CURRENT_DATE)
                   AND discogs_release_id IS NOT NULL) AS active_with_discogs,
  COUNT(*) FILTER (WHERE (kill_date IS NULL OR kill_date > CURRENT_DATE)
                   AND discogs_release_id IS NOT NULL
                   AND lml_identity_id IS NOT NULL) AS active_with_lml,
  ROUND(
    100.0 * COUNT(*) FILTER (WHERE (kill_date IS NULL OR kill_date > CURRENT_DATE)
                             AND discogs_release_id IS NOT NULL
                             AND lml_identity_id IS NOT NULL)
        / NULLIF(COUNT(*) FILTER (WHERE (kill_date IS NULL OR kill_date > CURRENT_DATE)
                                  AND discogs_release_id IS NOT NULL), 0),
    2
  ) AS resolvable_coverage_pct
FROM wxyc_schema.rotation;

When resolvable_coverage_pct ≥ 99.0 steady-state (i.e., the metric stays at or above 99 across consecutive daily cron runs without trending down), comment on BS#1381 to unblock the cron migration. Rows that sit at lml_identity_id IS NULL AND discogs_release_id IS NOT NULL for >7 days after backfill is healthy indicate either LML can't resolve the ID (catalog drift) or backfill has a bug — investigate per-row, not by lowering the gate threshold.

Why resolvable coverage instead of absolute: post-tubafrenzy-decommission, new addToRotation rows pick up discogs_release_id from library_identity, which has its own incomplete subset. Absolute coverage (active_with_lml / active) conflates "backfill is done" with "library_identity has full Discogs handle population" — two unrelated work tracks. PR #648's high-volume MD path makes the denominator in absolute coverage move with library_identity health, not backfill progress.
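For unit-testing the --report output without a database, the same arithmetic as the coverage SQL can be expressed over an in-memory row list. The sketch below is illustrative only: CoverageRow, CoverageReport, and resolvableCoverage are hypothetical names, kill_date is assumed to be an ISO YYYY-MM-DD string, and today is passed in to keep the result deterministic.

```typescript
interface CoverageRow {
  kill_date: string | null;
  discogs_release_id: number | null;
  lml_identity_id: number | null;
}

interface CoverageReport {
  active: number;
  active_with_discogs: number;
  active_with_lml: number;
  resolvable_coverage_pct: number | null; // null mirrors NULLIF(..., 0) on an empty denominator
}

function resolvableCoverage(rows: CoverageRow[], today: string): CoverageReport {
  // Active rotation: kill_date IS NULL OR kill_date > CURRENT_DATE.
  // ISO dates compare correctly as strings.
  const active = rows.filter((r) => r.kill_date === null || r.kill_date > today);
  const withDiscogs = active.filter((r) => r.discogs_release_id !== null);
  const withLml = withDiscogs.filter((r) => r.lml_identity_id !== null);
  const pct =
    withDiscogs.length === 0
      ? null
      : Math.round((10000 * withLml.length) / withDiscogs.length) / 100; // two decimal places
  return {
    active: active.length,
    active_with_discogs: withDiscogs.length,
    active_with_lml: withLml.length,
    resolvable_coverage_pct: pct,
  };
}

const report = resolvableCoverage(
  [
    { kill_date: null, discogs_release_id: 1, lml_identity_id: 10 },         // counted, covered
    { kill_date: '2026-12-31', discogs_release_id: 2, lml_identity_id: null }, // counted, not covered
    { kill_date: null, discogs_release_id: null, lml_identity_id: null },      // active, excluded from denominator
    { kill_date: '2026-01-01', discogs_release_id: 3, lml_identity_id: 30 },   // killed, ignored
  ],
  '2026-06-15',
);
```

The third row shows why the denominator matters: it is active but has no Discogs ID, so it lowers absolute coverage (1 of 3) without affecting resolvable coverage (1 of 2).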

Constraints

Acceptance criteria

  • LML precursor WXYC/library-metadata-lookup#526 merged (PR #530) and prod DDL applied (discogs-etl#278).
  • Migration adds lml_identity_id integer NULL to wxyc_schema.rotation. No new index. Column comment references entity.release_identity.id.
  • Migration extends discogs_release_id_source_enum with 'library_identity' (ALTER TYPE ... ADD VALUE 'library_identity' AFTER 'discogs_direct_backfill') and updates the schema.ts:494-528 doc comment to document the fourth value alongside the existing three. addToRotation explicitly writes discogs_release_id_source = 'library_identity' whenever it populates discogs_release_id from library_identity (including the resolve-failure fallback case, where lml_identity_id goes NULL but the source-of-the-Discogs-id is still library_identity). Integration tests assert (a) a successful addToRotation lands (discogs_release_id_source = 'library_identity', lml_identity_id NOT NULL); (b) the resolve-failure path lands (discogs_release_id_source = 'library_identity', lml_identity_id IS NULL); (c) a parallel rotation-etl tick on the same row does NOT rewrite the source to 'tubafrenzy_paste' unless tubafrenzy supplied a non-NULL discogs_release_id that differs from the persisted one (existing rotation-etl CASE behavior, untouched by this issue).
  • shared/lml-client/ wrapper resolveIdentity({kind, source, external_id}) added, honoring LML_RESOLVE_TIMEOUT_MS via AbortController. Consumed by addToRotation and the backfill cron only.
  • jobs/rotation-etl/job.ts UPSERT clears lml_identity_id via the guarded CASE clause when the effective discogs_release_id changes (see "Drift prevention" section above; both halves of the excluded.discogs_release_id IS NOT NULL AND … IS DISTINCT FROM … guard required). No LML resolve call in this job. Integration tests cover (a) the paste-correction scenario — pre-existing (discogs=X, lml=Y_X) row + new tubafrenzy paste of discogs_release_id=X' — verifying the row lands at (X', NULL); AND (b) the NULL-upstream-with-other-change scenario — pre-existing (discogs=X, lml=Y_X) row + tubafrenzy tick with NULL discogs_release_id but a changed kill_date — verifying the row lands at (discogs=X, lml=Y_X) with only kill_date mutated (regression test for the unguarded CASE).
  • setWhere gate at job.ts:145-155 is unchanged. (Adding an excluded.lml_identity_id IS DISTINCT FROM rotation.lml_identity_id term would break the gate — see "Drift prevention" for why.)
  • addToRotation does the library_identity.discogs_release_id lookup, synchronously awaits resolveIdentity(...) before INSERT, and dual-writes. INSERT response carries the populated row (or lml_identity_id = NULL on timeout/error). Integration tests exercise the fallback path (simulated LML outage), not just assert it in prose.
  • Sentry counter lml.resolve.fallback_to_null tagged with caller (currently add_to_rotation) and reason (timeout|5xx|4xx|network|other); classification happens once at fallback time in the catch block. Tag values verified by unit test exercising each branch. Revisit (split the knob or tune the value) if Sentry shows p95 user-facing add-to-rotation > 2.5 s OR fallback rate > 5% in any reason bucket.
  • addRotation controller uses signature-typed pickAddRotationFields() allowlist accepting only {album_id, rotation_bin} (the exact shape of dj-site's RotationParams), mirroring flowsheet.controller.ts:299-313's pickUpdateEntryFields() pattern. Test: a request body that includes any non-allowlisted field — server-derived (legacy_rotation_id, legacy_library_release_id, discogs_release_id, discogs_release_id_source, lml_identity_id, tracklist_lookup_attempted_at, kill_date) or ETL-only (add_date, artist_name, album_title, record_label) — lands a row whose values for those fields are server-set (NULL, or derived from the library lookup / column default), not client-supplied.
  • jobs/rotation-lml-identity-backfill/ job created. Backfill populates lml_identity_id for rows where lml_identity_id IS NULL AND discogs_release_id IS NOT NULL. Idempotent; safe to re-run. Counter shape mirrors jobs/rotation-release-id-backfill/orchestrate.ts:30-91 (scanned/resolved/unresolved/lml_error/raced).
  • Backfill registered as a daily cron with default schedule 0 6 * * * UTC in package.json cron-schedule, overridable via the BACKFILL_CRON_SCHEDULE GHA repo variable. Cooperative-pause pattern integrates with BS#735's orchestrator. One-shot invocation supported per BS#1029's docker run --rm --env-file .env $AWS_ECR_URI/rotation-lml-identity-backfill:latest shape.
  • Backfill --report mode emits the resolvable-coverage SQL result (see "Coverage gate" section) and exits without running the resolve loop.
  • Backfill telemetry on Sentry: rows processed, identities minted, resolve failures (count + rate). Numeric attributes set at span creation via Sentry.startSpan({ name, op, attributes }) — never via late setAttribute (per BS#1081: late setAttribute calls index numbers as strings and break sum/avg/p95).
  • Coverage signal: SQL in "Coverage gate" section emits resolvable_coverage_pct. When ≥99.0 steady-state across consecutive daily cron runs, comment on BS#1381 to unblock that work.
  • Integration tests for the addToRotation write path: identity resolved + same URL on a subsequent paste returns the same identity_id (idempotent on LML side, verified end-to-end). Optimistic-UI compatibility verified — the response shape from addToRotation is unchanged by the LML hop.
  • Docs updated in apps/backend/docs/ or wherever rotation-schema notes live. jobs/rotation-lml-identity-backfill/README.md mirrors BS#1029's README structure with the schedule, env vars, counter shape, and post-run verification SQL.

Out of scope

  • Dropping discogs_release_id. Future migration after lml_identity_id coverage stabilises and all consumers (post-BS#1381) read from it.
  • Migrating library.canonical_entity_id (closed in #624) to also use lml_identity_id. Separate concern.
  • Cron migration in PR #1376's job (that's BS#1381).
  • Adding a Discogs paste URL flow to dj-site or addToRotation's request shape. Library_identity lookup is the only path on the dj-site side; tubafrenzy paste stays primary for new pastes until decommission.
  • Full dual-write in rotation-etl (lml-client integration in the ETL job). Considered and rejected — useful life is bounded by tubafrenzy decommission ~September 2026; backfill picks up tubafrenzy-driven rows within ~24h.
  • LML-side identity merges. If LML later merges identity X into Y (via the reconciliation pipeline), BS rows pinned to X become dangling pointers. v1 doesn't address this; future work will need a reconciliation pass or a server-side rewrite-on-merge hook.
  • Per-row resolve-failed timestamp / cooldown. Permanently-unresolvable rows (e.g., Discogs catalog drift made the ID 404) get retried every backfill tick. At ~50 such rows × daily cron, the wasted LML traffic is bounded; investigate as a follow-up if the backfill counter's lml_error stays elevated.

Suggested approach

  1. Migration adds the column (lml_identity_id integer NULL, no index) AND extends the discogs_release_id_source_enum with 'library_identity' (ALTER TYPE ... ADD VALUE). One PR. Update the schema-comment block at schema.ts:494-528 in the same PR.
  2. shared/lml-client.resolveIdentity wrapper lands (small PR, lets addToRotation and the backfill consume it cleanly).
  3. rotation-etl extended with the drift-prevention CASE only. setWhere gate unchanged. No lml-client integration.
  4. addToRotation extended with the library_identity → resolveIdentity → dual-write path and the pickAddRotationFields() allowlist (same PR).
  5. jobs/rotation-lml-identity-backfill/ job created with one-shot + daily-cron support, --report mode, and Sentry counters.
  6. Initial backfill run; daily cron picks up incremental drift thereafter.
  7. Signal BS#1381 once resolvable_coverage_pct ≥ 99.0 is steady-state.

Related

Labels: cross-cache-identity, enhancement, lml, migration
