refactor(rotation-artist-backfill): switch to lml_identity_id fan-out; drop Discogs-specific carving

## Problem

`jobs/rotation-artist-backfill/` (landed in [PR #1376](https://github.com/WXYC/Backend-Service/pull/1376), closes [BS#1361](https://github.com/WXYC/Backend-Service/issues/1361)) is structurally Discogs-specific. It:

1. Reads `rotation.discogs_release_id` from the rotation table.
2. Calls `GET /api/v1/discogs/release/{id}` per row.
3. Walks `release.artists[*].artist_id` (Discogs artist IDs).
4. Calls `GET /api/v1/discogs/artist/{id}` per artist.
5. Defends against Discogs's id=0 sentinel locally via `isValidArtistId`.

That carving was correct given today's world (BS holds Discogs IDs; LML's only refresh-by-ID surface is Discogs-namespaced). It's wrong altitude once [WXYC/library-metadata-lookup#525](https://github.com/WXYC/library-metadata-lookup/issues/525) lands a source-agnostic refresh endpoint and [WXYC/Backend-Service#1380](https://github.com/WXYC/Backend-Service/issues/1380) populates `rotation.lml_identity_id`.

## Desired end state

The cron walks `rotation.lml_identity_id` instead of `rotation.discogs_release_id`, batches identity_ids, and calls `POST /api/v1/cache/refresh-for-identities` once per batch (or per concurrency slot). LML handles the source fan-out *and* the walk-to-artists internally. BS-side code loses:

- The `extractPhase1ArtistIds` projection (LML knows the artist credits and walks to them itself).
- The `isValidArtistId` sentinel defense (obsolete once this cron stops calling `/discogs/artist/{id}` directly — the defense lives at the caller per [LML#518](https://github.com/WXYC/library-metadata-lookup/issues/518), and removing the caller removes the need; LML#546 closes residual caller-side gaps elsewhere in LML).
- The `getRelease` / `getArtistDetails` wrappers (replaced by a single `refreshForIdentities` call).
- The Discogs-flavored comments in `lml-fetch.ts` (no longer accurate at this layer).

The deploy gate (currently gates on LML's `commit_sha` being a descendant of `3e54907`) is updated to gate on whichever commit ships LML#525 instead. The `fetched_at` discriminator from [LML#503](https://github.com/WXYC/library-metadata-lookup/pull/503) — including its bulk projection — is still load-bearing under the hood, but BS doesn't gate on it directly; it gates on the existence of the new endpoint.

## Where

- `jobs/rotation-artist-backfill/query.ts` — SELECT changes from `discogs_release_id` to `lml_identity_id`. Skip rows where `lml_identity_id IS NULL`.
- `jobs/rotation-artist-backfill/lml-fetch.ts` — replace `fetchRelease` / `fetchArtist` / `extractPhase1ArtistIds` with `refreshForIdentities(ids: number[])`.
- `jobs/rotation-artist-backfill/orchestrate.ts` — collapse the two-tier loop into a one-tier batched call. Counters change shape (`releases_*` and `artists_*` become `identities_*` plus `warmed_releases` / `warmed_artists`, both derived from per-source `release_outcome == "success"` / artist `outcome == "success"` in the LML response). Cache hit/miss visibility is intentionally not surfaced as a BS-side counter — see the "Cache hit/miss visibility" note under "Sentry totals" below.
- `jobs/rotation-artist-backfill/deploy-guard.ts` — update `GATE_COMMIT_SHA` to whatever ships LML#525.
- `jobs/rotation-artist-backfill/README.md` — full rewrite of the mechanism section. Drop the "Discogs cache" narrative; describe the identity-based refresh.
- `shared/lml-client/src/index.ts` — add `refreshForIdentities(identityIds: number[])` wrapper.
- `tests/unit/jobs/rotation-artist-backfill/` — regenerate tests against the new shape.

## Rate-limit recalibration (pre-merge gate)

The new endpoint fans out internally, so one BS-side call can spawn N Discogs calls. The `BACKFILL_LML_*` numbers need recalibrating before this PR can merge:

- Today: BS at 20 req/min, each call hits Discogs ~once-and-fan-out-to-artists. Effective Discogs req/min ~80 (one release + 3-4 artists).
- Tomorrow: BS at X req/min, each call multiplexes M identities through LML, each hitting Discogs once-per-release + N-per-release-artists.
- Constraint: stay under LML's `discogs_rate_limit=50` per replica, **including all concurrent LML callers** (runtime enrichment via `lookupMetadata`, picker resolution via `getRotationTracksFromRelease`, this cron). All callers share the same per-replica rate-limit budget. `BACKFILL_LML_RPS` should leave headroom — a reasonable starting target is ~30/min for the cron alone, leaving ~20/min for runtime callers. Verify the runtime steady-state Discogs req/min in Sentry before sizing the cron's budget.

Concretely: before merging, run a staging load test with the planned `BACKFILL_LML_RPS` and `BACKFILL_LML_BATCH_SIZE` AND a simulated runtime-caller baseline, and verify the Discogs req/min observed at LML stays under cap. Update the README runbook with the chosen numbers and the load-test methodology.

**`BACKFILL_LML_BATCH_SIZE` is a hard constant, not a tunable.** LML#525 caps `POST /api/v1/cache/refresh-for-identities` at 50 ids/request (400-on-overflow). The cap is derived from Discogs rate-limit × cold-cache fan-out ≤ Railway's request-timeout ceiling — recalibration only lands alongside an ingress change. BS should encode the value as `const LML_REFRESH_BATCH_CAP = 50` in `jobs/rotation-artist-backfill/orchestrate.ts` and reference that constant rather than reading an env var. If the constant is kept as a tunable knob for one-shot operator flexibility, the startup must assert `batchSize <= LML_REFRESH_BATCH_CAP` and refuse to run otherwise — the runtime 400 from LML would otherwise show up only after the first batch.

## Dashboard migration follow-up

The existing Sentry dashboards built on `backfill.releases_*` / `backfill.artists_*` will break the moment this cron switches shape. Filed as [WXYC/Backend-Service#1402](https://github.com/WXYC/Backend-Service/issues/1402) — that issue covers the inventory + cutover of saved discover queries, alert rules, and team-shared dashboards keyed on the old attribute names. The dashboard migration must complete *before* this PR merges, so on-call doesn't lose visibility during the cutover window. BS#1402 is also wired as a `blocked-by` edge on this issue so the GitHub UI surfaces the dependency.

## Constraints

- **Don't run both crons in parallel.** Once this migration lands, the old `discogs_release_id`-based path is removed in the same PR (after BS#1380 backfill is ≥99% complete). Avoid drift between counters and dashboards.
- **Pre-merge rate-limit verification.** See above — load-tested numbers required before merge.
- **Dashboard cutover.** Sibling dashboard-migration issue (see above) must merge before this PR. The metric-name change is breaking for any alerts that key on the old names.
- **Coordinated deploy.** BS#1380's backfill must hit ≥99% coverage on active rotation BEFORE this PR ships.

## Acceptance criteria

- [ ] LML#525 merged and deployed to production.
- [ ] BS#1380 backfill at ≥99% coverage on active rotation rows.
- [ ] Sibling dashboard-migration issue [WXYC/Backend-Service#1402](https://github.com/WXYC/Backend-Service/issues/1402) closed before this PR merges.
- [ ] **Graph-level `blocked-by` edges wired via the GitHub dependencies API** (not just prose): this issue blocked-by LML#525, BS#1380, and BS#1402. Required so the GitHub UI surfaces the blocked state when the queue is walked.
- [ ] `query.ts` SELECTs `lml_identity_id` (and skips rows where it's NULL).
- [ ] `orchestrate.ts` is a one-tier loop batching identity_ids into `refreshForIdentities` calls.
- [ ] Sentry totals span attribute set updated, derived from per-id `CacheRefreshResultItem` results in the LML response:
  - `backfill.identities_scanned` — total identity_ids the cron walked
  - `backfill.identities_resolved` — count where per-id `status != "error"` (any useful work happened)
  - `backfill.warmed_releases` — sum across batches of per-source `release_outcome == "success"` (cache hit + fresh fetch + tombstone all count, per LML#525's "Per-id result semantics")
  - `backfill.warmed_artists` — sum across batches of per-source artist `outcome == "success"`
  - `backfill.not_found` — count where per-id `status == "not_found"` (BS holds stale `lml_identity_id` references)
  - `backfill.not_implemented` — count where per-id `status == "not_implemented"` (sources LML doesn't support yet)
  - `backfill.lml_error` — batch-level call failures (BS side) + count where per-id `status == "error"`

  **Cache hit/miss visibility:** the original AC had per-source `warmed_hits` / `warmed_misses` buckets. Dropped — LML#525's `BulkCacheRefreshResponse` (`lookup/models.py:100-101` parallel) deliberately omits cache_stats from the response body, and LML's internal recorder methods (`record_pg_cache_hit/miss`, `record_memory_cache_hit`, `record_api_call` per `discogs/fallthrough.py:311,317`, `discogs/memory_cache.py:257`, `discogs/service.py:*`) are aggregate, not per-source-labeled. Hit/miss observability lives on LML-side Sentry dashboards via `_project_cache_stats_to_transaction` (`lookup/router.py:472` parallel) — BS-side counters track warming success, not cache pressure. If future cost/SLO work needs the breakdown surfaced to BS, that's a per-source recorder refactor on the LML side, not a BS-side parsing change.
- [ ] All numeric attributes set via `Sentry.startSpan({ name, op, attributes: {...} })` at span creation time — never via late `setAttribute`, per [BS#1081](https://github.com/WXYC/Backend-Service/issues/1081) (late `setAttribute` indexes numbers as strings and breaks `avg()` / `p95()` aggregation).
- [ ] Sentry alert: `backfill.not_found / backfill.identities_scanned > 1%` rolling 24h ⇒ BS holds stale `lml_identity_id` references (the corruption signal — invisible without this counter, because today's `errors` bucket lumps it in with transient failures). Configured via the BS#1402 sibling.
- [ ] Deploy gate updated to the new LML#525 anchor commit.
- [ ] Unit tests rewritten; integration smoke test against staging LML.
- [ ] Pre-merge load test confirms Discogs req/min at LML stays under `discogs_rate_limit=50` per replica, **including a simulated runtime-caller baseline.**
- [ ] Batch size encoded as `const LML_REFRESH_BATCH_CAP = 50` in `jobs/rotation-artist-backfill/orchestrate.ts`, matching LML#525's per-request cap. If retained as an env-overridable knob (`BACKFILL_LML_BATCH_SIZE`), the cron asserts `batchSize <= LML_REFRESH_BATCH_CAP` at startup and refuses to run otherwise. README documents the derivation chain (Discogs rate-limit × cold-cache fan-out ≤ Railway request timeout) so future operators don't try to raise it without recalibrating LML's ingress.
- [ ] README narrative section rewritten; "Known caveat" section on timeoutMs gap reviewed (the new endpoint may surface a different timeout shape).
- [ ] PR description references the merged dashboard-migration PR (from BS#1402).
- [ ] Old Discogs-specific helpers (`isValidArtistId`, `extractPhase1ArtistIds`, `fetchRelease`, `fetchArtist`) deleted.

## Suggested approach

1. Wait for LML#525 (which is blocked on the LML precursor) to land and deploy.
2. Wait for BS#1380's backfill to reach ≥99% steady state.
3. Complete [BS#1402](https://github.com/WXYC/Backend-Service/issues/1402) (dashboard migration) first.
4. Branch + worktree per global Git workflow.
5. Update `lml-client` first (new endpoint wrapper). Land separately if scope is too large.
6. Refactor cron. TDD per project CLAUDE.md.
7. Run staging load test before opening this PR. Numbers go into the README runbook.
8. Coordinated deploy: BS#1380's backfill must hit ≥99% coverage on active rotation BEFORE this PR merges.

## Related

- [WXYC/library-metadata-lookup#525](https://github.com/WXYC/library-metadata-lookup/issues/525) — the LML endpoint this cron consumes. **Blocker.**
- [WXYC/library-metadata-lookup#526](https://github.com/WXYC/library-metadata-lookup/issues/526) — LML precursor (release-identity schema). Transitive blocker via LML#525.
- [WXYC/Backend-Service#1380](https://github.com/WXYC/Backend-Service/issues/1380) — the BS schema migration that populates `lml_identity_id`. **Blocker.**
- [WXYC/Backend-Service#1402](https://github.com/WXYC/Backend-Service/issues/1402) — Sentry dashboard / alert cutover for the metric rename. **Blocker.**
- [WXYC/Backend-Service#1081](https://github.com/WXYC/Backend-Service/issues/1081) — Sentry attribute typing convention (numeric attributes at span creation, not via `setAttribute`).
- [WXYC/library-metadata-lookup#518](https://github.com/WXYC/library-metadata-lookup/issues/518) — `id <= 0` defense at caller, not router (closed; informs the `isValidArtistId` removal here).
- [WXYC/library-metadata-lookup#546](https://github.com/WXYC/library-metadata-lookup/issues/546) — closes residual caller-side `id <= 0` gaps elsewhere in LML.
- [WXYC/Backend-Service#1361](https://github.com/WXYC/Backend-Service/issues/1361) — original BS-side need.
- [WXYC/Backend-Service/pull/1376](https://github.com/WXYC/Backend-Service/pull/1376) — current implementation this issue replaces.
- [WXYC/library-metadata-lookup#503](https://github.com/WXYC/library-metadata-lookup/pull/503) — `fetched_at` discriminator (load-bearing under LML#525). Bulk projection reused by the multiplexer.
- [WXYC/library-metadata-lookup#510](https://github.com/WXYC/library-metadata-lookup/issues/510) — 404 tombstone. Still load-bearing.
- [WXYC/library-metadata-lookup#207](https://github.com/WXYC/library-metadata-lookup/issues/207) — reconciliation pipeline parent.




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(rotation-artist-backfill): switch to lml_identity_id fan-out; drop Discogs-specific carving #1381

Problem

Desired end state

Where

Rate-limit recalibration (pre-merge gate)

Dashboard migration follow-up

Constraints

Acceptance criteria

Suggested approach

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

refactor(rotation-artist-backfill): switch to lml_identity_id fan-out; drop Discogs-specific carving #1381

Description

Problem

Desired end state

Where

Rate-limit recalibration (pre-merge gate)

Dashboard migration follow-up

Constraints

Acceptance criteria

Suggested approach

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions