fix(schema): partial unique on track_title IS NULL to block CTA null-track duplicates#1414

Merged: jakebromberg merged 1 commit into main from fix/1135-cta-unique-nulls on Jun 14, 2026

@jakebromberg jakebromberg commented Jun 14, 2026

Closes #1135

Problem

cta_unique_idx on wxyc_schema.compilation_track_artist (library_id, artist_name, track_title) was declared in 0037 without any NULL-equality coercion. Postgres treats NULLs as distinct in unique B-tree comparisons by default, so two rows sharing the same (library_id, artist_name) and NULL track_title could coexist — exactly what the constraint name implies it should prevent. track_title stays nullable; ETL writes with no track title are legitimate domain state.

Approach (pivot from NULLS NOT DISTINCT)

The first attempt at this PR used CREATE UNIQUE INDEX … NULLS NOT DISTINCT. The migrate-dryrun CI job revealed the issue body's "dev/CI/prod on PG 18" claim referred to the docker-compose pin, not the prod runtime — prod RDS runs PostgreSQL 14.22, where NULLS NOT DISTINCT is a syntax error (PG15+ feature). The first push failed the dryrun with 42601 syntax error at or near "NULLS".

Pivot to the task body's suggested PG14-compatible alternative: a complementary partial unique index restricted to the track_title IS NULL slice.

  • Base 0037 cta_unique_idx continues to enforce uniqueness on the non-NULL slice via standard Postgres semantics (non-NULL rows compare normally).
  • New cta_unique_null_track_idx is a partial unique on (library_id, artist_name) WHERE track_title IS NULL, closing the NULL bucket.
  • Together they enforce the full intended "no duplicate compilation tracks per (library, artist, title)" semantics on PG 14.
  • Forward-compatible with a future PG 15+ upgrade: a follow-up migration could drop the partial and rebuild the base with NULLS NOT DISTINCT if we want to consolidate.
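
To make the division of labour between the two indexes concrete, here is a minimal in-memory sketch of the semantics described above. It is not code from this PR: the `CtaRow` type and the `violatesBaseIndex`, `violatesNullTrackIndex`, and `tryInsert` helpers are hypothetical, and the model assumes `library_id` and `artist_name` are never NULL.

```typescript
// Hypothetical model of the two unique indexes; for illustration only.
type CtaRow = { libraryId: number; artistName: string; trackTitle: string | null };

// Base cta_unique_idx on (library_id, artist_name, track_title).
// Under default Postgres semantics a NULL never equals another value,
// so a row with a NULL track_title can never conflict in this index.
function violatesBaseIndex(existing: CtaRow[], row: CtaRow): boolean {
  if (row.trackTitle === null) return false;
  return existing.some(
    (r) =>
      r.libraryId === row.libraryId &&
      r.artistName === row.artistName &&
      r.trackTitle === row.trackTitle,
  );
}

// Partial cta_unique_null_track_idx on (library_id, artist_name)
// WHERE track_title IS NULL: only NULL-title rows are indexed at all.
function violatesNullTrackIndex(existing: CtaRow[], row: CtaRow): boolean {
  if (row.trackTitle !== null) return false;
  return existing.some(
    (r) =>
      r.trackTitle === null &&
      r.libraryId === row.libraryId &&
      r.artistName === row.artistName,
  );
}

// Returns true if the row was accepted, false if a unique index rejected it.
function tryInsert(table: CtaRow[], row: CtaRow, withPartialIndex: boolean): boolean {
  if (violatesBaseIndex(table, row)) return false;
  if (withPartialIndex && violatesNullTrackIndex(table, row)) return false;
  table.push(row);
  return true;
}
```

With `withPartialIndex` set to `false`, two rows with the same library and artist and a NULL title are both accepted, which is the bug in #1135. With it set to `true`, the second one is rejected, while a NULL-title row and a titled row for the same pair still coexist.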

The migration is purely additive: no DROP, no rebuild of the base index, and no locking beyond the short build window for the small NULL slice (a non-concurrent CREATE INDEX blocks writes to the table while it runs). The precondition guard (issue #705, same shape as 0071) counts duplicate (library_id, artist_name) groups with NULL track_title and fails fast if any remain. A 2026-06-13 prod audit returned 0 such groups, so the migration applies cleanly; CI's dryrun against the prod snapshot confirms.
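
For readers who have not opened the migration file, the guard-then-create shape described above might look roughly like the sketch below. This is an assumption about the general form, not the actual migration text: the `migrationSql` constant, the `dup_groups` variable, and the exception message are made up for illustration, and the predicate is written unqualified rather than in the qualified form the snapshot uses. It is held in a TypeScript string so it can be checked with source-level assertions.

```typescript
// Hypothetical sketch of a guarded, additive migration; not the PR's actual SQL.
const migrationSql: string = `
-- Precondition guard: abort if any NULL-title duplicates already exist.
DO $$
DECLARE
  dup_groups bigint;
BEGIN
  SELECT count(*) INTO dup_groups
  FROM (
    SELECT library_id, artist_name
    FROM wxyc_schema.compilation_track_artist
    WHERE track_title IS NULL
    GROUP BY library_id, artist_name
    HAVING count(*) > 1
  ) AS d;

  IF dup_groups > 0 THEN
    RAISE EXCEPTION 'found % duplicate NULL-track groups, dedupe first', dup_groups;
  END IF;
END
$$;

-- Additive only: the base cta_unique_idx is left untouched.
CREATE UNIQUE INDEX cta_unique_null_track_idx
  ON wxyc_schema.compilation_track_artist (library_id, artist_name)
  WHERE track_title IS NULL;
`;
```

Because the guard and the index creation run in the same transaction, a non-zero duplicate count rolls the whole migration back before the index build is attempted, so the failure surfaces as a readable message rather than a 23505 from the build itself.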

Schema parity

schema.ts declares both indexes so drizzle-kit drift detection covers them:

uniqueIndex('cta_unique_idx').on(table.library_id, table.artist_name, table.track_title),
uniqueIndex('cta_unique_null_track_idx')
  .on(table.library_id, table.artist_name)
  .where(sql`${table.track_title} IS NULL`),

The 0094 snapshot mirrors the schema with the qualified WHERE convention used since 0083.

Tests

  • tests/unit/database/schema.cta-unique-null-track-partial.test.ts — schema-source drift guard. Asserts the migration creates the partial unique with the right predicate on (library_id, artist_name), carries the precondition guard, doesn't touch the base cta_unique_idx, doesn't use CONCURRENTLY or NULLS NOT DISTINCT, and matches the schema.ts declaration. Mirrors schema.flowsheet-album-id-enriched-idx.test.ts.
  • tests/integration/cta-unique-null-track-partial.spec.js — runtime behavior. Two NULL-track inserts → second is rejected with SQLSTATE 23505 + cta_unique_null_track_idx constraint name. A NULL-track row coexists with a non-NULL-track row for the same (library_id, artist_name) (the partial only collapses NULLs).
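
The source-level checks the drift guard performs can be pictured with a small helper like the one below. This is a sketch of the idea only; the real test file is not reproduced here, and `findMigrationViolations` and its messages are hypothetical names.

```typescript
// Hypothetical helper illustrating source-level migration checks.
// Returns a list of human-readable violations; an empty list means the SQL passes.
function findMigrationViolations(sql: string): string[] {
  const violations: string[] = [];
  // Collapse whitespace and uppercase so the checks ignore formatting and case.
  const normalized = sql.replace(/\s+/g, " ").toUpperCase();

  const partialIndex =
    /CREATE UNIQUE INDEX\b[^;]*CTA_UNIQUE_NULL_TRACK_IDX[^;]*WHERE[^;]*TRACK_TITLE"? IS NULL/;
  if (!partialIndex.test(normalized)) {
    violations.push("missing partial unique index on the track_title IS NULL slice");
  }
  if (!normalized.includes("RAISE EXCEPTION")) {
    violations.push("missing precondition guard");
  }
  // \b keeps this from matching CTA_UNIQUE_NULL_TRACK_IDX.
  if (/\bCTA_UNIQUE_IDX\b/.test(normalized)) {
    violations.push("must not touch the base cta_unique_idx");
  }
  if (normalized.includes("CONCURRENTLY")) {
    violations.push("must not use CONCURRENTLY inside a transactional migration");
  }
  if (normalized.includes("NULLS NOT DISTINCT")) {
    violations.push("must not use NULLS NOT DISTINCT (PG15+ only)");
  }
  return violations;
}
```

A text-level check like this cannot prove the index behaves correctly at runtime, which is why the integration spec exists alongside it. Its value is that it fails quickly, without a database, if someone later edits the migration back toward the PG15-only syntax.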

Production ops

  • Additive only: the 0037 cta_unique_idx stays in place exactly as is.
  • CREATE UNIQUE INDEX is NOT CONCURRENTLY because drizzle wraps each migration in a transaction. The NULL-track slice is small (sub-second build); if it ever grows materially the runbook is to build the partial out-of-band with CONCURRENTLY first, then add IF NOT EXISTS to a follow-up migration.
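
If the NULL slice ever did grow enough to need the out-of-band runbook, the two steps could be sketched as follows. Both statements are assumptions about what that runbook would contain, written as hypothetical TypeScript constants (`outOfBandBuildSql`, `followUpMigrationSql`) rather than taken from any existing script.

```typescript
// Hypothetical runbook sketch; neither statement exists in this PR.

// Step 1: run by hand against the database, outside any transaction,
// because CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
const outOfBandBuildSql: string = `
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS cta_unique_null_track_idx
  ON wxyc_schema.compilation_track_artist (library_id, artist_name)
  WHERE track_title IS NULL;
`;

// Step 2: the follow-up migration becomes a no-op where the index already
// exists, and still creates it on fresh databases (dev, CI).
const followUpMigrationSql: string = `
CREATE UNIQUE INDEX IF NOT EXISTS cta_unique_null_track_idx
  ON wxyc_schema.compilation_track_artist (library_id, artist_name)
  WHERE track_title IS NULL;
`;
```

One caveat with this approach: a concurrent build that fails leaves an invalid index behind under the same name, and IF NOT EXISTS will then skip it silently. Checking `pg_index.indisvalid` for the index before running the follow-up migration would catch that case.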

CI

All jobs green on the latest push:

  • detect-changes, lint-and-typecheck, unit-tests, Migration Dry-Run (prod-shaped data), Integration-Tests

@github-actions

Schema constraint shape report

data-shape report errored (exit 0): node:internal/modules/run_main:107 triggerUncaughtException( ^ Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'postgres' imported from /home/runner/work/Backend-Service/Backend-Service/scripts/schema-shape-report.mjs Did you mean to import "postgres/cjs/src/index.js"? at Object.getPackageJ; manual check required

@jakebromberg jakebromberg changed the title fix(schema): make cta_unique_idx NULLS NOT DISTINCT to block null-track duplicates fix(schema): partial unique on track_title IS NULL to block CTA null-track duplicates Jun 14, 2026
…ack duplicates

Postgres treats NULLs as distinct in unique B-tree comparisons by default, so the original 0037 `cta_unique_idx` on `compilation_track_artist (library_id, artist_name, track_title)` never blocked the legitimate-NULL-track duplicate case. Prod RDS runs PostgreSQL 14.22, which doesn't have the PG15+ `NULLS NOT DISTINCT` modifier, so the fix is a partial unique index restricted to the `track_title IS NULL` slice.

The base `cta_unique_idx` keeps enforcing uniqueness on the non-NULL slice via standard Postgres semantics (non-NULL rows are compared normally); the new `cta_unique_null_track_idx` covers the NULL slice on `(library_id, artist_name)`. Together they enforce the full intended semantics on PG 14, and the migration is purely additive — no DROP, no rebuild, no lock on existing rows beyond the small NULL-slice build window.

schema.ts declares both indexes so drizzle-kit drift detection covers them. `track_title` stays nullable — "no track title" is a real domain state, not an empty string, and a sentinel would force read paths to translate between presentation NULL and storage ''.

The migration carries a precondition guard (issue #705) that counts duplicate `(library_id, artist_name)` groups with NULL `track_title` and fails fast if any remain. A 2026-06-13 prod audit found 0, so the migration applies cleanly. Same shape as 0071.

A unit test (tests/unit/database/schema.cta-unique-null-track-partial.test.ts) is the cross-source drift guard; an integration spec (tests/integration/cta-unique-null-track-partial.spec.js) verifies that the second NULL-track insert is rejected with SQLSTATE 23505 and that distinct non-NULL titles still coexist.

Closes #1135
@jakebromberg jakebromberg force-pushed the fix/1135-cta-unique-nulls branch from e4edfe3 to f84fa59 Compare June 14, 2026 23:16
@jakebromberg jakebromberg merged commit a4d0ab1 into main Jun 14, 2026
6 checks passed
@jakebromberg jakebromberg deleted the fix/1135-cta-unique-nulls branch June 14, 2026 23:26
Development

Successfully merging this pull request may close these issues.

cta_unique_idx allows duplicate (library_id, artist_name) when track_title NULL
