fix(schema): partial unique on track_title IS NULL to block CTA null-track duplicates #1414
Merged
Schema constraint shape report

data-shape report errored (exit 0): node:internal/modules/run_main:107 triggerUncaughtException( ^ Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'postgres' imported from /home/runner/work/Backend-Service/Backend-Service/scripts/schema-shape-report.mjs Did you mean to import "postgres/cjs/src/index.js"? at Object.getPackageJ; manual check required
…ack duplicates

Postgres treats NULLs as distinct in unique B-tree comparisons by default, so the original 0037 `cta_unique_idx` on `compilation_track_artist (library_id, artist_name, track_title)` never blocked the legitimate-NULL-track duplicate case. Prod RDS runs PostgreSQL 14.22, which doesn't have the PG15+ `NULLS NOT DISTINCT` modifier, so the fix is a partial unique index restricted to the `track_title IS NULL` slice.

The base `cta_unique_idx` keeps enforcing uniqueness on the non-NULL slice via standard Postgres semantics (non-NULL rows are compared normally); the new `cta_unique_null_track_idx` covers the NULL slice on `(library_id, artist_name)`. Together they enforce the full intended semantics on PG 14, and the migration is purely additive: no DROP, no rebuild, no lock on existing rows beyond the small NULL-slice build window. schema.ts declares both indexes so drizzle-kit drift detection covers them.

`track_title` stays nullable: "no track title" is a real domain state, not an empty string, and a sentinel would force read paths to translate between presentation NULL and storage ''.

The migration carries a precondition guard (issue #705) that counts duplicate `(library_id, artist_name)` groups with NULL `track_title` and fails fast if any remain. A 2026-06-13 prod audit found 0, so the migration applies cleanly. Same shape as 0071.

A unit test (tests/unit/database/schema.cta-unique-null-track-partial.test.ts) is the cross-source drift guard; an integration spec (tests/integration/cta-unique-null-track-partial.spec.js) verifies that the second NULL-track insert is rejected with SQLSTATE 23505 and that distinct non-NULL titles still coexist.

Closes #1135
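The division of labor between the two indexes can be sketched as a small in-memory model. This is only an illustration of the semantics described in the commit message, not code from the repository: the `CtaRow` type and all function names are hypothetical, and the model assumes `library_id` and `artist_name` are never NULL.

```typescript
// Hypothetical in-memory model of the two unique indexes (not repository code).
// Assumption: library_id and artist_name are NOT NULL.
type CtaRow = {
  libraryId: number;
  artistName: string;
  trackTitle: string | null; // null stands for SQL NULL ("no track title")
};

// Base index on (library_id, artist_name, track_title) with the default
// NULLS DISTINCT behavior: a key containing NULL never equals another key,
// so a NULL-track candidate can never conflict here.
function conflictsWithBaseIndex(existing: CtaRow[], candidate: CtaRow): boolean {
  if (candidate.trackTitle === null) return false;
  return existing.some(function (row) {
    return (
      row.libraryId === candidate.libraryId &&
      row.artistName === candidate.artistName &&
      row.trackTitle === candidate.trackTitle
    );
  });
}

// Partial index on (library_id, artist_name) WHERE track_title IS NULL:
// only NULL-track rows are indexed, and the two remaining key columns
// compare normally within that slice.
function conflictsWithNullTrackIndex(existing: CtaRow[], candidate: CtaRow): boolean {
  if (candidate.trackTitle !== null) return false;
  return existing.some(function (row) {
    return (
      row.trackTitle === null &&
      row.libraryId === candidate.libraryId &&
      row.artistName === candidate.artistName
    );
  });
}

// An insert is rejected if either index reports a conflict.
function wouldViolateUniqueness(existing: CtaRow[], candidate: CtaRow): boolean {
  return (
    conflictsWithBaseIndex(existing, candidate) ||
    conflictsWithNullTrackIndex(existing, candidate)
  );
}

const sampleRows: CtaRow[] = [
  { libraryId: 1, artistName: "Artist A", trackTitle: null },
  { libraryId: 1, artistName: "Artist A", trackTitle: "Intro" },
];
const secondNullTrack: CtaRow = { libraryId: 1, artistName: "Artist A", trackTitle: null };

const missedByBase = !conflictsWithBaseIndex(sampleRows, secondNullTrack); // true
const caughtByPartial = conflictsWithNullTrackIndex(sampleRows, secondNullTrack); // true
```

In this model the base index alone lets the second NULL-track row through, and only the partial index rejects it, which is the gap the migration closes.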
e4edfe3 to f84fa59
Closes #1135
Problem

`cta_unique_idx` on `wxyc_schema.compilation_track_artist (library_id, artist_name, track_title)` was declared in 0037 without any NULL-equality coercion. Postgres treats NULLs as distinct in unique B-tree comparisons by default, so two rows sharing the same `(library_id, artist_name)` and NULL `track_title` could coexist, which is exactly what the constraint name implies it should prevent. `track_title` stays nullable; ETL writes with no track title are legitimate domain state.

Approach (pivot from `NULLS NOT DISTINCT`)

The first attempt at this PR used `CREATE UNIQUE INDEX … NULLS NOT DISTINCT`. The `migrate-dryrun` CI job revealed that the issue body's "dev/CI/prod on PG 18" claim referred to the docker-compose pin, not the prod runtime: prod RDS runs PostgreSQL 14.22, where `NULLS NOT DISTINCT` is a syntax error (PG15+ feature). The first push failed the dryrun with `42601 syntax error at or near "NULLS"`.

Pivot to the task body's suggested PG14-compatible alternative: a complementary partial unique index restricted to the `track_title IS NULL` slice.

- `cta_unique_idx` continues to enforce uniqueness on the non-NULL slice via standard Postgres semantics (non-NULL rows compare normally).
- `cta_unique_null_track_idx` is a partial unique on `(library_id, artist_name) WHERE track_title IS NULL`, closing the NULL bucket.
- A later move to PG15+ could switch to `NULLS NOT DISTINCT` if we want to consolidate.

Migration is purely additive: no DROP, no rebuild of the base index, no lock on existing rows beyond the small NULL-slice build window. The precondition guard (issue #705, same shape as 0071) counts duplicate `(library_id, artist_name)` groups with NULL `track_title` and fails fast if any remain. A 2026-06-13 prod audit returned 0 such groups, so the migration applies cleanly; CI's dryrun against the prod snapshot confirms.

Schema parity

`schema.ts` declares both indexes so drizzle-kit drift detection covers them. The 0094 snapshot mirrors the schema with the qualified WHERE convention used since 0083.
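As a rough illustration of what the precondition guard checks, the sketch below counts duplicate `(library_id, artist_name)` groups inside the NULL-track slice over an in-memory array. The actual guard lives in the SQL migration and is not reproduced here; the `NullSliceRow` type, both function names, and the SQL in the comment are assumptions made for the example.

```typescript
// Hypothetical in-memory version of the precondition guard (not the migration itself).
type NullSliceRow = {
  libraryId: number;
  artistName: string;
  trackTitle: string | null; // null stands for SQL NULL
};

// Roughly the in-memory counterpart of a query shaped like this (assumed):
//   SELECT count(*) FROM (
//     SELECT library_id, artist_name
//     FROM wxyc_schema.compilation_track_artist
//     WHERE track_title IS NULL
//     GROUP BY library_id, artist_name
//     HAVING count(*) > 1
//   ) AS dup;
function countDuplicateNullTrackGroups(rows: NullSliceRow[]): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    if (row.trackTitle !== null) continue; // only the NULL slice matters
    const key = JSON.stringify([row.libraryId, row.artistName]);
    counts[key] = (counts[key] || 0) + 1;
  }
  return Object.keys(counts).filter(function (key) {
    return counts[key] > 1;
  }).length;
}

// Fail fast, before any index is built, if the unique build could not succeed.
function assertNoDuplicateNullTrackGroups(rows: NullSliceRow[]): void {
  const duplicateGroups = countDuplicateNullTrackGroups(rows);
  if (duplicateGroups > 0) {
    throw new Error(
      "precondition failed: " + duplicateGroups +
        " duplicate (library_id, artist_name) group(s) with NULL track_title",
    );
  }
}
```

Failing before the `CREATE UNIQUE INDEX` statement runs gives a readable error that names the offending data, instead of a unique-build failure partway through the migration.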
Tests

- `tests/unit/database/schema.cta-unique-null-track-partial.test.ts`: schema-source drift guard. Asserts the migration creates the partial unique with the right predicate on `(library_id, artist_name)`, carries the precondition guard, doesn't touch the base `cta_unique_idx`, doesn't use `CONCURRENTLY` or `NULLS NOT DISTINCT`, and matches the schema.ts declaration. Mirrors `schema.flowsheet-album-id-enriched-idx.test.ts`.
- `tests/integration/cta-unique-null-track-partial.spec.js`: runtime behavior. Two NULL-track inserts → second is rejected with SQLSTATE 23505 + `cta_unique_null_track_idx` constraint name. A NULL-track row coexists with a non-NULL-track row for the same `(library_id, artist_name)` (the partial only collapses NULLs).

Production ops

- `cta_unique_idx` stays in place exactly as is.
- `CREATE UNIQUE INDEX` is NOT `CONCURRENTLY` because drizzle wraps each migration in a transaction. The NULL-track slice is small (sub-second build); if it ever grows materially the runbook is to build the partial out-of-band with `CONCURRENTLY` first, then add `IF NOT EXISTS` to a follow-up migration.

CI

All jobs green on the latest push: `detect-changes`, `lint-and-typecheck`, `unit-tests`, `Migration Dry-Run (prod-shaped data)`, `Integration-Tests`
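For the runtime behavior covered by the integration spec, a caller could recognize the rejected second NULL-track insert by its SQLSTATE and constraint name. The helper below is a minimal sketch, not code from this PR: `isNullTrackDuplicate` and `PgLikeError` are hypothetical, and the error field names are an assumption about the driver (postgres.js reports `constraint_name`, node-postgres reports `constraint`).

```typescript
// Hypothetical helper for classifying the unique violation (not part of the PR).
// Assumption: the driver error carries `code` plus `constraint_name`
// (postgres.js) or `constraint` (node-postgres).
type PgLikeError = {
  code?: string;
  constraint_name?: string;
  constraint?: string;
};

const UNIQUE_VIOLATION_SQLSTATE = "23505";
const NULL_TRACK_INDEX_NAME = "cta_unique_null_track_idx";

function isNullTrackDuplicate(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const pgError = err as PgLikeError;
  const constraintName =
    pgError.constraint_name !== undefined ? pgError.constraint_name : pgError.constraint;
  return (
    pgError.code === UNIQUE_VIOLATION_SQLSTATE &&
    constraintName === NULL_TRACK_INDEX_NAME
  );
}
```

Checking the constraint name as well as the SQLSTATE separates a NULL-track duplicate from a violation of the base `cta_unique_idx`, which raises the same 23505 code.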