feat(alembic): 0012 entity.release_identity for LML#526#280
Merged
Conversation
Brings the release-side entity layer under the alembic chain so LML's POST /api/v1/identity/resolve flips from 503 (probe miss) to 200. DDL mirrors entity/release_identity.sql in LML#530, plus CREATE SCHEMA IF NOT EXISTS entity ahead of the table DDL because the canonical file assumes the schema is already present (existing prod has it from out-of-band bootstrap; fresh dev DBs do not). Adds entity.release_identity (PK id, six per-source UNIQUE columns load-bearing for LML's INSERT ... ON CONFLICT mint protocol), entity.release_reconciliation_log (FK to release_identity.id, NO ACTION), and idx_release_reconciliation_log_identity_id (FK referencing index — Postgres does not auto-index FK columns). Migration uses the autocommit side-channel pattern from 0010/0011: is_offline_mode() refuse + DATABASE_URL_DISCOGS resolver + direct psycopg connect. Downgrade drops index, log, identity in FK order and leaves the entity schema in place (artist-side tables still live there; their alembic adoption is tracked at #279). Truncate-exclusion plumbing in scripts/import_csv.py and scripts/run_pipeline.py generalised from "entity.identity" to "the entity schema" so the comment and --truncate-existing help text cover all current and future entity tables without per-table rot. Closes #278
Round 1 of code-review feedback on PR #280. Correctness / coverage: Mirror 0012's entity DDL into schema/create_database.sql per the documented dual-write convention (docs/migrations.md:35). A fresh --fresh-rebuild against an empty DB no longer leaves the entity tables missing — LML's POST /api/v1/identity/resolve probe now resolves against schema produced by either alembic head or create_database.sql. Add test_downgrade_drops_release_side_tables — first test in the suite to actually run alembic downgrade, so a typo in _DOWNGRADE_SQL no longer ships green. Add test_upgrade_against_prod_bootstrap_state — seeds the entity schema + a mock artist-side entity.identity table before upgrade so the test exercises the documented prod scenario (out-of-band bootstrap) rather than only trivial state the migration just created. Add test_release_reconciliation_log_columns — pins every column's type/nullability so source/external_id/method/confidence shape drift is caught instead of corrupting LML's audit trail at runtime. Pin every release_identity column's (data_type, is_nullable) via strict equality on the column set; pin the load-bearing DEFAULT 'unreconciled' on reconciliation_status. Future SERIAL→BIGSERIAL or INTEGER↔TEXT drift now trips the test before reaching LML's mint protocol. Extend the FK-rules test to check both delete_rule AND update_rule — a CASCADE update_rule would silently propagate id renumbering through the audit log. Cleanup / consistency: Extract resolve_db_url / refuse_offline to lib/alembic_helpers.py and use it from 0010, 0011, 0012 — the boilerplate was duplicated three times. Placed under lib/ (not alembic/) because the local alembic/ directory is shadowed by the installed alembic package. Hoist _run_alembic subprocess wrapper into a new run_alembic fixture in tests/integration/conftest.py; 0012's test now uses the fixture instead of carrying a local copy. Drop the redundant DROP INDEX from _DOWNGRADE_SQL (DROP TABLE drops its indexes); update the comment to reflect FK-child→parent ordering. Drop the two static text-grep tests that were subsumed by pg-marked runtime tests; keep test_migration_file_exists as a fast static gate. Guardrails: Module-level assertion in scripts/import_csv.py refuses any schema-qualified or 'entity'-prefixed name in CACHE_TABLES_TO_TRUNCATE_BASE / CACHE_TABLES_TO_TRUNCATE_TRACKS at import time. Converts the "preserves the entire entity schema" prose contract into a runtime guard so a future fat-finger surfaces immediately.
Correctness: Narrow the truncate-list guard in scripts/import_csv.py from `"." in _t or _t.startswith("entity")` to just `"." in _t`. The bare-prefix check would have falsely tripped any future public-schema table whose name happens to begin with "entity" (e.g. `entity_log`, `entity_audit`); cross-schema reach via TRUNCATE requires a schema qualifier, which the dot check covers exactly. Loop variable renamed to `_truncate_table` and `del`-ed after the loop so the guard does not pollute the module namespace. Add `timeout=60` to the run_alembic subprocess wrapper so a hung alembic invocation raises TimeoutExpired rather than wedging CI until the outer job timeout.
Decoupling: Replace three hardcoded "0011_artist_not_found" / "0012_entity_release_identity" strings in the 0012 test with REVISION / PRIOR_REVISION constants parsed from the migration source. Mirrors the precedent set by commit c228b47 (0009 revision id shortened to fit alembic_version varchar(32)) — any future rename of 0011 will now be tracked by the test instead of failing eight pg cases in lockstep.
Helper hardening: Lazy-import alembic.context inside refuse_offline so resolve_db_url is now testable / usable in any non-migration context (unit tests, CLI scripts) without alembic-the-package being importable.
Strictness: Replace the substring check for reconciliation_status DEFAULT with a regex that extracts the literal portion of the column_default expression and asserts exact equality to "unreconciled". The substring form was already robust against `'unreconciled_v2'`-style drift (the closing quote blocks the substring match), but the explicit literal-extract version makes the assertion semantics legible and catches `coalesce`/wrapped-default rewrites cleanly.
…tests
Correctness: Move _read_revision_metadata out of module load so a missing or malformed migration file shows up as a clean test_migration_file_exists failure instead of an opaque collection error that swallows every test in the file. Each pg test now calls the parser directly when it needs the revision strings.
Refactor: Replace the bare module-level for/del guard in scripts/import_csv.py with a _validate_truncate_lists() function called once at import. Eliminates both the empty-list NameError risk (the for never binds the iteration variable, so the del would explode) and the module-namespace pollution that the del was trying to clean up. Self-documenting via the function name and docstring.
Doc drift: 0012's docstring claimed downgrade drops "index, then log, then identity" — stale since round 1 removed the explicit DROP INDEX. Reword to match the inline _DOWNGRADE_SQL comment ("child table, then parent; DROP TABLE on the child also drops its FK index").
Test coverage: Add tests/unit/test_alembic_helpers.py pinning four contracts on the new shared module — env-var precedence (DATABASE_URL_DISCOGS over DATABASE_URL), empty-string fallback, error message includes the revision label, and the lazy-import contract (importing lib.alembic_helpers must not pull in alembic). Catches helper regressions without needing a live PG.
Dual-write drift detection: Add test_create_database_sql_matches_migration_shape — applies schema/create_database.sql to a fresh DB and asserts the same RELEASE_IDENTITY_COLUMN_SHAPE / RECONCILIATION_LOG_COLUMN_SHAPE pins the migration tests use. Closes the other direction of dual-write drift: prior tests caught migration drift; this one catches create_database.sql drift.
jakebromberg
added a commit
that referenced
this pull request
Jun 10, 2026
Round 1 of code-review feedback on PR #280. Correctness / coverage: Mirror 0012's entity DDL into schema/create_database.sql per the documented dual-write convention (docs/migrations.md:35). A fresh --fresh-rebuild against an empty DB no longer leaves the entity tables missing — LML's POST /api/v1/identity/resolve probe now resolves against schema produced by either alembic head or create_database.sql. Add test_downgrade_drops_release_side_tables — first test in the suite to actually run alembic downgrade, so a typo in _DOWNGRADE_SQL no longer ships green. Add test_upgrade_against_prod_bootstrap_state — seeds the entity schema + a mock artist-side entity.identity table before upgrade so the test exercises the documented prod scenario (out-of-band bootstrap) rather than only trivial state the migration just created. Add test_release_reconciliation_log_columns — pins every column's type/nullability so source/external_id/method/confidence shape drift is caught instead of corrupting LML's audit trail at runtime. Pin every release_identity column's (data_type, is_nullable) via strict equality on the column set; pin the load-bearing DEFAULT 'unreconciled' on reconciliation_status. Future SERIAL→BIGSERIAL or INTEGER↔TEXT drift now trips the test before reaching LML's mint protocol. Extend the FK-rules test to check both delete_rule AND update_rule — a CASCADE update_rule would silently propagate id renumbering through the audit log. Cleanup / consistency: Extract resolve_db_url / refuse_offline to lib/alembic_helpers.py and use it from 0010, 0011, 0012 — the boilerplate was duplicated three times. Placed under lib/ (not alembic/) because the local alembic/ directory is shadowed by the installed alembic package. Hoist _run_alembic subprocess wrapper into a new run_alembic fixture in tests/integration/conftest.py; 0012's test now uses the fixture instead of carrying a local copy. Drop the redundant DROP INDEX from _DOWNGRADE_SQL (DROP TABLE drops its indexes); update the comment to reflect FK-child→parent ordering. Drop the two static text-grep tests that were subsumed by pg-marked runtime tests; keep test_migration_file_exists as a fast static gate. Guardrails: Module-level assertion in scripts/import_csv.py refuses any schema-qualified or 'entity'-prefixed name in CACHE_TABLES_TO_TRUNCATE_BASE / CACHE_TABLES_TO_TRUNCATE_TRACKS at import time. Converts the "preserves the entire entity schema" prose contract into a runtime guard so a future fat-finger surfaces immediately.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
entity.release_identity+entity.release_reconciliation_log+ FK index on the discogs-cache PG instance. After this lands, LML'sPOST /api/v1/identity/resolveflips from 503 to 200.CREATE SCHEMA IF NOT EXISTS entityso fresh dev DBs work — the LML canonicalentity/release_identity.sqlassumes the schema is already present (existing prod has it from out-of-band bootstrap).--truncate-existingcomments and help text inscripts/import_csv.py/scripts/run_pipeline.pyfrom "entity.identity" to "the entity schema" so the messaging covers all four entity tables and the schema-level exclusion convention.Design notes
is_offline_mode()refuse +DATABASE_URL_DISCOGS→DATABASE_URLresolver + directpsycopg.connect(..., autocommit=True)). DDL isIF NOT EXISTS-only so the migration is safe to re-apply against prod, which already has the schema from the artist-side bootstrap.INSERT ... ON CONFLICT ({col}) DO NOTHING RETURNING id). The integration test enumerates them againstpg_constraintso any drift surfaces in CI.WHERE identity_id = $1lookup needs it.entityschema in place. Adopting the artist-sideentity.identity/entity.reconciliation_loginto the alembic chain is tracked at Adopt entity.identity / entity.reconciliation_log into alembic chain #279.discogs_release_id/discogs_master_idareINTEGERbut not FKs to the cache'srelease.id/master.id— the identity layer's lifecycle is independent of the monthly cache rebuild, matching the canonical LML shape.Test plan
ruff check .cleanruff format --check .cleantests/integration/test_alembic_0012_entity_release_identity.pypass against the Docker PG (3 static + 7 pg-marked).POST /api/v1/identity/resolveagainst prod LML returns 200 with{"identity_id": N, "minted": true}on a fresh(discogs_release, 12345)and{"identity_id": N, "minted": false}on the repeat. Operator step — runs after the nextrebuild-cache.ymlinvocation appliesalembic upgrade headagainst discogs-cache PG.Closes #278