End-of-day triage of all 26 open issues in WXYC/discogs-etl. Intended as the entry point for any contributor or agent picking up after 2026-06-10. Supersedes #238 (2026-05-29). Updated 2026-06-10: the original 2026-06-03 snapshot has been refreshed in place, and the changes since then are summarised in the section below.
LML#356 (ANV-from-dump) closed 2026-06-07 as a dup of LML#497 — the underlying gap is fixed by the same converter+loader work the jumpstart rebuild exercised. The cross-repo runtime risk called out in the 2026-06-03 cut is materially resolved.
New cross-repo dependency: LML#510 (Tombstone 404s for get_artist_details + get_release via not_found column discriminator) — requires an alembic migration in this repo to add the column. Scheduled after BS#1361's BS-side mitigation per LML's tracker (#515).
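Going by the LML#510 title, the not_found column lets a cache row record "Discogs returned 404 for this ID", which is different from the cache having no row at all. The following is a minimal sketch of the resulting three-way lookup. It assumes a hypothetical in-memory cache keyed by release ID. CachedRelease, LookupOutcome and classify are illustrative names only. The real column, table and call sites are whatever the LML#510 migration and LML define.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LookupOutcome(Enum):
    FOUND = "found"       # row present with real Discogs data
    TOMBSTONED = "404"    # row present, but it records an upstream 404
    MISS = "miss"         # no row: the cache has never resolved this ID


@dataclass
class CachedRelease:
    release_id: int
    title: Optional[str] = None
    not_found: bool = False  # hypothetical discriminator column


def classify(cache: dict[int, CachedRelease], release_id: int) -> LookupOutcome:
    """Separate 'known missing upstream' from 'not in the cache at all'."""
    row = cache.get(release_id)
    if row is None:
        return LookupOutcome.MISS
    if row.not_found:
        return LookupOutcome.TOMBSTONED
    return LookupOutcome.FOUND
```

In this sketch, a caller would answer 404 straight from the cache on TOMBSTONED and only go upstream on MISS. Without a discriminator, both cases collapse into "no usable row".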
Close out #247 (Epic I: Artwork-NULL ambiguity). Every Scope row is ✅ as of 2026-06-07. Either back-publish the post-drain artwork_null_share measurement and close, or close now and file a thin follow-up specifically for the published-baseline record. This is the cheapest piece of outstanding work in the queue, and closing it removes the highest-noise stale tracker.
Cross-cluster conflicts (decide once before starting)
DuckDB rewrite vs. verify_cache hardening (carried)
#186's epic rewrites the entire scripts/verify_cache.py (75 KB / 1854 LOC currently). The recent column-loss fixes (#232/#234, closed) were targeted patches on that same script. If the rewrite lands, those patches don't migrate automatically. Decision still needed before #190 starts: does the parity test in #191 assert the same column-preservation invariants those patches were defending? If yes, fold into parity. If no, file a follow-up to re-derive them in the DuckDB shape.
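If the column-preservation invariants are folded into #191's parity test, the core assertion could look roughly like the sketch below. It assumes that both the original and the rewritten verify_cache.py can be made to emit their results as lists of dict rows keyed by column name. assert_parity, column_set and _row_key are hypothetical helpers, not functions that exist in scripts/verify_cache.py.

```python
from collections import Counter
from typing import Iterable, Mapping

Row = Mapping[str, object]


def column_set(rows: Iterable[Row]) -> set[str]:
    """Union of the column names seen across all rows."""
    cols: set[str] = set()
    for row in rows:
        cols.update(row.keys())
    return cols


def _row_key(row: Row) -> tuple:
    # Order-insensitive, hashable form of a row; column values must be hashable.
    return tuple(sorted(row.items()))


def assert_parity(original: list[Row], rewrite: list[Row]) -> None:
    """Raise AssertionError if the rewrite drops/adds columns or changes rows."""
    orig_cols, new_cols = column_set(original), column_set(rewrite)
    missing = orig_cols - new_cols
    added = new_cols - orig_cols
    assert not missing, f"rewrite dropped columns: {sorted(missing)}"
    assert not added, f"rewrite added columns: {sorted(added)}"
    # Compare the rows as multisets so output order does not matter.
    assert Counter(map(_row_key, original)) == Counter(map(_row_key, rewrite)), (
        "row contents differ between implementations"
    )
```

This check only proves that the two implementations agree. If both emit the same wrong rows, it still passes.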
Full-dump expansion (#229) vs. ANV ask — RESOLVED
The LML#356 ANV ask, previously called out as a natural fit inside #229, closed 2026-06-07 as a dup of LML#497. The underlying gap was already fixed by the jumpstart rebuild's converter+loader work. #229's residual scope is labels+masters only; the ANV thread is closed end-to-end.
release_artist.role decision (#210) — RESOLVED at decision-input level (carried)
Cross-repo grep + converter inspection done 2026-06-03; no consumer references the column. Mechanical work remaining: ship Option A. Not really a conflict anymore.
#247 (Epic I: Artwork-NULL ambiguity). Substance fully shipped across 2026-05-30 and the 2026-06-07 LML drain. Status comment posted 2026-06-10. Recommend closing once the acceptance measurement is back-published (or splitting that into its own thin issue).
Stub-body issues — bodies failed to interpolate at filing (carried, 21 days uninterpolated)
Backend-Service#1280 epic + #1281/#1282/#1283/#1294 ("MD-set 'Not on Discogs' flag for catalog releases"). The daily cron and lml-client extensions interact with discogs-cache reads; align them with the cache-side 404 tombstone story (LML#510) so the gate semantics don't fork.
PR #150 (test: bump charset-torture corpus pin to v0.12.0). Author: jakebromberg; created 2026-05-05. CI all green (lint, drift-check, test, pg, marker-sync). mergeStateStatus: DIRTY, so it needs a rebase against current main. The gating dependency (wxyc-shared v0.12.0) is long resolved. Rebase + merge.
Active cross-cutting projects this repo participates in
Music Data Pipeline Hardening (org Project #19): phases A–D done; phases E (monorepo) and F (split semantic-index) remain outstanding org-wide; discogs-etl issues touched by this project stay on this tracker.
All blocked / sequenced issues carry a "## Blocked by" body section and a native blockedBy relationship (rendered under the "Relationships" pill in the GitHub UI).
Per-issue triage commentary is on each issue's page (look for "Triage finding" / "Status update" / "Sequencing concern" / "Recommend closing" / "Stub-body finding" comments dated 2026-06-03 and 2026-06-10).
Cross-repo rollup: items from this tracker that block other repos or carry external deadlines are mirrored in Cross-repo critical path (org Project #33).
What's changed since 2026-06-03
… artist.profile, artist_alias, artist_name_variation, artist_member, artist_url for the first time. This is the most substantive shift in the repo's coverage shape since #188 (the 2026-05-14 rebuild). Consequences land throughout this tracker.
… #269 (ERR-trap exit-code clobber), #271 (TMPDIR → $WORK_DIR), #273 (cache_metadata UNIQUE): all closed 2026-06-07 as part of getting the jumpstart over the line. #268 (EBS exhaustion at 94%) and #207 (cache_metadata race-tolerance integration test) also closed.
… artists.xml.gz, so the epic's residual scope is labels + masters only, not all three. Per-issue status comments noting the shrink are posted on #228 and #229. Stub bodies are still uninterpolated 21 days post-filing.
… enhancement.
… get_artist_details + get_release via not_found column discriminator): requires an alembic migration in this repo to add the column. Scheduled after BS#1361's BS-side mitigation per LML's tracker (#515).
… DIRTY (rebase needed). No activity since 2026-06-03.
P0 sequencing — recommended order
… artwork_null_share measurement and close, or close now and file a thin follow-up specifically for the published-baseline record. Cheapest piece of outstanding work in the queue, and removes the highest-noise stale tracker.
… rebuild-cache-bootstrap.sh to lean on the prebuilt-download path. Low priority but mechanically straightforward: ~2-5 min off every cold start, no correctness impact.
… release_artist.role). One alembic migration plus a mirror in schema/create_database.sql:77-83. Decision input was complete in #263's 2026-06-03 cut; remaining work is mechanical.
… WXYC/wiki/plans/duckdb-version-pin.md. First-mover prerequisite for the entire #186 epic chain (#190 rewrite of scripts/verify_cache.py against DuckDB ATTACH → #191 parity test against the original on the fixture corpus → #192 README.md and CLAUDE.md update → #193 production smoke-test against a full prod-shaped corpus). Cheap; unblocks four downstream issues.
Critical / high severity
Obsolete / needs revision (recommend close)
… scripts/run_pipeline.py:187-335 implements --catalog-source / --catalog-db-url / --wxyc-db-url. Phases 1-6 all in tree. Already labeled obsolete.
… scripts/resolve_collisions.py (20 KB) is on main. Already labeled obsolete.
… scripts/sync-library.sh:152-162 runs export_streaming_links.py on every build; already labeled obsolete.
… obsolete.
Stub-body issues — bodies failed to interpolate at filing (carried, 21 days uninterpolated)
… @/tmp/<file>.md (literal). Per-issue status comments dated 2026-06-10 spell out what the 2026-06-07 work shifted in each one. Until the bodies are repaired, do not treat any of the four as actionable in their current shape.
Cross-repo dependencies (encoded both natively and in body prose)
Cross-repo runtime risks
… get_artist_details + get_release via not_found column discriminator): REQUIRES a discogs-etl alembic migration for the not_found column. Per LML's 2026-06-07 tracker (#515), it is scheduled after BS#1361's BS-side mitigation lands. Net-new cross-repo dependency since #263's last cut.
… sub_tracks parent-child via release_sub_track table): open. A non-trivial fraction of release_track rows in the current cache may have the wrong shape because of <track> elements nested inside <sub_tracks>. This affects #217's release-count audit (38,662 vs. the expected 50–80K pair-wise band; candidate root cause) and would surface in the DuckDB rewrite's parity test (#191): parity may need to be measured against a corpus where this is fixed, otherwise both implementations reproduce the same wrong rows.
… lml_identity_id fan-out. Reduces direct discogs-cache write pressure from the rotation backfill path.
Cross-repo runtime risks — RESOLVED since 2026-06-03
… artist_name_variation during cache rebuild): closed 2026-06-07 as dup of LML#497; the converter+loader work the jumpstart rebuild exercised covers it.
… release_track_artist to extra = 0 (library-metadata-lookup#333; extra=0 filter on validate_track_on_release): closed; the column has been in tree and is now consumed.
Open PR in flight
mergeStateStatus: DIRTY (needs rebase against current main). Gating dep (wxyc-shared v0.12.0) is long-resolved. Rebase + merge.
Cluster map (26 open)
… test_base_trigram_indexes_use_unaccent from blocklist to allowlist #184.
Active cross-cutting projects this repo participates in
… entity.identity reconciliation; #217's release-count audit (38,662 vs. the expected 50–80K pair-wise band) feeds the calibration positive class.
How to find work
… blockedBy relationship (rendered under the "Relationships" pill in the GitHub UI).