Scope
Re-run Path B of #1368's frequency analysis once the venue-events scraper has been running steadily in production for ≥30 days. Append a "Path B cross-check" section to plans/touring-events/frequency-analysis.md in wxyc-workspace.
Why a separate issue
Per #1368, Path A is the headline deliverable and runs as soon as the scraper (#1343) is deployed. Path B has to wait for two distinct things to settle:
Scraper stability — ≥30 nightly runs of idempotent upserts to confirm the writer behaves as expected against drifting RHP HTML. This is not about forward-window data accumulating: RHP venues publish 60–90 days ahead, so the forward window is populated by day 1. The 30-day clock is about confidence that what we're reading is stable.
Backfill concerts.headlining_artist_id via local artist + alias resolver #1372 populating headlining_artist_id, so the SQL joins cleanly on the canonical FK rather than via the brittle raw-name fallback.
Encoding these in a real ticket avoids the "we'll get to it" failure mode and lets the dependency be wired into the issue graph.
Earliest run
Today is 2026-06-08; #1345 just merged. Budget no earlier than 2026-07-10 for the scraper-stability clock, and gate on #1372 having run at least once. If either condition is unmet, do not start.
Procedure
Confirm the scraper has been running nightly for ≥30 days (check the concerts.scraped_at distribution).
Confirm that the resolver pass from #1372 (Backfill concerts.headlining_artist_id via local artist + alias resolver) has run, and report headlining_artist_id coverage. There is no hard threshold: touring headliners are typically well-known acts and may resolve well above LML's 24% catalog ceiling, or may underperform if the alias substrate hasn't caught up. Whatever the figure is, report it; the raw-vs-FK delta below quantifies the impact directly.
-- Loose: artist in library, canonical FK
SELECT date_trunc('day', c.starts_at)::date AS day, COUNT(DISTINCT c.id)
FROM wxyc_schema.concerts c
WHERE c.headlining_artist_id IN (SELECT DISTINCT artist_id FROM wxyc_schema.library)
  AND c.starts_at BETWEEN now() AND now() + interval '90 days'
  AND c.status <> 'cancelled'
GROUP BY 1
ORDER BY 1;

-- Tight: played in trailing 30d, canonical FK
SELECT date_trunc('day', c.starts_at)::date AS day, COUNT(DISTINCT c.id)
FROM wxyc_schema.concerts c
JOIN wxyc_schema.flowsheet f ON f.artist_id = c.headlining_artist_id
WHERE f.start_time >= now() - interval '30 days'
  AND c.starts_at BETWEEN now() AND now() + interval '90 days'
  AND c.status <> 'cancelled'
GROUP BY 1
ORDER BY 1;
Tightest (heavy/medium rotation in trailing 90d) joins wxyc_schema.rotation per #1368's methodology block — same canonical-FK shape, swap the flowsheet join for the rotation join.
Raw-name variants follow #1368's methodology (lower + leading-"The" strip on both sides). Run the same three rules with raw-name joins for the A/B comparison; the delta against canonical-FK is the resolver-coverage signal.
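The "lower + leading-'The' strip" rule can be sketched as a single PostgreSQL expression. This is only an illustration: the exact expression used in #1368 is not reproduced here, the sample names are invented, and match_key is a hypothetical column name.

```sql
-- Hypothetical sample names; real inputs would come from the concerts and library tables.
WITH sample(raw_name) AS (
    VALUES ('The Example Band'),
           ('example band'),
           ('Thee Example Band'),
           ('  THE EXAMPLE BAND ')
)
SELECT raw_name,
       -- Trim, lower-case, then strip one leading "the" followed by whitespace.
       regexp_replace(lower(btrim(raw_name)), '^the\s+', '') AS match_key
FROM sample;
```

Applying the same expression to both sides of the join keeps the comparison symmetric. Because the pattern requires whitespace after "the", a name such as "Thee Example Band" is left intact.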
Append a "Path B cross-check" section to the memo:
The six queries' results.
Side-by-side table: Path A (one-shot HTML pull) vs Path B raw-name (30 days of scraper data) vs Path B canonical-FK.
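The two confirmation steps at the top of the procedure might be checked with queries along these lines. This is a sketch under assumptions: it uses only columns named in this issue, it assumes unresolved concerts carry a NULL headlining_artist_id, and it does not know whether scraped_at records first-seen or last-seen time. If idempotent upserts overwrite scraped_at on every run, the per-day distribution will collapse toward the most recent night and another signal is needed. The aliases scrape_day, n_concerts, upcoming, resolved and coverage_pct are illustrative.

```sql
-- Step 1 (assumed shape): rows per scrape day over the trailing 30 days.
SELECT c.scraped_at::date AS scrape_day,
       COUNT(*)           AS n_concerts
FROM wxyc_schema.concerts c
WHERE c.scraped_at >= now() - interval '30 days'
GROUP BY 1
ORDER BY 1;

-- Step 2 (assumed shape): headlining_artist_id coverage over the forward window.
-- COUNT(column) skips NULLs, so it counts only resolved rows.
SELECT COUNT(*)                       AS upcoming,
       COUNT(c.headlining_artist_id)  AS resolved,
       ROUND(100.0 * COUNT(c.headlining_artist_id) / NULLIF(COUNT(*), 0), 1) AS coverage_pct
FROM wxyc_schema.concerts c
WHERE c.starts_at BETWEEN now() AND now() + interval '90 days'
  AND c.status <> 'cancelled';
```

The coverage query reuses the same forward window and cancelled filter as the match-rule queries, so its denominator lines up with theirs.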
Acceptance
Six Path B queries (three rules × {canonical-FK, raw-name}) run against ≥30 days of concerts data
Results appended to the memo as a "Path B cross-check" section
Side-by-side comparison table: Path A vs Path B raw-name vs Path B canonical-FK, all three match rules
Resolver-coverage delta reported per match rule; this delta is the headline new number whether or not it's large
If any rule shows a Path A vs Path B divergence of >25%, investigate root cause (sample-size, calendar drift between Path A's one-shot HTML and Path B's accumulated data, parser drift) and document the cause in the memo
Memo notes explicitly that Path B's 30-day window is too small to confirm the bimodal seasonal pattern asserted in Path A
Memo TL;DR updated if Path B materially changes the headline rate
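The delta and divergence criteria above could be computed from the side-by-side table roughly as follows. The numbers are placeholders, not results. The divergence formula (absolute difference relative to Path A) is an assumption, since this issue does not define it, and all column names are hypothetical.

```sql
-- PLACEHOLDER values only: replace with the rates from the side-by-side table.
WITH rates(match_rule, path_a, path_b_raw, path_b_fk) AS (
    VALUES ('loose',    1.0, 1.0, 1.0),
           ('tight',    1.0, 1.0, 1.0),
           ('tightest', 1.0, 1.0, 1.0)
)
SELECT match_rule,
       -- Resolver-coverage delta: canonical-FK result minus raw-name result.
       path_b_fk - path_b_raw AS resolver_delta,
       -- Assumed divergence definition: |Path B - Path A| / Path A, as a percentage.
       ROUND(100.0 * abs(path_b_raw - path_a) / NULLIF(path_a, 0), 1) AS divergence_raw_pct,
       ROUND(100.0 * abs(path_b_fk  - path_a) / NULLIF(path_a, 0), 1) AS divergence_fk_pct,
       -- Flag the rule when either Path B variant diverges from Path A by more than 25%.
       (abs(path_b_raw - path_a) / NULLIF(path_a, 0) > 0.25
        OR abs(path_b_fk - path_a) / NULLIF(path_a, 0) > 0.25) AS needs_investigation
FROM rates;
```

Keeping the three rules in one result set makes it easy to paste the flags next to the comparison table in the memo.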
Related
Backfill concerts.headlining_artist_id via local artist + alias resolver #1372 (canonical FK populated). Its known unresolvable case is artists duplicate-name groups.