diag(cache): four LML#537 probes for Discogs cache hit ratio plateau by jakebromberg · Pull Request #540 · WXYC/library-metadata-lookup

jakebromberg · 2026-06-12T15:31:28Z

Implements four of the six concrete next steps from #537 — three diagnostic scripts and one Sentry-side telemetry improvement. The remaining two (#4 steady-state estimate, #5 PG row-count audit) are non-code follow-ups that will live as comments on the issue once a few days of data accumulate.

Probes

#6 — Rate-limiter telemetry tag (`discogs/service.py`, `discogs/fallthrough.py`)

Tags lml.discogs.semaphore and lml.discogs.rate_limiter Sentry spans with lml.discogs.method and lml.discogs.cache_state so the wait-time histogram can be split by which cache method's API miss triggered the wait. cache_state is one of miss (PG checked and missed), skip (per-request bypass via should_skip_cache()), no_pg (read-only method), or cooldown (#324 short-circuit active). Threaded via a ContextVar set in the fallthrough seam and read in _request_with_retry — same pattern as the existing _skip_cache_var in discogs/memory_cache.py, so no kwargs threading through 9 call sites. Token-paired set/reset survives CancelledError.

#1 — Cache miss provenance sample (`scripts/cache_miss_provenance.py`)

For a sample of cache-miss → API-call events, classifies each by whether the release row already existed at miss time and whether it carried release_track rows. Separates first-time misses (H1 — ETL coverage gap) from release-row-exists-with-tracks misses (H3' — validate_track_on_release returning False as a cache hit). Accepts either a Railway log file (parsed by built-in regex patterns against the existing INFO-level Validated: / Failed to fetch release lines) or a CSV of release_id,method rows from a Sentry trace export.

Example run:

DATABASE_URL_DISCOGS=postgresql://... python scripts/cache_miss_provenance.py --log railway-export.log --limit 50

Outputs /tmp/lml-537-cache-miss-provenance.csv + a summary table.

#2 — Dynamic-write contribution histogram (`scripts/cache_warm_histogram.py`)

Buckets release.artwork_checked_at by day and contrasts the pre-Alembic-0009 ETL fill with the post-deploy dynamic warm-up daily mean. A small post-deploy mean confirms H1.

DATABASE_URL_DISCOGS=postgresql://... python scripts/cache_warm_histogram.py
DATABASE_URL_DISCOGS=postgresql://... python scripts/cache_warm_histogram.py --csv > histogram.csv

#3 — Discogs ranking jitter (`scripts/discogs_ranking_jitter.py`)

Issues two identical /database/search?track=...&artist=... calls separated by a delay and measures jaccard overlap of returned release IDs + average rank-delta for shared IDs. Low jaccard confirms H2 — Discogs returns different IDs across calls, so warming the prior set doesn't help the next one. Bypasses DiscogsService (no L1 LRU, no semaphore) to surface Discogs-side ranking behavior directly.

DISCOGS_TOKEN=... python scripts/discogs_ranking_jitter.py \
  --track "Moments of Soft Persuasion" --artist Yoshimura --repeat 3

Out of scope

Add technical overview document #4 steady-state estimate — back-of-envelope math; will be a comment on investigate: Discogs cache hit ratio plateaued at ~50% (search_releases_by_track at 34%) after write_release fix #537 after data lands.
Add technical overview document #5 PG row-count audit — daily snapshot to capture over a week; the histogram in Artwork fallback: use Discogs artist/label images when album cover is missing #2 already gives the cumulative shape.

Test plan

Unit tests for Add CI workflow for linting and tests on PRs #6 cover all four cache_state values (parametrized), the CancelledError reset, the normal-return reset, and the no-contextvar-set path that keeps direct callers (health probe) working
Unit tests for the three scripts cover the pure math (jaccard, position stability, summarise) and the input parsers (log regex, CSV)
ruff check + ruff format --check clean
mypy --ignore-missing-imports clean on changed files
Full pytest tests/unit/ pass (2446 tests; the 1 pre-existing asyncpg event-loop teardown race in test_dependencies.py is unrelated — passes in isolation)
Run Artwork fallback: use Discogs artist/label images when album cover is missing #2 against staging cache PG, confirm output shape; eyeball the post-cutoff daily mean
Run Add comprehensive test suite with 97% coverage #1 against a 1-hour Railway log window
Run feat: artwork fallback to artist/label images #3 with the three slow-lookup queries from the issue body
Confirm the new tags appear in Sentry on staging after deploy

Refs #537.

Implements probes #1, #2, #3, #6 from LML#537. #6 (code): tag lml.discogs.semaphore and lml.discogs.rate_limiter Sentry spans with discogs.method and cache_state (miss|skip|cooldown|no_pg). Threaded via a contextvar set in the fallthrough seam and read in _request_with_retry, so the wait-time histogram can be split by which method's cache-miss triggered the wait. #1 (script): scripts/cache_miss_provenance.py classifies sampled cache-miss events by (release row existed, release_track count, method, prior seen in window). Separates H1 (ETL coverage gap) from H3' (validate returning False as a cache hit). #2 (script): scripts/cache_warm_histogram.py buckets release.artwork_checked_at by day and contrasts pre/post Alembic-0009 daily means. #3 (script): scripts/discogs_ranking_jitter.py issues two identical /database/search calls separated by a delay and measures jaccard overlap of returned release IDs to test H2 (ranking jitter defeats warm-up). All scripts are read-only. Probes #4 and #5 are non-code follow-ups deferred to comments on the tracking issue.

…scripts Three production methods (search_releases_by_album_title, get_label_image, get_master) call _request_with_retry directly without going through fallthrough, leaving their wait spans untagged in the LML#537 telemetry. Adds a request_context context manager in discogs/fallthrough.py and uses it from those three call sites so cache_state=no_pg / method tags propagate. Updates the comment in service.py that wrongly named the health probe as a direct caller (the health probe uses a different code path entirely). Tightens the fallthrough seam's contextvar set/reset by moving the set() inside the try block so the reset is guaranteed paired even if a future edit inserts a line between them. scripts/discogs_ranking_jitter.py: drops the duplicate _DISCOGS_API_BASE in favor of the public DISCOGS_API_BASE exported by discogs.service. Replaces the JitterPair set_a/set_b/overlap/union/jaccard @Property chain (6x set rebuild per jaccard access) with a single stats() helper that computes overlap/union/jaccard in one pass. Adds to_record() so --json includes derived metrics (overlap, union, jaccard, position_stability) that p.__dict__ silently dropped. scripts/cache_miss_provenance.py: drops the "Failed to fetch release N" regex; that warning fires on API faults rather than cache misses and conflates H1/H3' provenance with availability faults. tests/unit/test_discogs_request_telemetry.py: adopts the canonical captured_spans + _start_span pattern from test_discogs_service.py instead of the bespoke _SpanRecorder class; uses the real DiscogsService fixture with self._client injection instead of DiscogsService.__new__. Adds coverage for the new request_context helper.

…r plumbing The four seam-using methods (search_releases_by_track, get_release, get_artist_details, search) all carry an `if cache is None: value = await _api_fetch()` fallback that bypassed both fallthrough() and the request_context helper — silently leaving their wait spans untagged whenever DiscogsService runs without a cache_service (the discogs/lookup.py:27 and orchestrator service=None paths). Wraps each of those four branches in `with request_context(label):` so the histogram split holds across cacheless configurations. Adds a regression test exercising the get_release cacheless path. Adds `get_request_context()` as the public reader so discogs/service.py no longer imports the private `_request_context_var` symbol across module boundaries — mirrors the `should_skip_cache()` shape already established in discogs/memory_cache.py. Defaults `request_context(method, cache_state='no_pg')` since all four direct-caller sites pass the same value, and centralizes the per-span tag application in a new `_apply_request_ctx_tags(span, ctx)` helper so the three-line branch isn't repeated at both the semaphore and per-retry rate-limiter spans. Refactors fallthrough()'s inline try/finally to use the request_context context manager directly, eliminating the Token-unbound-on-set-failure risk: if ContextVar.set() raises, the generator aborts cleanly before the try block opens and no reset is attempted. scripts/discogs_ranking_jitter.py: aggregate() returns `None` (not `float('nan')`) for position_stability_mean when no pair has shared IDs. Python's json.dumps default emits literal `NaN` which is invalid per RFC 8259 and rejected by jq / strict parsers; `None` serializes as `null` cleanly. print_summary's NaN check becomes `is None`. to_record()'s docstring now enumerates the keys it surfaces for downstream consumers.

… ctx tagging into fallthrough scripts/cache_miss_provenance.py: sampling via rng.sample() returns elements in random selection order rather than log order, so the prior_seen_in_window flag was silently inverted whenever --limit clipped the input. Splits classify() into compute_first_seen() (called over the full events BEFORE sampling) plus classify() taking (original_index, event) pairs so the flag stays anchored to the user's log window. Adds a regression test exercising the sampling reorder. discogs/fallthrough.py: hoists apply_request_ctx_tags() out of service.py and into fallthrough.py next to request_context / get_request_context / _request_context_var so the dict-key contract ("method" / "cache_state") and the Sentry attribute names ("lml.discogs.method" / "lml.discogs.cache_state") live next to the contextvar producer. Future renames touch one file. service.py loses get_request_context's import (the helper reads the contextvar itself now) and a small inline if-and-set block. Tightens the request_context docstring to enumerate the four `cache is None` fallback branches that wrap it (search_releases_by_track, get_release, get_artist_details, search) and explain why validate_track_on_release intentionally doesn't — it delegates to get_release, whose own request_context wins for the actual HTTP wait span. Drops a duplicate `from unittest.mock import` inside test_cacheless_get_release_tags_with_no_pg (the symbols were already imported at module scope).

jakebromberg mentioned this pull request Jun 12, 2026

investigate: Discogs cache hit ratio plateaued at ~50% (search_releases_by_track at 34%) after write_release fix #537

Open

jakebromberg added 3 commits June 12, 2026 09:35

jakebromberg merged commit 8c5d36f into main Jun 12, 2026
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

diag(cache): four LML#537 probes for Discogs cache hit ratio plateau#540

diag(cache): four LML#537 probes for Discogs cache hit ratio plateau#540
jakebromberg merged 4 commits into
mainfrom
worktree-lml-537-cache-investigation

jakebromberg commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jakebromberg commented Jun 12, 2026

Probes

#6 — Rate-limiter telemetry tag (discogs/service.py, discogs/fallthrough.py)

#1 — Cache miss provenance sample (scripts/cache_miss_provenance.py)

#2 — Dynamic-write contribution histogram (scripts/cache_warm_histogram.py)

#3 — Discogs ranking jitter (scripts/discogs_ranking_jitter.py)

Out of scope

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

#6 — Rate-limiter telemetry tag (`discogs/service.py`, `discogs/fallthrough.py`)

#1 — Cache miss provenance sample (`scripts/cache_miss_provenance.py`)

#2 — Dynamic-write contribution histogram (`scripts/cache_warm_histogram.py`)

#3 — Discogs ranking jitter (`scripts/discogs_ranking_jitter.py`)