Skip to content

diag(cache): four LML#537 probes for Discogs cache hit ratio plateau#540

Merged
jakebromberg merged 4 commits into
mainfrom
worktree-lml-537-cache-investigation
Jun 12, 2026
Merged

diag(cache): four LML#537 probes for Discogs cache hit ratio plateau#540
jakebromberg merged 4 commits into
mainfrom
worktree-lml-537-cache-investigation

Conversation

@jakebromberg

Copy link
Copy Markdown
Member

Implements four of the six concrete next steps from #537 — three diagnostic scripts and one Sentry-side telemetry improvement. The remaining two (#4 steady-state estimate, #5 PG row-count audit) are non-code follow-ups that will live as comments on the issue once a few days of data accumulate.

Probes

#6 — Rate-limiter telemetry tag (discogs/service.py, discogs/fallthrough.py)

Tags lml.discogs.semaphore and lml.discogs.rate_limiter Sentry spans with lml.discogs.method and lml.discogs.cache_state so the wait-time histogram can be split by which cache method's API miss triggered the wait. cache_state is one of miss (PG checked and missed), skip (per-request bypass via should_skip_cache()), no_pg (read-only method), or cooldown (#324 short-circuit active). Threaded via a ContextVar set in the fallthrough seam and read in _request_with_retry — same pattern as the existing _skip_cache_var in discogs/memory_cache.py, so no kwargs threading through 9 call sites. Token-paired set/reset survives CancelledError.

#1 — Cache miss provenance sample (scripts/cache_miss_provenance.py)

For a sample of cache-miss → API-call events, classifies each by whether the release row already existed at miss time and whether it carried release_track rows. Separates first-time misses (H1 — ETL coverage gap) from release-row-exists-with-tracks misses (H3' — validate_track_on_release returning False as a cache hit). Accepts either a Railway log file (parsed by built-in regex patterns against the existing INFO-level Validated: / Failed to fetch release lines) or a CSV of release_id,method rows from a Sentry trace export.

Example run:

DATABASE_URL_DISCOGS=postgresql://... python scripts/cache_miss_provenance.py --log railway-export.log --limit 50

Outputs /tmp/lml-537-cache-miss-provenance.csv + a summary table.

#2 — Dynamic-write contribution histogram (scripts/cache_warm_histogram.py)

Buckets release.artwork_checked_at by day and contrasts the pre-Alembic-0009 ETL fill with the post-deploy dynamic warm-up daily mean. A small post-deploy mean confirms H1.

DATABASE_URL_DISCOGS=postgresql://... python scripts/cache_warm_histogram.py
DATABASE_URL_DISCOGS=postgresql://... python scripts/cache_warm_histogram.py --csv > histogram.csv

#3 — Discogs ranking jitter (scripts/discogs_ranking_jitter.py)

Issues two identical /database/search?track=...&artist=... calls separated by a delay and measures jaccard overlap of returned release IDs + average rank-delta for shared IDs. Low jaccard confirms H2 — Discogs returns different IDs across calls, so warming the prior set doesn't help the next one. Bypasses DiscogsService (no L1 LRU, no semaphore) to surface Discogs-side ranking behavior directly.

DISCOGS_TOKEN=... python scripts/discogs_ranking_jitter.py \
  --track "Moments of Soft Persuasion" --artist Yoshimura --repeat 3

Out of scope

Test plan

Refs #537.

Implements probes #1, #2, #3, #6 from LML#537.

#6 (code): tag lml.discogs.semaphore and lml.discogs.rate_limiter Sentry spans with discogs.method and cache_state (miss|skip|cooldown|no_pg). Threaded via a contextvar set in the fallthrough seam and read in _request_with_retry, so the wait-time histogram can be split by which method's cache-miss triggered the wait.

#1 (script): scripts/cache_miss_provenance.py classifies sampled cache-miss events by (release row existed, release_track count, method, prior seen in window). Separates H1 (ETL coverage gap) from H3' (validate returning False as a cache hit).

#2 (script): scripts/cache_warm_histogram.py buckets release.artwork_checked_at by day and contrasts pre/post Alembic-0009 daily means.

#3 (script): scripts/discogs_ranking_jitter.py issues two identical /database/search calls separated by a delay and measures jaccard overlap of returned release IDs to test H2 (ranking jitter defeats warm-up).

All scripts are read-only. Probes #4 and #5 are non-code follow-ups deferred to comments on the tracking issue.
…scripts

Three production methods (search_releases_by_album_title, get_label_image, get_master) call _request_with_retry directly without going through fallthrough, leaving their wait spans untagged in the LML#537 telemetry. Adds a request_context context manager in discogs/fallthrough.py and uses it from those three call sites so cache_state=no_pg / method tags propagate. Updates the comment in service.py that wrongly named the health probe as a direct caller (the health probe uses a different code path entirely).

Tightens the fallthrough seam's contextvar set/reset by moving the set() inside the try block so the reset is guaranteed paired even if a future edit inserts a line between them.

scripts/discogs_ranking_jitter.py: drops the duplicate _DISCOGS_API_BASE in favor of the public DISCOGS_API_BASE exported by discogs.service. Replaces the JitterPair set_a/set_b/overlap/union/jaccard @Property chain (6x set rebuild per jaccard access) with a single stats() helper that computes overlap/union/jaccard in one pass. Adds to_record() so --json includes derived metrics (overlap, union, jaccard, position_stability) that p.__dict__ silently dropped.

scripts/cache_miss_provenance.py: drops the "Failed to fetch release N" regex; that warning fires on API faults rather than cache misses and conflates H1/H3' provenance with availability faults.

tests/unit/test_discogs_request_telemetry.py: adopts the canonical captured_spans + _start_span pattern from test_discogs_service.py instead of the bespoke _SpanRecorder class; uses the real DiscogsService fixture with self._client injection instead of DiscogsService.__new__. Adds coverage for the new request_context helper.
…r plumbing

The four seam-using methods (search_releases_by_track, get_release, get_artist_details, search) all carry an `if cache is None: value = await _api_fetch()` fallback that bypassed both fallthrough() and the request_context helper — silently leaving their wait spans untagged whenever DiscogsService runs without a cache_service (the discogs/lookup.py:27 and orchestrator service=None paths). Wraps each of those four branches in `with request_context(label):` so the histogram split holds across cacheless configurations. Adds a regression test exercising the get_release cacheless path.

Adds `get_request_context()` as the public reader so discogs/service.py no longer imports the private `_request_context_var` symbol across module boundaries — mirrors the `should_skip_cache()` shape already established in discogs/memory_cache.py. Defaults `request_context(method, cache_state='no_pg')` since all four direct-caller sites pass the same value, and centralizes the per-span tag application in a new `_apply_request_ctx_tags(span, ctx)` helper so the three-line branch isn't repeated at both the semaphore and per-retry rate-limiter spans.

Refactors fallthrough()'s inline try/finally to use the request_context context manager directly, eliminating the Token-unbound-on-set-failure risk: if ContextVar.set() raises, the generator aborts cleanly before the try block opens and no reset is attempted.

scripts/discogs_ranking_jitter.py: aggregate() returns `None` (not `float('nan')`) for position_stability_mean when no pair has shared IDs. Python's json.dumps default emits literal `NaN` which is invalid per RFC 8259 and rejected by jq / strict parsers; `None` serializes as `null` cleanly. print_summary's NaN check becomes `is None`. to_record()'s docstring now enumerates the keys it surfaces for downstream consumers.
… ctx tagging into fallthrough

scripts/cache_miss_provenance.py: sampling via rng.sample() returns elements in random selection order rather than log order, so the prior_seen_in_window flag was silently inverted whenever --limit clipped the input. Splits classify() into compute_first_seen() (called over the full events BEFORE sampling) plus classify() taking (original_index, event) pairs so the flag stays anchored to the user's log window. Adds a regression test exercising the sampling reorder.

discogs/fallthrough.py: hoists apply_request_ctx_tags() out of service.py and into fallthrough.py next to request_context / get_request_context / _request_context_var so the dict-key contract ("method" / "cache_state") and the Sentry attribute names ("lml.discogs.method" / "lml.discogs.cache_state") live next to the contextvar producer. Future renames touch one file. service.py loses get_request_context's import (the helper reads the contextvar itself now) and a small inline if-and-set block.

Tightens the request_context docstring to enumerate the four `cache is None` fallback branches that wrap it (search_releases_by_track, get_release, get_artist_details, search) and explain why validate_track_on_release intentionally doesn't — it delegates to get_release, whose own request_context wins for the actual HTTP wait span.

Drops a duplicate `from unittest.mock import` inside test_cacheless_get_release_tags_with_no_pg (the symbols were already imported at module scope).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant