perf(search): parallelize SONG_AS_TRACK validate loop#535

Merged
jakebromberg merged 1 commit into main from perf/parallelize-song-as-track-validate
Jun 11, 2026

@jakebromberg

Member

Closes #534.

Summary

  • search_song_as_track awaited discogs_service.validate_track_on_release serially, once per Discogs candidate. With ~7 candidates per lookup at ~1.2s each (worse when the PG cache write path is degraded, Sentry LIBRARY-METADATA-LOOKUP-9), that is an 8-10s serial chain on /api/v1/lookup's hot path.
  • Mirror the search_compilations_for_track pattern: extract per-release library-match + validate into _validate_one(release), fan out with asyncio.gather, walk results in input-index order for the dedup pass so relevance ranking and matched_via_by_id hint order are preserved.
  • Orchestrator-local Semaphore(5) caps per-request fan-out at the same size as the global Discogs semaphore (discogs/service.py:373); the global rate limiter still sits underneath.
  • Bench: 5 candidates × 200ms validate → serial 1.007s, parallel 0.20s (5x).

Test plan

  • Concurrency: 5 fake validate calls × 200ms → wall time near 200ms not 1000ms
  • Input order preservation: releases [A, B, C] where B drops on validation → output [A, C] even when B finishes first
  • Hint accumulation order: same WXYC row referenced by [X (slow), Y (fast)] → matched_via_by_id[id] is [X_hint, Y_hint]
  • Semaphore bound: 20 candidates → peak in-flight ≤ 5
  • All 10 pre-existing TestSearchSongAsTrack tests still pass
  • Full default unit suite (2630 tests) green
  • ruff check . / ruff format --check . / mypy . --ignore-missing-imports all clean
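
One way the semaphore-bound check could be written is sketched below. It is an illustration only and not the test added in this PR: `run_bounded` and `fake_validate` are hypothetical names, and the fake simply sleeps while counting how many calls are inside the semaphore at once.

```python
import asyncio


async def run_bounded(n_candidates: int, limit: int = 5) -> int:
    """Run n fake validate calls under a semaphore and return peak in-flight."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_validate(i: int) -> int:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated validate latency
            in_flight -= 1
        return i

    results = await asyncio.gather(*(fake_validate(i) for i in range(n_candidates)))
    # gather preserves input order even though completion order may differ.
    assert results == list(range(n_candidates))
    return peak
```

With 20 candidates and a limit of 5, the returned peak should never exceed 5.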

`search_song_as_track` awaited `validate_track_on_release` per Discogs candidate serially. With ~7 candidates per lookup at ~1.2s each that's an 8-10s serial chain on `/api/v1/lookup`'s p95/p99 hot path — the dominant contributor to the early-May regression.

Mirror the shape `search_compilations_for_track` already uses (extract per-release work into a helper, gather concurrently, walk results in input-index order to preserve relevance ranking during dedup). An orchestrator-local `Semaphore(5)` caps per-request fan-out at the same size as the global Discogs semaphore in `discogs/service.py:373`; the global rate limiter still sits underneath, so cross-request load is unaffected.

Bench (5 candidates x 200ms validate): serial 1.007s -> parallel 0.20s (5x). Same inputs in -> same `matched_items` / `matched_via_by_id` out; order and hint accumulation pinned by four new tests in `test_orchestrator_helpers.py::TestSearchSongAsTrack`.
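
The arithmetic behind the bench is that a serial loop costs roughly the sum of the per-call latencies, while a gather costs roughly the slowest single call. A rough way to reproduce that comparison, assuming a fake validate that only sleeps (`fake_validate` and `bench` are hypothetical names, not the benchmark used for the numbers above):

```python
import asyncio
import time


async def fake_validate(delay_s: float) -> bool:
    # Simulated validate call: latency only, no Discogs traffic.
    await asyncio.sleep(delay_s)
    return True


async def bench(n: int = 5, delay_s: float = 0.05) -> tuple[float, float]:
    # Serial: await each call before starting the next.
    t0 = time.perf_counter()
    for _ in range(n):
        await fake_validate(delay_s)
    serial_s = time.perf_counter() - t0

    # Parallel: start all calls, then wait for all of them.
    t0 = time.perf_counter()
    await asyncio.gather(*(fake_validate(delay_s) for _ in range(n)))
    parallel_s = time.perf_counter() - t0
    return serial_s, parallel_s
```

Calling `asyncio.run(bench(5, 0.2))` approximates the 5 x 200ms setup; exact timings depend on the machine and event loop.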
@jakebromberg jakebromberg merged commit 055583a into main Jun 11, 2026
11 checks passed
@jakebromberg jakebromberg deleted the perf/parallelize-song-as-track-validate branch June 11, 2026 17:01
Development

Successfully merging this pull request may close these issues.

SONG_AS_TRACK: parallelize per-release validate_track_on_release loop