Skip to content

perf(orchestrator): restore MAX_SEARCH_RESULTS early-exit lost in gather conversions#538

Merged
jakebromberg merged 2 commits into
mainfrom
feat/restore-max-results-early-exit
Jun 12, 2026
Merged

perf(orchestrator): restore MAX_SEARCH_RESULTS early-exit lost in gather conversions#538
jakebromberg merged 2 commits into
mainfrom
feat/restore-max-results-early-exit

Conversation

@jakebromberg

Copy link
Copy Markdown
Member

Summary

  • Restores the pre-PR (SONG_AS_TRACK: parallelize per-release validate_track_on_release loop #534) early-exit invariant in search_song_as_track and search_compilations_for_track: once MAX_SEARCH_RESULTS matches accumulate, no further validate_track_on_release calls fire.
  • Introduces _chunked_gather, an async-generator that dispatches workers in chunks and yields (item, result) in input order. The next chunk is not dispatched until the caller finishes iterating the previous one — breaking out of the loop after the response cap is hit aborts the generator before un-fired chunks dispatch.
  • Applies the helper at all three gather sites (SONG_AS_TRACK, compilations main pass, compilations album-title fallback) so the two parallel strategies stay in step per the LML#534 single-purpose mandate.
  • The orchestrator-local semaphore in search_song_as_track becomes redundant (chunk size 5 already bounds in-flight to 5) and is removed. _SONG_AS_TRACK_VALIDATE_CONCURRENCY is repurposed as the shared chunk-size constant (value unchanged); docstring updated.

Background

PR #534 parallelized the SONG_AS_TRACK validate loop and modelled it on the existing search_compilations_for_track pattern. Both functions now schedule every candidate via asyncio.gather and only truncate post-hoc — so a high-fanout request with ~15 candidates paid for ~10 wasted /releases/<id> round-trips after the response cap had already saturated. The waste is compounded by Sentry LIBRARY-METADATA-LOOKUP-9 (degraded PG cache write path; p95 5.87s on the validate span).

Test plan

  • New: TestSearchSongAsTrack::test_early_exits_after_max_results_accumulated — 15 always-valid candidates → validate_track_on_release.await_count <= MAX_SEARCH_RESULTS. Verified red on main, green here.
  • New: TestSearchCompilationsEarlyExit::test_main_gather_early_exits_after_max_results — same invariant for search_compilations_for_track. Verified red on main, green here.
  • The four LML#534 tests continue to pass: test_validate_track_calls_run_concurrently, test_output_order_follows_input_not_completion, test_matched_via_hint_order_follows_input, test_validate_concurrency_is_bounded.
  • TestWaveMergeGate (LML#527 regressions) all pass.
  • Full tests/unit suite: 2421 passed; one pre-existing event-loop teardown flake in tests/unit/test_dependencies.py reproduces identically on main and is unrelated.
  • ruff check + ruff format --check clean.

Closes #536

…her conversions (LML#536)

PR #534 parallelized search_song_as_track's per-release validate loop with asyncio.gather + Semaphore(5), modelled on search_compilations_for_track. Both functions now schedule every candidate's validate_track_on_release call and truncate the post-gather accumulator at MAX_SEARCH_RESULTS=5. Wall time was bounded by the semaphore but call budget and Discogs rate-limiter pressure were not — high-fanout requests (~15 candidates) paid for ~10 wasted validate round-trips after the response cap had already saturated. Under the degraded PG-cache write path (Sentry LIBRARY-METADATA-LOOKUP-9) each wasted validate falls through to /releases/<id> (p95 5.87s).

Add a _chunked_gather async-generator helper that dispatches workers in chunks of MAX_SEARCH_RESULTS and yields (item, result) pairs in input order. The next chunk is not dispatched until the caller has finished iterating the previous one, so a caller that breaks out after its accumulator saturates never pays for un-fired chunks — restoring the pre-PR early-exit invariant without giving up within-chunk parallelism.

Apply the helper at all three sites: search_song_as_track replaces the gather+semaphore shape (the semaphore becomes redundant — chunk size already caps in-flight to 5); search_compilations_for_track wraps both the main raw_releases gather and the album-title fallback gather.

Repurpose _SONG_AS_TRACK_VALIDATE_CONCURRENCY (value unchanged at 5) as the shared chunk-size constant; docstring updated. Pin the early-exit invariant with two new tests, one per function, asserting validate_track_on_release.await_count <= MAX_SEARCH_RESULTS given 15 always-valid candidates. The four LML#534 tests (order preservation, hint order, concurrency bound, concurrent execution) continue to pass.

Closes #536
…NCY, switch fallback to functools.partial, relocate compilations early-exit test

The semaphore that motivated _SONG_AS_TRACK_VALIDATE_CONCURRENCY is gone — the constant is now just the chunk size, hand-coded equal to MAX_SEARCH_RESULTS with the "first chunk satisfies the cap" invariant documented only in prose. Inline MAX_SEARCH_RESULTS at the three _chunked_gather call sites so the invariant is textual at the call, and fold the remaining rationale into the helper's docstring. The test that imported the constant now imports MAX_SEARCH_RESULTS.

Replace the four-line `_fallback_worker` closure with `functools.partial(process_release, skip_self_named_album=False, skip_artist_match_filter=True)` — partial is already imported and used pervasively in this module; the closure was carrying no extra logic.

`_release_info` was unused at both compilations call sites; switch to bare `_`. Add an `assert chunk_size > 0` in `_chunked_gather` so a future misconfiguration fails loudly instead of silently no-op-ing (range step ≤ 0).

Move TestSearchCompilationsEarlyExit out of test_compilation_wave_merge.py (whose docstring scopes it to LML#527 wave-merge gate behavior) into test_orchestrator_helpers.py alongside the parallel TestSearchSongAsTrack early-exit test. Tests now cluster by function rather than scattering by mechanism.
@jakebromberg jakebromberg merged commit f11ac21 into main Jun 12, 2026
11 checks passed
@jakebromberg jakebromberg deleted the feat/restore-max-results-early-exit branch June 12, 2026 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(orchestrator): restore MAX_SEARCH_RESULTS early-exit lost in asyncio.gather conversions

1 participant