perf(orchestrator): restore MAX_SEARCH_RESULTS early-exit lost in gather conversions#538
Merged
Merged
Conversation
…her conversions (LML#536) PR #534 parallelized search_song_as_track's per-release validate loop with asyncio.gather + Semaphore(5), modelled on search_compilations_for_track. Both functions now schedule every candidate's validate_track_on_release call and truncate the post-gather accumulator at MAX_SEARCH_RESULTS=5. Wall time was bounded by the semaphore but call budget and Discogs rate-limiter pressure were not — high-fanout requests (~15 candidates) paid for ~10 wasted validate round-trips after the response cap had already saturated. Under the degraded PG-cache write path (Sentry LIBRARY-METADATA-LOOKUP-9) each wasted validate falls through to /releases/<id> (p95 5.87s). Add a _chunked_gather async-generator helper that dispatches workers in chunks of MAX_SEARCH_RESULTS and yields (item, result) pairs in input order. The next chunk is not dispatched until the caller has finished iterating the previous one, so a caller that breaks out after its accumulator saturates never pays for un-fired chunks — restoring the pre-PR early-exit invariant without giving up within-chunk parallelism. Apply the helper at all three sites: search_song_as_track replaces the gather+semaphore shape (the semaphore becomes redundant — chunk size already caps in-flight to 5); search_compilations_for_track wraps both the main raw_releases gather and the album-title fallback gather. Repurpose _SONG_AS_TRACK_VALIDATE_CONCURRENCY (value unchanged at 5) as the shared chunk-size constant; docstring updated. Pin the early-exit invariant with two new tests, one per function, asserting validate_track_on_release.await_count <= MAX_SEARCH_RESULTS given 15 always-valid candidates. The four LML#534 tests (order preservation, hint order, concurrency bound, concurrent execution) continue to pass. Closes #536
…NCY, switch fallback to functools.partial, relocate compilations early-exit test The semaphore that motivated _SONG_AS_TRACK_VALIDATE_CONCURRENCY is gone — the constant is now just the chunk size, hand-coded equal to MAX_SEARCH_RESULTS with the "first chunk satisfies the cap" invariant documented only in prose. Inline MAX_SEARCH_RESULTS at the three _chunked_gather call sites so the invariant is textual at the call, and fold the remaining rationale into the helper's docstring. The test that imported the constant now imports MAX_SEARCH_RESULTS. Replace the four-line `_fallback_worker` closure with `functools.partial(process_release, skip_self_named_album=False, skip_artist_match_filter=True)` — partial is already imported and used pervasively in this module; the closure was carrying no extra logic. `_release_info` was unused at both compilations call sites; switch to bare `_`. Add an `assert chunk_size > 0` in `_chunked_gather` so a future misconfiguration fails loudly instead of silently no-op-ing (range step ≤ 0). Move TestSearchCompilationsEarlyExit out of test_compilation_wave_merge.py (whose docstring scopes it to LML#527 wave-merge gate behavior) into test_orchestrator_helpers.py alongside the parallel TestSearchSongAsTrack early-exit test. Tests now cluster by function rather than scattering by mechanism.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
search_song_as_trackandsearch_compilations_for_track: onceMAX_SEARCH_RESULTSmatches accumulate, no furthervalidate_track_on_releasecalls fire._chunked_gather, an async-generator that dispatches workers in chunks and yields(item, result)in input order. The next chunk is not dispatched until the caller finishes iterating the previous one — breaking out of the loop after the response cap is hit aborts the generator before un-fired chunks dispatch.search_song_as_trackbecomes redundant (chunk size 5 already bounds in-flight to 5) and is removed._SONG_AS_TRACK_VALIDATE_CONCURRENCYis repurposed as the shared chunk-size constant (value unchanged); docstring updated.Background
PR #534 parallelized the SONG_AS_TRACK validate loop and modelled it on the existing
search_compilations_for_trackpattern. Both functions now schedule every candidate viaasyncio.gatherand only truncate post-hoc — so a high-fanout request with ~15 candidates paid for ~10 wasted/releases/<id>round-trips after the response cap had already saturated. The waste is compounded by Sentry LIBRARY-METADATA-LOOKUP-9 (degraded PG cache write path; p95 5.87s on the validate span).Test plan
TestSearchSongAsTrack::test_early_exits_after_max_results_accumulated— 15 always-valid candidates →validate_track_on_release.await_count <= MAX_SEARCH_RESULTS. Verified red onmain, green here.TestSearchCompilationsEarlyExit::test_main_gather_early_exits_after_max_results— same invariant forsearch_compilations_for_track. Verified red onmain, green here.test_validate_track_calls_run_concurrently,test_output_order_follows_input_not_completion,test_matched_via_hint_order_follows_input,test_validate_concurrency_is_bounded.TestWaveMergeGate(LML#527 regressions) all pass.tests/unitsuite: 2421 passed; one pre-existing event-loop teardown flake intests/unit/test_dependencies.pyreproduces identically onmainand is unrelated.ruff check+ruff format --checkclean.Closes #536