[Phase 5b] Top-K heap for ORDER BY ... LIMIT N (sq-lcw.4) #35
Merged
Sorts feeding a small LIMIT now use a bounded max-heap of size limit + offset instead of buffering the full input. Memory stays O(k) regardless of input size, matching Phase 5b's bounded-memory goal for ORDER BY x LIMIT 100 over 100M-row scans.

Planner: SortNode gains an optional `limit` cap. The planSelect / planSet construction sets it when LIMIT is defined, LIMIT + OFFSET <= 10000, and no DISTINCT sits between Sort and Limit (DISTINCT can drop rows, so the cap would be unsafe).

Executor: a new TopKHeap maintains the worst-among-top-K candidate at the root. Sort keys are evaluated lazily, term by term, so multi-key ORDER BY keeps later (often expensive) terms unevaluated unless earlier terms tie, preserving the same cell-access economy the existing full-sort path guarantees. First-key evaluation is parallelized in chunks for streaming throughput. Falls back to the full-sort path when no limit hint is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
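
For readers unfamiliar with the technique, here is a minimal sketch of a bounded max-heap with lazily evaluated sort keys. It is not the PR's TopKHeap: the class name `TopKSketch`, the `Row` shape, and the key functions are all assumptions made for illustration, it handles only ascending numeric keys, and it leaves out the chunked parallel first-key evaluation. In the PR's terms, `k` would be LIMIT + OFFSET.

```typescript
// Illustrative sketch only; TopKSketch and its members are hypothetical names.
interface Row { a: number; b: number }
type KeyFn = (row: Row) => number;

interface Entry {
  row: Row;
  keys: (number | undefined)[]; // per-term cache, filled on first use
  seq: number;                  // arrival order, breaks full ties
}

class TopKSketch {
  private heap: Entry[] = [];
  private seq = 0;
  private k: number;
  private keyFns: KeyFn[];

  constructor(k: number, keyFns: KeyFn[]) {
    this.k = k;
    this.keyFns = keyFns;
  }

  // Evaluate sort term i for an entry at most once.
  private key(e: Entry, i: number): number {
    let v = e.keys[i];
    if (v === undefined) {
      v = this.keyFns[i](e.row);
      e.keys[i] = v;
    }
    return v;
  }

  // Ascending comparison; later terms are touched only when earlier ones tie.
  private compare(x: Entry, y: Entry): number {
    for (let i = 0; i < this.keyFns.length; i++) {
      const kx = this.key(x, i);
      const ky = this.key(y, i);
      if (kx < ky) return -1;
      if (kx > ky) return 1;
    }
    return x.seq - y.seq;
  }

  push(row: Row): void {
    if (this.k <= 0) return;
    const keys = new Array<number | undefined>(this.keyFns.length).fill(undefined);
    const e: Entry = { row, keys, seq: this.seq++ };
    if (this.heap.length < this.k) {
      this.heap.push(e);
      this.siftUp(this.heap.length - 1);
    } else if (this.compare(e, this.heap[0]) < 0) {
      // New row beats the worst retained row (the root): replace it.
      this.heap[0] = e;
      this.siftDown(0);
    }
  }

  // Rows in final ORDER BY order; at most k entries are ever held.
  result(): Row[] {
    return [...this.heap].sort((x, y) => this.compare(x, y)).map((e) => e.row);
  }

  private siftUp(i: number): void {
    while (i > 0) {
      const p = (i - 1) >> 1;
      if (this.compare(this.heap[i], this.heap[p]) <= 0) break;
      [this.heap[i], this.heap[p]] = [this.heap[p], this.heap[i]];
      i = p;
    }
  }

  private siftDown(i: number): void {
    const n = this.heap.length;
    for (;;) {
      let worst = i;
      const l = 2 * i + 1;
      const r = l + 1;
      if (l < n && this.compare(this.heap[l], this.heap[worst]) > 0) worst = l;
      if (r < n && this.compare(this.heap[r], this.heap[worst]) > 0) worst = r;
      if (worst === i) return;
      [this.heap[i], this.heap[worst]] = [this.heap[worst], this.heap[i]];
      i = worst;
    }
  }
}

// Usage: ORDER BY a, b LIMIT 3 over five rows, counting second-key evaluations.
let secondKeyCalls = 0;
const topK = new TopKSketch(3, [
  (r) => r.a,
  (r) => { secondKeyCalls++; return r.b; },
]);
const input: Row[] = [
  { a: 5, b: 1 }, { a: 3, b: 9 }, { a: 1, b: 2 }, { a: 3, b: 4 }, { a: 9, b: 0 },
];
for (const row of input) topK.push(row);
const top = topK.result();
```

In this run the second key is evaluated only for the two rows that tie on `a`, which is the cell-access economy the description refers to. The per-entry key cache is one possible way to get that behavior; the PR may do it differently.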
Summary
Implemented the Phase 5b Top-K heap for ORDER BY ... LIMIT N. SortNode gains an optional `limit` cap, set to LIMIT + OFFSET when that sum is <= 10000 and no DISTINCT sits between Sort and Limit. The executor uses a bounded max-heap with lazy multi-key evaluation, which keeps memory at O(k) and preserves the cell-access economy of the existing sort path (later sort keys stay unevaluated unless earlier keys tie). First-key evaluations are batched in parallel for throughput. Full sort remains the fallback when no limit hint is set. 1573/1573 tests pass; 20 new tests cover correctness, planner edge cases (DISTINCT, threshold, OFFSET), and a 100k-row streaming check that verifies the O(k) memory bound. Lint and tsc are clean.
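
The planner conditions in the summary could be expressed roughly as follows. This is a sketch under assumptions, not the code in planSelect / planSet: the `SelectShape` interface, its field names, the `TOP_K_MAX` constant, and the `sortLimitCap` function are hypothetical; only the three conditions themselves come from the description.

```typescript
// Hypothetical names throughout; only the three conditions come from the PR text.
const TOP_K_MAX = 10000;

interface SelectShape {
  limit?: number;    // undefined when the query has no LIMIT
  offset?: number;   // undefined when the query has no OFFSET
  distinct: boolean; // true when DISTINCT sits between Sort and Limit
}

// Returns the cap to attach to the sort node, or undefined to keep the full sort.
function sortLimitCap(q: SelectShape): number | undefined {
  if (q.limit === undefined) return undefined; // no LIMIT, nothing to bound
  if (q.distinct) return undefined;            // DISTINCT may drop rows after the sort
  const k = q.limit + (q.offset ?? 0);         // rows the sort must retain
  return k <= TOP_K_MAX ? k : undefined;       // above the threshold, use the full sort
}
```

For example, `sortLimitCap({ limit: 100, offset: 20, distinct: false })` yields 120, while the same query with `distinct: true` yields `undefined` and falls back to the full sort.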
Delivery