[CBRD-26900] Evaluate eligible after-join predicates in the hash join probe loop#7269
Open
youngjinj wants to merge 41 commits into
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-padding Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er_probe Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(inner) For INNER hash joins, move two-input residual conditions (incl. non-equi / range join conditions such as t1.c < t2.d) from the parent buildlist's if_pred into the hash-join probe loop (proc->probe_pred). qo_collect_hashjoin_probe_terms() selects from plan->sarged_terms the terms that reference only the two join inputs and are not already realized as hash keys / join edges, excluding inst_num()/rownum (TOTALLY_AFTER_JOIN) and correlated-subquery terms. gen_hashjoin() prunes the selected terms from both plan->sarged_terms and the local pred_set copy that feeds the parent list scan's if_pred, so the residual is evaluated in exactly one place. qo_init_projection_info() is extended so the probe terms' columns are added to outer/inner pred_list (regu_list_pred coverage, fetched into val_descr at probe time) and to the build/probe projection (name_list), without affecting the hash-join's final output. Outer joins are untouched (their inter-table ON condition lives in during/after_join_terms, not sarged_terms; guarded explicitly). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
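A minimal sketch of the INNER-join selection and pruning step described above, assuming terms can be modeled as bits in a mask. The names `term_t`, `probe_term_eligible` and `collect_probe_terms` are hypothetical stand-ins, not the real `QO_TERM`/`BITSET` API; the sketch only shows that an eligible term is removed from both `sarged` and the local `pred_set` copy, so it is evaluated in one place only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an optimizer term.
 * Bit i of 'tables' marks a reference to join input i. */
typedef struct
{
  uint32_t tables;          /* inputs the term references */
  bool is_join_edge;        /* already realized as a hash key / join edge */
  bool is_totally_after;    /* inst_num() / rownum (TOTALLY_AFTER_JOIN) */
  bool has_corr_subquery;   /* contains a correlated subquery */
} term_t;

/* Eligible iff the term touches nothing outside the two join inputs and
 * none of the exclusions from the commit message apply. */
static bool
probe_term_eligible (const term_t *t, uint32_t join_inputs)
{
  if ((t->tables & ~join_inputs) != 0)
    return false;
  return !t->is_join_edge && !t->is_totally_after && !t->has_corr_subquery;
}

/* Select eligible terms out of 'sarged' (bit i = term i), then prune them
 * from both sarged and the local pred_set copy. Returns the probe set. */
static uint32_t
collect_probe_terms (const term_t *terms, int n_terms, uint32_t join_inputs,
                     uint32_t *sarged, uint32_t *pred_set)
{
  uint32_t probe = 0;

  assert (n_terms <= 32);
  for (int i = 0; i < n_terms; i++)
    {
      if ((*sarged & (UINT32_C (1) << i)) != 0
          && probe_term_eligible (&terms[i], join_inputs))
        probe |= UINT32_C (1) << i;
    }
  *sarged &= ~probe;
  *pred_set &= ~probe;
  return probe;
}
```

Only the residual `t1.c < t2.d`-style term moves; hash keys, `inst_num()` terms and terms touching a third table stay where they were.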
… pushdown Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
For LEFT/RIGHT OUTER hash joins, move the two-input WHERE residual conditions that live in plan->plan_un.join.after_join_terms (e.g. "t1 LEFT JOIN t2 ON t1.a=t2.b WHERE t2.d is null or t1.c < t2.d") into the hash-join probe loop (proc->probe_pred), instead of evaluating them in the parent buildlist's after_join_pred during the second scan. after_join terms are applied AFTER null-padding in the existing second scan; the outer probe applies probe_pred to the final (matched or null-padded) tuple with the same semantics (build side cleared to NULL on null-fill), so the result is unchanged. ON-clause conditions (during_join_terms) drive matching / null-padding and are left untouched. qo_hashjoin_probe_term_eligible() factors out the per-term eligibility test shared by the INNER (sarged_terms) and the new OUTER (after_join_terms) collectors: term references only the two join inputs, no correlated subquery, not output-position dependent (inst_num/rownum), not already a hash key / join edge, and only PT_NAME segments (fetchable through regu_list_pred at probe time). qo_collect_hashjoin_probe_after_terms() collects eligible terms for JOIN_LEFT / JOIN_RIGHT only; FULL OUTER (JOIN_OUTER) is excluded since the serial outer probe never reaches it. gen_hashjoin() prunes the collected terms from both plan->plan_un.join.after_join_terms and the local pred_set copy (which unions after_join_terms in gen_outer) that feeds the parent after_join_pred, so each is evaluated in exactly one place, and unions them into the probe_terms bitset that already flows through qo_init_projection_info (column coverage) and make_hashjoin_proc (combined probe predicate via is_always_true). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
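The equivalence claimed above (probe-time evaluation on the final matched or null-padded tuple gives the same result as the old second-scan `after_join_pred`) can be checked on a toy LEFT OUTER join. This is a hedged sketch with hypothetical row types and a nested loop standing in for the hash lookup. The key detail is that `matched` follows the ON condition only, so a matched row that fails the WHERE term is dropped rather than null-padded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int a, c; } outer_row;   /* t1(a, c) */
typedef struct { int b, d; } inner_row;   /* t2(b, d) */

/* Joined tuple; d_null marks a null-padded build side. */
typedef struct { int c, d; bool d_null; } joined_row;

/* WHERE t2.d IS NULL OR t1.c < t2.d */
static bool
after_join_pred (const joined_row *j)
{
  return j->d_null || j->c < j->d;
}

/* Probe-time evaluation on ON t1.a = t2.b. */
static size_t
probe_with_filter (const outer_row *o, size_t no, const inner_row *in,
                   size_t ni, joined_row *out)
{
  size_t n = 0;

  for (size_t i = 0; i < no; i++)
    {
      bool matched = false;

      for (size_t k = 0; k < ni; k++)   /* stands in for the hash lookup */
        {
          if (o[i].a != in[k].b)
            continue;
          matched = true;               /* decided by ON, not by WHERE */
          joined_row j = { o[i].c, in[k].d, false };
          if (after_join_pred (&j))
            out[n++] = j;
        }
      if (!matched)
        {
          joined_row j = { o[i].c, 0, true };   /* build side cleared to NULL */
          if (after_join_pred (&j))
            out[n++] = j;
        }
    }
  return n;
}

/* Previous shape: materialize every joined row, then filter in a second scan. */
static size_t
join_then_filter (const outer_row *o, size_t no, const inner_row *in,
                  size_t ni, joined_row *tmp, joined_row *out)
{
  size_t m = 0, n = 0;

  for (size_t i = 0; i < no; i++)
    {
      bool matched = false;

      for (size_t k = 0; k < ni; k++)
        if (o[i].a == in[k].b)
          {
            matched = true;
            tmp[m++] = (joined_row) { o[i].c, in[k].d, false };
          }
      if (!matched)
        tmp[m++] = (joined_row) { o[i].c, 0, true };
    }
  for (size_t x = 0; x < m; x++)   /* the parent buildlist's second scan */
    if (after_join_pred (&tmp[x]))
      out[n++] = tmp[x];
  return n;
}
```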
…probe passes A LEFT OUTER hash join carrying both an ON-clause non-equi term (during_join_pred) and a WHERE term referencing inner-table columns (probe_pred) crashed the server with an or_advance assertion abort. qo_init_projection_info builds outer/inner pred_list in two passes: the during-join pass and the probe-term pass. A column referenced by both (e.g. t1.c, t2.d in "ON t1.a=t2.b AND t1.c<t2.d WHERE t2.d IS NULL OR t1.c<t2.d") was appended to pred_list twice, producing a regu_list_pred with duplicate TYPE_POSITION entries (pos_no 2, 2). fetch_peek_dbval_pos walks a single forward-only OR_BUF iterator over the regu list, assuming non-decreasing AND effectively distinct positions: after consuming pos 2 it advances the value index to 3, then the second pos-2 regu forces qfile_locate_tuple_next_value to read a non-existent 4th value of a 3-value tuple, overrunning the buffer and tripping or_advance's assert (object_representation.h:1478). Fix: track segments already added to each side's pred_list with two bitsets and skip duplicates in both passes, so pred_list holds one entry per segment, matching the bitset-built name_list and yielding distinct, monotonic regu positions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
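A minimal sketch of the dedup fix, assuming segment ids can double as regu positions for illustration. `pred_list_t`, `pred_list_add` and `forward_only_readable` are hypothetical names; the real code uses `regu_list_pred` and `fetch_peek_dbval_pos`, which are not reproduced. The point is that both passes go through one helper that skips already-added segments, and that a repeated position cannot be served by a single forward-only reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one side's pred_list: segment ids appended so far,
 * plus a bitset of ids already present. */
typedef struct
{
  int seg[64];
  int n;
  uint64_t added;
} pred_list_t;

/* Shared by the during-join pass and the probe-term pass: a segment is
 * appended at most once, so every pred_list entry is distinct. */
static void
pred_list_add (pred_list_t *pl, int seg)
{
  uint64_t bit;

  assert (seg >= 0 && seg < 64);
  bit = UINT64_C (1) << seg;
  if (pl->added & bit)
    return;                     /* already added by an earlier pass */
  pl->added |= bit;
  pl->seg[pl->n++] = seg;
}

/* A single forward-only reader can serve the positions only if each one
 * lies strictly after the previous; a repeated position would make it
 * step past the tuple's last value. */
static bool
forward_only_readable (const int *pos, int n)
{
  for (int i = 1; i < n; i++)
    if (pos[i] <= pos[i - 1])
      return false;
  return true;
}
```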
The correlated-subquery result cache (pt_make_sq_cache_key_struct) walks every predicate attached to the cached XASL - spec preds, if_pred, during_join_pred, after_join_pred - so the key is exhaustive over each DB_VALUE the subquery can read. The HASHJOIN_PROC residual predicate proc.hashjoin.probe_pred was the one pred field omitted from that walk. Today this is not a correctness bug: qo_hashjoin_probe_term_eligible excludes any term carrying a correlated subquery and requires the term to reference only the two join inputs, so probe_pred can never hold a correlated value that the key would otherwise miss (verified: a correlated hash-join subquery whose two-input residual is pushed to probe_pred while the correlated term stays in if_pred returns results identical to the nested-loop ground truth, including on repeated correlated key values). Add probe_pred to the key walk anyway, guarded by type == HASHJOIN_PROC, so the key stays complete and a future relaxation of the push-eligibility rules cannot silently cause a cache-key collision. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…and docs Plan visibility: gen_hashjoin removes the pushed residual terms from plan->sarged_terms and plan->plan_un.join.after_join_terms before the plan dump runs (pt_to_buildlist_proc -> qo_to_xasl -> gen_hashjoin precedes qo_plan_dump in pt_to_xasl), so a pushed predicate such as "t1.c < t2.d" disappeared from every structural plan section - it survived only in the rewritten "Query stmt:" text. A DBA could not see where the residual was evaluated. Record the pushed terms on the plan in a new join.probe_terms bitset and print them as a "probe:" line in qo_plan_print_outer_join_terms, mirroring the existing "during:" / "after:" lines. The detailed plan dump now shows e.g. "edge: term[1]" / "probe: term[0]" for an inner hash join with a non-equi residual. Also free hash_terms in qo_join_free (previously init'd but never released). Naming clarity (mechanical): - local after_probe_terms -> probe_after_join_terms - qo_collect_hashjoin_probe_after_terms -> qo_collect_hashjoin_after_join_probe_terms Wording: the eligibility guard rejects a term only when one of its SEGMENTS is not a plain column (PT_NAME); expressions built over plain columns (e.g. "t1.c between t2.d-10 and t2.d+10", "upper(t1.s)=t2.s2") have only PT_NAME segments and ARE pushed. Correct the misleading comment in qo_hashjoin_probe_term_eligible and the spec section 6 exclusions to match this actual, verified behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The parallel gather scan's qualification loop drain_slot_oids evaluated only m_xasl->if_pred before writing rows, silently dropping after_join_pred. LEFT OUTER join (merge/hash) plans whose materialized result list is scanned in parallel with an after-join WHERE returned all matched rows instead of the filtered subset. Mirror the serial buildlist loop (query_executor.c: after_join_pred then if_pred): evaluate after_join_pred before if_pred in drain_slot_oids, applying to MERGEABLE_LIST, BUILDVALUE_OPT and XASL_SNAPSHOT alike (the eval site precedes the result-type switch). Also content-check after_join_pred in px_scan_checker at both if_pred sites so parallel-unsafe predicate elements disqualify the scan by the same rules. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
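The ordering fix can be sketched as a qualification step that checks `after_join_pred` before `if_pred`, treating an absent predicate as always true. The row type and the function names below are hypothetical; the sketch only shows how skipping `after_join_pred` lets every matched row through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int c, d; bool d_null; } row_t;
typedef bool (*pred_fn) (const row_t *row);

/* Serial order: after_join_pred, then if_pred. NULL means always true. */
static bool
row_qualifies (const row_t *row, pred_fn after_join_pred, pred_fn if_pred)
{
  if (after_join_pred != NULL && !after_join_pred (row))
    return false;
  return if_pred == NULL || if_pred (row);
}

/* Qualification loop over one slot's rows, writing survivors to 'out'. */
static size_t
drain_rows (const row_t *in, size_t n, pred_fn after_join_pred,
            pred_fn if_pred, row_t *out)
{
  size_t m = 0;

  for (size_t i = 0; i < n; i++)
    if (row_qualifies (&in[i], after_join_pred, if_pred))
      out[m++] = in[i];
  return m;
}

/* WHERE t2.d IS NULL OR t1.c < t2.d, kept as an after-join predicate */
static bool
where_after_join (const row_t *r)
{
  return r->d_null || r->c < r->d;
}
```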
…redicate evaluation Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ined residual set The intermediate after_join_residual_terms bitset existed only to know what to prune from plan_un.join.after_join_terms. sarged_terms and after_join_terms are disjoint by construction (query_planner.c subtracts sarg_out_terms, which includes after_join_terms, when building sarged_terms), so both collectors can fill the single residual_terms set and the combined set can be subtracted from each source directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
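A small bitmask sketch of why the single combined `residual_terms` set is safe. Each source is modeled as a mask, and each collector is given its own eligibility mask (an assumption for illustration). When the sources are disjoint, subtracting the combined set from each source leaves the same result as subtracting only what came from that source. With an overlapping term chosen by one collector only, the two strategies diverge. That divergence is what the disjointness guarantee rules out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Both collectors fill one residual set, subtracted from each source. */
static void
prune_combined (uint64_t *sarged, uint64_t *after_join,
                uint64_t elig_sarged, uint64_t elig_after)
{
  uint64_t residual = (*sarged & elig_sarged) | (*after_join & elig_after);

  *sarged &= ~residual;
  *after_join &= ~residual;
}

/* Reference: each source loses only the terms collected from it. */
static void
prune_separately (uint64_t *sarged, uint64_t *after_join,
                  uint64_t elig_sarged, uint64_t elig_after)
{
  uint64_t from_sarged = *sarged & elig_sarged;
  uint64_t from_after = *after_join & elig_after;

  *sarged &= ~from_sarged;
  *after_join &= ~from_after;
}

/* True when both strategies leave the same sets behind. */
static bool
same_result (uint64_t sarged, uint64_t after_join, uint64_t es, uint64_t ea)
{
  uint64_t s1 = sarged, a1 = after_join, s2 = sarged, a2 = after_join;

  prune_combined (&s1, &a1, es, ea);
  prune_separately (&s2, &a2, es, ea);
  return s1 == s2 && a1 == a2;
}
```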
…ity, hoist null clear Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The residual predicate is evaluated before the tuple merge at every probe site, so the member order now mirrors the evaluation order. Serialization is unaffected (xts/stx encode field order explicitly). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mber order Pack, size, unpack, and dump residual_pred before merge_info, matching HASHJOIN_PROC_NODE member order. Write/read symmetry is preserved (both sides moved together); XASL streams are transient within one build, so no compatibility concern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…MANAGER residual_pred was listed under the "Pointer to a member of XASL_NODE" block, but it points to a HASHJOIN_PROC_NODE member. Move it into its own provenance group, matching the struct's existing comment style, and mirror the same grouping at the hjoin_init_manager assignment site. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… after_join_pred
Drop the dedicated HASHJOIN_PROC_NODE.residual_pred field and store the
residual conditions pushed into the hash-join probe loop on the HASHJOIN
xasl node's own after_join_pred slot instead.
A reachability audit of every after_join_pred reader confirms this is safe
and strictly simpler: the HASHJOIN node sits on the parent's aptr_list, has
no spec_list, and never runs the generic scan loop, so the scan-loop
evaluators of after_join_pred are unreachable by construction on this node.
The only readers that see the field (px_scan_checker, px_query_checker) are
content/eligibility observers, not result-affecting evaluators.
make_hashjoin_proc now reuses add_after_join_predicate (mirroring the
during_join_pred site) to store the pushed residual, and hjoin_init_manager
sources HASHJOIN_MANAGER.residual_pred from xasl->after_join_pred. An
assert (xasl->spec_list == NULL) guards the invariant that a HASHJOIN node
never gains a scan-loop execution; otherwise the generic scan loop would
double-evaluate after_join_pred alongside the probe.
Because the slot is a generic xasl-node field, ALL the dedicated residual_pred
plumbing is deleted, not migrated -- the generic per-node paths already cover
the HASHJOIN node:
- serialization/deserialization (xts_process/stx_build hashjoin proc)
- the two clears in qexec_clear_xasl / qexec_clear_xasl_for_parallel_aptr
(generic after_join_pred clear runs in the unconditional is_final block)
- the HASHJOIN-gated sq-cache-key block (generic after_join_pred block is
reached via the aptr_list SQ_TYPE_XASL recursion)
- the dedicated qdump [residual_pred] print (generic after_join_pred print
in qdump_print_xasl covers it)
The runtime "residual" terminology is kept everywhere (HASHJOIN_MANAGER /
HASHJOIN_CONTEXT.residual_pred copies, px spawn get_residual_pred /
m_residual_pred, optimizer residual_terms bitsets, the residual: plan label);
only the XASL storage source changed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pushed hash-join residual predicate is stored in the HASHJOIN xasl node's after_join_pred slot. Rename the runtime copies of that predicate (HASHJOIN_CONTEXT/HASHJOIN_MANAGER fields, the spawn_manager member and accessor, and the null-fill helper) from residual_pred to after_join_pred so the copies mirror their XASL source field name, matching the existing during_join_pred convention. Optimizer-side "residual" classification terminology is unchanged (residual_terms bitset, qo_collect_hashjoin_residual_terms and friends, the plan "residual:" label). Stale comment references to the renamed runtime field were updated to track the new name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lumbing sites The generic xasl-node clear and dump paths need no local commentary; the design is documented in the spec. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…al pushdown Prune only the local pred_set copy when pushing hash-join residual terms into the probe loop. The audit confirms this is sufficient for exactly-once evaluation: gen_hashjoin's pred_set is the local predset built by gen_outer and is the only source for this join node's parent if_pred/after_join_pred (built in init_list_scan_proc); the plan's sarged_terms/after_join_terms are never read to build that parent predicate after gen_hashjoin runs. inst_num()/rownum terms are excluded from collection and make_outer_instnum is not on the hash-join path, so instnum handling is unaffected. Stop mutating plan->sarged_terms and plan->plan_un.join.after_join_terms, and drop the plan_un.join.residual_terms record together with its "residual:" dump label; the pushed terms now remain visible in their original sargs:/after: sections, and the probe-time evaluation site is observable via PROBE row counts in the server trace. This also removes the single-invocation fragility that the plan-bitset mutation forced. The hash_terms bitset_delset leak fix in qo_join_free is retained. Spec updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
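The exactly-once argument above can be reduced to a bitmask check, assuming the parent's predicate is built only from the local copy. `plan_t` and `parent_pred_terms` are hypothetical. Pruning the copy keeps each term in exactly one of {parent predicate, probe predicate}, while the plan's own set stays intact for the `sargs:`/`after:` dump.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t sarged_terms; } plan_t;   /* hypothetical */

/* Build the parent's predicate term set from a local copy, pruning the
 * terms pushed into the probe loop; the plan itself is only read. */
static uint32_t
parent_pred_terms (const plan_t *plan, uint32_t probe_terms)
{
  uint32_t pred_set = plan->sarged_terms;   /* local copy */

  pred_set &= ~probe_terms;
  return pred_set;
}
```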
Fold the two-input residual-term classification into qo_init_projection_info and carry the pushed predicate on the HASHJOIN node's after_join_pred slot (no new plan/XASL fields). Executor: - Unify serial and parallel predicate evaluation through a shared hjoin_eval_pred(), memoizing per-tuple fetches with is_ready. - Inline the build-side null-fill; drop fill_qualified for direct ev_res branching. - Fix the parallel outer null-fill branch: the after-join early continue skipped the need_skip_next reset, leaving the flag set so the next hjoin_fetch_key aborted (debug) or mis-processed a valid probe row (release). Reset need_skip_next at the top of the branch. Optimizer: gate the ORDER BY skip on need_final_sort (qo_top_plan_new, qo_plan_is_orderby_skip_candidate, qo_plan_cmp) so hash/merge-join plans no longer drop the sort from the dump while execution sorts. px_scan: order the after_join_pred/if_pred check clusters to match drain_slot_oids and drop the now-redundant SYNC GUARD comments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
❌ TC Merge Gate: Merge Blocked. One or more TC PRs are still open. Please merge or close them before merging this PR. TC Repositories & Branches:
Steps to unblock:
Reviews (1): Last reviewed commit: "Merge remote-tracking branch 'upstream/d..."
youngjinj added a commit to CUBRID/cubrid-testcases that referenced this pull request on Jun 10, 2026:
…D/cubrid#7269 Query plans now show an explicit SORT (order by) step where the ORDER BY skip optimization no longer applies. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Contributor (PR author) comment: /run all
…parallel fallback The single-thread fallback path is guarded by degree < 2, so degree can be 0 or 1. The assert (degree == 0) wrongly excluded the degree == 1 case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
youngjinj added a commit to CUBRID/cubrid-testcases that referenced this pull request on Jun 23, 2026:
…CUBRID/cubrid#7269 Cases #4 (multiple tables), #5 (inline views) and #8 (json output) were missed in the earlier partial update (3bcd97b). The ORDER BY skip optimization no longer applies over the hash join, so the plan now shows an explicit temp(order by) / SORT (order by) step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The during/after-join predicates of a hash join reference columns from both
inputs. pt_to_pred_expr resolves a list-file column to its DB_VALUE through the
column's table_info, but each column's live value lives in its input
buildlist_proc's val_list at its name_list position - the slot
proc.regu_list_pred fetches into. The two can disagree:
- a simple-spec input whose val_list / attribute_list orderings differ (an
input that also projects an expression column), or
- a nested-plan input whose buildlist val_list is no spec's value_list at all.
In both cases the predicate read an unfetched, stale DB_VALUE and produced
wrong results vs nested-loop (e.g. TPC-H Q7 returned 0 rows under use_hash).
Bind the predicate columns through a combined listfile context (outer ++ inner
name_list / value_list, the value_list sharing the inputs' buildlist DB_VALUEs)
installed on the symbol table while generating the during/after-join
predicates, so pt_to_pred_expr resolves each column to exactly the slot
regu_list_pred fetches. Restored after generation and on the error path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
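A minimal sketch of the combined listfile context, assuming a `DB_VALUE` can be reduced to an int. `listfile_ctx`, `symbol_info` and `bind_join_pred_column` are hypothetical simplifications of the symbol-table machinery. The combined context concatenates outer and inner names and shares (never copies) the pointers to the inputs' live value slots. A predicate column therefore resolves to exactly the slot that a fetch writes into. The previous symbol-table state is restored from one saved copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COLS 16

typedef struct { int v; } db_value;   /* stand-in for DB_VALUE */

typedef struct
{
  const char *names[MAX_COLS];
  db_value *values[MAX_COLS];   /* shared with the input's buildlist val_list */
  size_t n;
} listfile_ctx;

typedef struct { const listfile_ctx *listfile; } symbol_info;

/* outer ++ inner; the value pointers are shared, never copied. */
static void
combine_ctx (const listfile_ctx *outer, const listfile_ctx *inner,
             listfile_ctx *out)
{
  assert (outer->n + inner->n <= MAX_COLS);
  out->n = 0;
  for (size_t i = 0; i < outer->n; i++, out->n++)
    {
      out->names[out->n] = outer->names[i];
      out->values[out->n] = outer->values[i];
    }
  for (size_t i = 0; i < inner->n; i++, out->n++)
    {
      out->names[out->n] = inner->names[i];
      out->values[out->n] = inner->values[i];
    }
}

/* Resolve a column through the context installed on the symbol table. */
static db_value *
resolve_column (const symbol_info *sym, const char *name)
{
  const listfile_ctx *ctx = sym->listfile;

  for (size_t i = 0; ctx != NULL && i < ctx->n; i++)
    if (strcmp (ctx->names[i], name) == 0)
      return ctx->values[i];
  return NULL;
}

/* Install the combined context, bind one column, restore from one snapshot. */
static db_value *
bind_join_pred_column (symbol_info *sym, const listfile_ctx *outer,
                       const listfile_ctx *inner, listfile_ctx *combined,
                       const char *column)
{
  symbol_info save_symbol = *sym;
  db_value *slot;

  combine_ctx (outer, inner, combined);
  sym->listfile = combined;
  slot = resolve_column (sym, column);
  *sym = save_symbol;               /* single restore point */
  return slot;
}
```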
…shjoin_proc Inline the single-use make_hashjoin_listfile_val_list helper, snapshot/restore the symbol table through one SYMBOL_INFO (save_symbol) instead of four save_* fields, and fold the during/after predicate generation into the listfile-context block with a single restore point (dropping the listfile_ctx_set flag). No behavior change; clarify comments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
qdump_print_xasl recursed into a buildlist's eptr_list but not into a CTE_PROC's inner trees, so xasl_debug_dump never showed the plan inside a CTE. Recurse into proc.cte.non_recursive_part / recursive_part so a materialized CTE's inner plan (hash joins, etc.) is dumped in tree position under the cte_proc node. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
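A hedged sketch of the recursion change on a stripped-down node type. `xasl_node` here keeps only the links the dump walks and is not the real struct layout. Without the CTE branch, a `cte_proc` hides its inner plan. With it, the non-recursive and recursive parts are printed in tree position under the CTE node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { BUILDLIST_PROC, HASHJOIN_PROC, CTE_PROC } proc_type;

/* Hypothetical, stripped-down node with just the links the dump walks. */
typedef struct xasl_node
{
  proc_type type;
  struct xasl_node *next;                 /* sibling in a list */
  struct xasl_node *eptr_list;
  struct xasl_node *non_recursive_part;   /* CTE_PROC only */
  struct xasl_node *recursive_part;       /* CTE_PROC only */
} xasl_node;

static const char *
proc_name (proc_type t)
{
  switch (t)
    {
    case BUILDLIST_PROC: return "buildlist_proc";
    case HASHJOIN_PROC: return "hashjoin_proc";
    case CTE_PROC: return "cte_proc";
    }
  return "?";
}

/* Print each node indented by depth and return how many were printed.
 * With recurse_cte, a CTE's inner trees are dumped under the cte_proc. */
static int
dump_xasl (const xasl_node *x, int depth, bool recurse_cte)
{
  int count = 0;

  assert (depth >= 0);
  for (; x != NULL; x = x->next)
    {
      printf ("%*s%s\n", depth * 2, "", proc_name (x->type));
      count++;
      count += dump_xasl (x->eptr_list, depth + 1, recurse_cte);
      if (recurse_cte && x->type == CTE_PROC)
        {
          count += dump_xasl (x->non_recursive_part, depth + 1, recurse_cte);
          count += dump_xasl (x->recursive_part, depth + 1, recurse_cte);
        }
    }
  return count;
}
```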
# Conflicts: # src/query/query_hash_join.c
…ash join Apply code_style.sh and unify the need_skip_next = false comment as /* init */ in px_hash_join_task_manager.cpp. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
http://jira.cubrid.org/browse/CBRD-26900
Purpose
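A hash join evaluates its residual conditions (the non-equi conditions of an inner join, the after-join conditions of an outer join) only when the parent scan re-reads the join result stored in a list file. Tuples that will be filtered out are therefore stored first and then read one more time. This change evaluates the residual conditions that can be evaluated in the probe phase at the moment the hash key matches, so tuples that fail them are filtered out before they are written to the list file.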
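The saving can be illustrated with a toy inner join on `t1.a = t2.b` with residual `t1.c < t2.d`. The sketch counts rows written to the list file. The row types and the `hash_join` function are hypothetical, and a nested loop stands in for the hash lookup. Both variants return the same result, but probe-time filtering stores only the qualifying rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int a, c; } outer_row;   /* t1(a, c) */
typedef struct { int b, d; } inner_row;   /* t2(b, d) */

/* Inner join on t1.a = t2.b with residual t1.c < t2.d. Returns the number
 * of result rows; *written counts rows stored in the list file. */
static size_t
hash_join (const outer_row *o, size_t no, const inner_row *in, size_t ni,
           bool filter_at_probe, size_t *written)
{
  size_t result = 0;

  *written = 0;
  for (size_t i = 0; i < no; i++)
    for (size_t k = 0; k < ni; k++)   /* stands in for the hash lookup */
      {
        bool residual_ok;

        if (o[i].a != in[k].b)
          continue;                   /* hash key does not match */
        residual_ok = o[i].c < in[k].d;
        if (filter_at_probe && !residual_ok)
          continue;                   /* dropped before being stored */
        (*written)++;                 /* stored, re-read by the parent scan */
        if (residual_ok)
          result++;
      }
  return result;
}
```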
Implementation
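- … after_join_pred). Each such condition is removed from the parent scan so that it is evaluated in exactly one place; conditions that need the final result to be settled, or that require executing a subquery, stay in the parent scan as before.
- … after_join_pred was not evaluated, so tuples were not filtered out; this change fixes that gap.
- … SORT_ORDERBY) is added explicitly so that the plan output matches actual execution. The order-by skip in sub-plans is kept to preserve partial-range processing.
Remarks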
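- … reuses the existing after_join_pred field.
- The need_final_sort-related optimizer change and the px_scan change fix existing regressions found during this work. Because both problems surface in hash joins and are closely related, they are addressed here as well.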