Add RecordQueryStoreBindingPlan and cross-schema join support in planner (cross-schema 2/6)#4213

Draft: arnaud-lacurie wants to merge 12 commits into FoundationDB:main from arnaud-lacurie:cross-schema/pr2

@arnaud-lacurie

Summary

  • Tags LogicalTypeFilterExpression with a SchemaIdentifier so every scan expression knows which schema it belongs to
  • Introduces RecordQueryStoreBindingPlan — a thin plan wrapper that swaps in a secondary FDBRecordStoreBase (looked up by schema name from EvaluationContext) before executing its inner subtree; transparent to all planning logic
  • Adds EvaluationContext.withAuxiliaryStores / getAuxiliaryStore for the runtime store lookup
  • Extends MetaDataPlanContext to carry secondary-schema metadata and wires ImplementNestedLoopJoinRule to emit RecordQueryStoreBindingPlan when the inner side belongs to a secondary schema
  • Wires the cross-schema schema.table reference resolution end-to-end through SemanticAnalyzer (secondary-schema lookup) and LogicalOperator
  • Fixes a WHERE-predicate drop bug in the cross-schema correlated scan path
  • Fixes AbstractDataAccessRule secondary-schema scan wrapping (must happen at scan-creation time, before compensation wraps in logical expressions)
  • Adds full proto serialization for RecordQueryStoreBindingPlan (field 41 in PRecordQueryPlan), enabling continuation support across schema boundaries

Stacking

This is PR 2 of 6 in the cross-schema join series. All PRs target main.

| # | PR | Topic |
|---|----|-------|
| 1 | #4212 | Plan-cache key for multi-schema queries |
| 2 | this | SchemaIdentifier tag + store-binding infrastructure |
| 3 | | Secondary-store wiring into execution context |
| 4 | | Type-namespace collision avoidance + catalog metadata |
| 5 | | Java integration tests |
| 6 | | YAML integration tests + documentation |

Test plan

  • Cross-schema join queries produce a STORE_BIND node in EXPLAIN output
  • RecordQueryStoreBindingPlan proto round-trip (serialization/deserialization)
  • Continuation round-trip works for cross-schema scans
  • WHERE predicates are not dropped on correlated cross-schema scans

Introduce SchemaIdentifier value type (null = current schema) and attach
it as a metadata field to FullUnorderedScanExpression and
LogicalTypeFilterExpression. Both expressions gain a getSchemaId() accessor
and a second constructor/factory overload accepting a SchemaIdentifier.

The field does not participate in equalsWithoutChildren or
computeHashCodeWithoutChildren to keep planning behavior unchanged —
ImplementNestedLoopJoinRule will read it directly in a later step.
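
As a rough illustration of the tagging described above, the following minimal sketch shows a nullable schema tag that is exposed through an accessor but excluded from structural equality. `SchemaTagSketch` and `TypeFilter` are hypothetical stand-ins, not the real planner classes, and the method bodies are assumptions made for the example.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for illustration only.
final class SchemaTagSketch {

    /** Names a secondary schema; a null reference stands for the current schema. */
    static final class SchemaIdentifier {
        private final String name;

        SchemaIdentifier(String name) {
            this.name = Objects.requireNonNull(name);
        }

        String getName() {
            return name;
        }
    }

    /** Toy analogue of a type-filter expression carrying an optional schema tag. */
    static final class TypeFilter {
        private final String recordType;
        private final SchemaIdentifier schemaId; // null = current schema

        // Existing-style constructor: no tag, so the current schema is implied.
        TypeFilter(String recordType) {
            this(recordType, null);
        }

        // Overload accepting the schema tag.
        TypeFilter(String recordType, SchemaIdentifier schemaId) {
            this.recordType = Objects.requireNonNull(recordType);
            this.schemaId = schemaId;
        }

        SchemaIdentifier getSchemaId() {
            return schemaId;
        }

        // The schema tag is deliberately ignored so planner-side comparisons are unchanged.
        boolean equalsWithoutChildren(TypeFilter other) {
            return recordType.equals(other.recordType);
        }

        int computeHashCodeWithoutChildren() {
            return recordType.hashCode();
        }
    }

    public static void main(String[] args) {
        TypeFilter local = new TypeFilter("ITEMS");
        TypeFilter remote = new TypeFilter("ITEMS", new SchemaIdentifier("inventory"));
        System.out.println(local.getSchemaId() == null);         // untagged: current schema
        System.out.println(remote.getSchemaId().getName());      // tagged: secondary schema name
        System.out.println(local.equalsWithoutChildren(remote)); // the tag does not affect equality
    }
}
```

Keeping the tag out of the equality and hash methods means a rule that wants the schema has to read it explicitly through `getSchemaId()`.
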

EvaluationContext gains an ImmutableMap<String, FDBRecordStoreBase<?>>
auxiliaryStores field (empty by default) plus getAuxiliaryStore(name) and
withAuxiliaryStores(map) accessors. withBinding() preserves the map.
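
The accessor behavior described above can be sketched as an immutable context whose copy-on-write `withBinding` carries the auxiliary-store map along. This is a simplified model under stated assumptions: `Store` stands in for `FDBRecordStoreBase<?>`, plain JDK maps replace `ImmutableMap`, and returning null for an unknown schema name is a guess, not confirmed behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the context; not the real EvaluationContext.
final class AuxiliaryStoreContextSketch {

    /** Stand-in for FDBRecordStoreBase<?>. */
    static final class Store {
        final String name;

        Store(String name) {
            this.name = name;
        }
    }

    static final class Context {
        static final Context EMPTY =
                new Context(Collections.<String, Object>emptyMap(), Collections.<String, Store>emptyMap());

        private final Map<String, Object> bindings;
        private final Map<String, Store> auxiliaryStores; // empty by default

        private Context(Map<String, Object> bindings, Map<String, Store> auxiliaryStores) {
            // Defensive copies keep each context instance immutable.
            this.bindings = Collections.unmodifiableMap(new HashMap<>(bindings));
            this.auxiliaryStores = Collections.unmodifiableMap(new HashMap<>(auxiliaryStores));
        }

        // Adding a binding must not lose the auxiliary stores.
        Context withBinding(String name, Object value) {
            Map<String, Object> next = new HashMap<>(bindings);
            next.put(name, value);
            return new Context(next, auxiliaryStores);
        }

        Context withAuxiliaryStores(Map<String, Store> stores) {
            return new Context(bindings, stores);
        }

        Object getBinding(String name) {
            return bindings.get(name);
        }

        // Assumption: an unknown schema name yields null.
        Store getAuxiliaryStore(String schemaName) {
            return auxiliaryStores.get(schemaName);
        }
    }

    public static void main(String[] args) {
        Map<String, Store> stores = new HashMap<>();
        stores.put("inventory", new Store("inventory"));
        Context ctx = Context.EMPTY.withAuxiliaryStores(stores).withBinding("q6", 42);
        System.out.println(ctx.getAuxiliaryStore("inventory").name);
    }
}
```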

RecordQueryStoreBindingPlan is a new transparent plan node that intercepts
executePlan() and re-dispatches to the secondary store bound for schemaId
in EvaluationContext. All planning properties delegate to the child plan.
Visitor implementations added for all RecordQueryPlanVisitor and
RelationalExpressionVisitor subscribers.
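
To make the "transparent plan node" idea concrete, here is a minimal sketch of a wrapper that swaps the store before running its child and forwards a planning property unchanged. The `Plan`, `Store`, and `Context` types are hypothetical simplifications, and throwing when no store is bound is an assumption; the real plan may handle that case differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a store-binding wrapper; not the real RecordQueryStoreBindingPlan.
final class StoreBindingPlanSketch {

    static final class Store {
        final List<String> rows;

        Store(String... rows) {
            this.rows = Arrays.asList(rows);
        }
    }

    static final class Context {
        private final Map<String, Store> auxiliaryStores;

        Context(Map<String, Store> auxiliaryStores) {
            this.auxiliaryStores = new HashMap<>(auxiliaryStores);
        }

        Store getAuxiliaryStore(String schemaName) {
            return auxiliaryStores.get(schemaName);
        }
    }

    interface Plan {
        List<String> execute(Store store, Context context);

        boolean isReverse();
    }

    /** Leaf plan that returns every row of whichever store it is handed. */
    static final class ScanPlan implements Plan {
        @Override
        public List<String> execute(Store store, Context context) {
            return store.rows;
        }

        @Override
        public boolean isReverse() {
            return false;
        }
    }

    /** Wrapper that re-dispatches the child against the store bound for schemaName. */
    static final class StoreBindingPlan implements Plan {
        private final String schemaName;
        private final Plan inner;

        StoreBindingPlan(String schemaName, Plan inner) {
            this.schemaName = schemaName;
            this.inner = inner;
        }

        @Override
        public List<String> execute(Store ignoredPrimary, Context context) {
            Store bound = context.getAuxiliaryStore(schemaName);
            if (bound == null) {
                // Assumption: a missing binding is treated as an error.
                throw new IllegalStateException("no store bound for schema " + schemaName);
            }
            return inner.execute(bound, context);
        }

        // Planning properties simply delegate to the child.
        @Override
        public boolean isReverse() {
            return inner.isReverse();
        }
    }

    public static void main(String[] args) {
        Map<String, Store> aux = new HashMap<>();
        aux.put("inventory", new Store("x", "y"));
        Plan plan = new StoreBindingPlan("inventory", new ScanPlan());
        System.out.println(plan.execute(new Store("a"), new Context(aux)));
    }
}
```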

The plan node is unreachable from any existing query path until
ImplementNestedLoopJoinRule starts emitting it (Step 11).

- SemanticAnalyzer: secondary schema lookup (setSecondarySchemaLookup),
  lazy loading and caching of secondary templates (tryLoadSecondarySchema),
  cross-schema tableExists/getTable, getSchemaIdFor, getLoadedSecondarySchemas,
  getAllTableStorageNamesForTemplate
- LogicalOperator.generateTableAccess: derive SchemaIdentifier per table,
  use secondary template for tableNames and isStoreRowVersions; tag
  LogicalTypeFilterExpression with the schema identifier
- PlanContext: secondary schema lookup function + additional schemas map
  (additionalSchemas); withAdditionalSchemas factory method; Builder adds
  withSecondarySchemaLookup setter, build passes new fields
- EmbeddedRelationalStatement / PreparedStatement: populate secondary
  schema lookup from backing catalog in createPlanContext
- PlanGenerator: create BaseVisitor before logical plan generation so the
  secondary schema lookup can be set; after generation collect loaded
  secondary schemas and enrich PlanContext via enrichWithSecondarySchemas
- CascadesPlanner: add planGraph overload accepting additional schemas that
  delegates to MetaDataPlanContext.forRootReferenceWithAdditionalSchemas
- QueryPlan.LogicalQueryPlan.optimize: dispatch to the multi-schema planGraph
  overload when additionalSchemas is non-empty
- QueryPlan.PhysicalQueryPlan.executeInternal: collect required secondary
  schema names from RecordQueryStoreBindingPlan nodes and inject the
  corresponding FDBRecordStoreBase instances into EvaluationContext via
  withAuxiliaryStores before executing the plan
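
The last step in the list above (collecting schema names from store-binding nodes, then resolving stores for them) might look roughly like the following. The node classes and the `resolveStores` helper are hypothetical; only the overall shape of walking the plan, gathering names, and building a name-to-store map is taken from the commit message.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical plan-tree model used only to illustrate the collection step.
final class CollectStoreBindingsSketch {

    static class PlanNode {
        final List<PlanNode> children;

        PlanNode(PlanNode... children) {
            this.children = Arrays.asList(children);
        }
    }

    static final class StoreBindingNode extends PlanNode {
        final String schemaName;

        StoreBindingNode(String schemaName, PlanNode child) {
            super(child);
            this.schemaName = schemaName;
        }
    }

    // Depth-first walk that records the schema name of every store-binding node.
    static Set<String> requiredSchemas(PlanNode root) {
        Set<String> names = new TreeSet<>();
        Deque<PlanNode> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            PlanNode node = pending.pop();
            if (node instanceof StoreBindingNode) {
                names.add(((StoreBindingNode) node).schemaName);
            }
            for (PlanNode child : node.children) {
                pending.push(child);
            }
        }
        return names;
    }

    // Opens one store per required schema; the opener is supplied by the caller.
    static <S> Map<String, S> resolveStores(Set<String> names, Function<String, S> opener) {
        Map<String, S> stores = new TreeMap<>();
        for (String name : names) {
            stores.put(name, opener.apply(name));
        }
        return stores;
    }

    public static void main(String[] args) {
        PlanNode join = new PlanNode(
                new PlanNode(),
                new StoreBindingNode("inventory", new PlanNode()));
        Set<String> names = requiredSchemas(join);
        System.out.println(resolveStores(names, name -> "store:" + name));
    }
}
```

The resulting map is what would be handed to `withAuxiliaryStores` before execution.
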
… tests

When a PredicateWithValueAndRanges has a single range with multiple equality
comparisons (e.g. JOIN + WHERE on the same field), asComparisonRange() keeps
only the first comparison and treats the rest as residuals. The compensation
lambda previously returned noCompensationNeeded() whenever the Placeholder
alias was bound, so the constant equality (the WHERE filter) was silently
dropped from the generated plan.

Fix: replay the merge sequence inside the compensation lambda to detect residual
comparisons. When residuals exist, build a new single-range PVWR from them and
apply it as a filter via computeCompensationFunctionForLeaf, ensuring the plan
emits SCAN([IS ITEMS, EQUALS q6.ITEM_ID]) | FILTER(output.ID = 2) rather than
an unfiltered correlated scan.
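
The bug and the fix can be modeled with a toy example in which comparisons are plain strings. This is only a sketch: the method names are hypothetical, the "first comparison wins" merge is taken from the description above, and the behavior when the placeholder is not bound (compensate for everything) is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the compensation decision; not the real planner code.
final class ResidualCompensationSketch {

    // Replays the merge: the first equality feeds the scan, the rest are residuals.
    static List<String> residualsAfterMerge(List<String> equalities) {
        if (equalities.size() <= 1) {
            return Collections.emptyList();
        }
        return equalities.subList(1, equalities.size());
    }

    // Old behavior: a bound placeholder always meant "no compensation needed".
    static List<String> oldCompensation(boolean placeholderBound, List<String> equalities) {
        return placeholderBound ? Collections.<String>emptyList() : equalities;
    }

    // Fixed behavior: a bound placeholder still has to re-apply any residuals as a filter.
    static List<String> fixedCompensation(boolean placeholderBound, List<String> equalities) {
        return placeholderBound ? residualsAfterMerge(equalities) : equalities;
    }

    public static void main(String[] args) {
        // JOIN equality and WHERE equality on the same field.
        List<String> equalities = Arrays.asList("EQUALS q6.ITEM_ID", "EQUALS 2");
        System.out.println("old filter:   " + oldCompensation(true, equalities));
        System.out.println("fixed filter: " + fixedCompensation(true, equalities));
    }
}
```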

Also fix schemaIdFromReference to recurse into quantifier child groups so that
cross-schema plans nested inside a partitioned SelectExpression are wrapped with
RecordQueryStoreBindingPlan. Covers AbstractDataAccessRule schema detection for
SelectExpression alternatives created by predicate push-down rules.

Add CrossSchemaJoinTest covering: plain cross-schema join, join with WHERE
filter on the primary-schema table, and standalone select from a secondary
schema.

Replace raw String primary cache key with PlanCacheSchemaKey holding an
ImmutableSortedSet of schema names. Replace single schemaTemplateVersion
in QueryCacheKey with ImmutableSortedMap<String,Integer> schemaVersions,
enabling cross-schema plan cache keying.

Affects: PlanCacheSchemaKey (new), QueryCacheKey, RelationalPlanCache,
AstNormalizer.NormalizationResult, PlanGenerator, and their tests.
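
A sketch of why sorted collections suit these cache keys: two queries touching the same schemas in a different order should hit the same primary key, while a version bump in any participating schema should miss. The classes below are hypothetical and use JDK `TreeSet`/`TreeMap` in place of Guava's `ImmutableSortedSet`/`ImmutableSortedMap`; the `canonicalQuery` field is an assumption about what else the secondary key holds.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical cache-key model; not the real PlanCacheSchemaKey or QueryCacheKey.
final class PlanCacheKeySketch {

    /** Primary key: the set of schema names a query touches, order-insensitive. */
    static final class SchemaKey {
        private final SortedSet<String> schemaNames;

        SchemaKey(Collection<String> names) {
            this.schemaNames = Collections.unmodifiableSortedSet(new TreeSet<>(names));
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SchemaKey && schemaNames.equals(((SchemaKey) o).schemaNames);
        }

        @Override
        public int hashCode() {
            return schemaNames.hashCode();
        }
    }

    /** Secondary key: query text plus one template version per participating schema. */
    static final class QueryKey {
        private final String canonicalQuery;
        private final SortedMap<String, Integer> schemaVersions;

        QueryKey(String canonicalQuery, Map<String, Integer> schemaVersions) {
            this.canonicalQuery = canonicalQuery;
            this.schemaVersions = Collections.unmodifiableSortedMap(new TreeMap<>(schemaVersions));
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof QueryKey)) {
                return false;
            }
            QueryKey other = (QueryKey) o;
            return canonicalQuery.equals(other.canonicalQuery)
                    && schemaVersions.equals(other.schemaVersions);
        }

        @Override
        public int hashCode() {
            return 31 * canonicalQuery.hashCode() + schemaVersions.hashCode();
        }
    }

    public static void main(String[] args) {
        SchemaKey a = new SchemaKey(Arrays.asList("sales", "inventory"));
        SchemaKey b = new SchemaKey(Arrays.asList("inventory", "sales"));
        System.out.println(a.equals(b)); // same schemas, different order
    }
}
```
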
…to serialization to RecordQueryStoreBindingPlan

AbstractDataAccessRule.createScansForMatches now wraps scan plans in
RecordQueryStoreBindingPlan at creation time when the match candidate belongs
to a secondary schema. This places the store-binding node inside any
compensation wrappers (MAP, SELECT) produced by
Compensation.applyAllNeededCompensations, so the secondary store binding
survives to the final physical plan for standalone secondary-schema queries
as well as cross-schema joins.

The previous approach wrapped only at the dataAccessExpressions level after
compensation; the instanceof RecordQueryPlan check failed for logical
SelectExpression compensation wrappers, causing the secondary store to be
silently dropped.

RecordQueryStoreBindingPlan gains full proto serialization: adds
PRecordQueryStoreBindingPlan message (field 41 in PRecordQueryPlan.oneof),
implements toProto/toRecordQueryPlanProto/fromProto, and registers a
@AutoService(PlanDeserializer.class) Deserializer inner class. This
enables continuation support for queries spanning secondary schemas.
Labels

enhancement New feature or request
