Fix mathematical, bitwise and logical operator precedence in SQL#4221
Fix mathematical, bitwise and logical operator precedence in SQL#4221hazefully wants to merge 2 commits into
Conversation
b7a3074 to
d2e65d5
Compare
📊 Metrics Diff Analysis ReportSummary
ℹ️ About this analysisThis automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:
The last category in particular may indicate planner regressions that should be investigated. New QueriesCount of new queries by file:
|
| | left=expressionAtom operator=(BIT_SHIFT_LEFT_OP | BIT_SHIFT_RIGHT_OP) right=expressionAtom #bitExpressionAtom // done | ||
| | left=expressionAtom operator=BIT_AND_OP right=expressionAtom #bitExpressionAtom // done | ||
| | left=expressionAtom operator=(BIT_XOR_OP | BIT_OR_OP) right=expressionAtom #bitExpressionAtom // done |
There was a problem hiding this comment.
@hatyo Do you know if it is intentional that we used to have a higher precedence for bitwise operators over arithmetic operators? I think most languages (with few exceptions like Pascal) give a higher precedence to arithmetic operators so the expression 6 & 4 + 4 would be zero instead of 8, but it looks like we have some unit tests that assert we actually give precedence to the bitwise operators:
There was a problem hiding this comment.
this is not intentional, put differently, we did not make an explicit decision to give bitwise operations higher precedence over the other operators. The referenced unit test does not verify precedence, it just verifies that plan constraints are created exactly as they should be. That's all.
There was a problem hiding this comment.
I think most languages (with few exceptions like Pascal) give a higher precedence to arithmetic operators so the expression 6 & 4 + 4 would be zero instead of 8
I mean sure, but we should align with DB vendors (since bitwise operators are not part of the SQL standard) who for the most part, align with programming languages operator precedence conventions.
Bitwise operators are usually lower than basic arithmetic operator, and higher than logical operators. We should align with that. For example in PgSQL:
SELECT
1 + 3 & 5 AS actual, -- 4 (proves + binds first)
(1 + 3) & 5 AS arith_first, -- 4 (same as actual)
1 + (3 & 5) AS bitwise_first -- 2 (different)
;
actual | arith_first | bitwise_first
--------+-------------+---------------
4 | 4 | 2
SELECT
3 & 2 != 0 AS actual, -- true (proves & binds first)
(3 & 2) != 0 AS bitwise_first, -- true (same as actual)
3 & (2 != 0)::int AS comparison_first -- 1 (different)
actual | bitwise_first | comparison_first
--------+---------------+------------------
true | true | 1Doing so, may break some customer queries though, so we have to be extra careful.
(once #4199 is ready, we can use it to introduce these illustrative examples).
| .put("dot_product_distance", argumentsCount -> BuiltInFunctionCatalog.resolve("dot_product_distance", argumentsCount)) | ||
| .put("not", argumentsCount -> BuiltInFunctionCatalog.resolve("not", argumentsCount)) | ||
| .put("and", argumentsCount -> BuiltInFunctionCatalog.resolve("and", argumentsCount)) | ||
| .put("&&", argumentsCount -> BuiltInFunctionCatalog.resolve("and", argumentsCount)) |
There was a problem hiding this comment.
This does not seem relevant to this PR, can you please revert it (and line 137)? We can decide later if we want to add support to the non-standard && and ||.
| | left=expressionAtom operator=(BIT_SHIFT_LEFT_OP | BIT_SHIFT_RIGHT_OP) right=expressionAtom #bitExpressionAtom // done | ||
| | left=expressionAtom operator=BIT_AND_OP right=expressionAtom #bitExpressionAtom // done | ||
| | left=expressionAtom operator=(BIT_XOR_OP | BIT_OR_OP) right=expressionAtom #bitExpressionAtom // done |
There was a problem hiding this comment.
this is not intentional, put differently, we did not make an explicit decision to give bitwise operations higher precedence over the other operators. The referenced unit test does not verify precedence, it just verifies that plan constraints are created exactly as they should be. That's all.
| | left=expressionAtom operator=(BIT_SHIFT_LEFT_OP | BIT_SHIFT_RIGHT_OP) right=expressionAtom #bitExpressionAtom // done | ||
| | left=expressionAtom operator=BIT_AND_OP right=expressionAtom #bitExpressionAtom // done | ||
| | left=expressionAtom operator=(BIT_XOR_OP | BIT_OR_OP) right=expressionAtom #bitExpressionAtom // done |
There was a problem hiding this comment.
I think most languages (with few exceptions like Pascal) give a higher precedence to arithmetic operators so the expression 6 & 4 + 4 would be zero instead of 8
I mean sure, but we should align with DB vendors (since bitwise operators are not part of the SQL standard) who for the most part, align with programming languages operator precedence conventions.
Bitwise operators are usually lower than basic arithmetic operator, and higher than logical operators. We should align with that. For example in PgSQL:
SELECT
1 + 3 & 5 AS actual, -- 4 (proves + binds first)
(1 + 3) & 5 AS arith_first, -- 4 (same as actual)
1 + (3 & 5) AS bitwise_first -- 2 (different)
;
actual | arith_first | bitwise_first
--------+-------------+---------------
4 | 4 | 2
SELECT
3 & 2 != 0 AS actual, -- true (proves & binds first)
(3 & 2) != 0 AS bitwise_first, -- true (same as actual)
3 & (2 != 0)::int AS comparison_first -- 1 (different)
actual | bitwise_first | comparison_first
--------+---------------+------------------
true | true | 1Doing so, may break some customer queries though, so we have to be extra careful.
(once #4199 is ready, we can use it to introduce these illustrative examples).
| | EXISTS '(' query ')' #existsExpressionAtom // done | ||
| | expressionAtom predicate? #predicatedExpression | ||
| | expression logicalOperator expression #logicalExpression // done | ||
| | left=expressionAtom comparisonOperator right=expressionAtom #binaryComparisonExpression // done |
There was a problem hiding this comment.
Should these be expression comparisonOperator expression so we take advantage of left-recursion elimination and define clear precedence rules for groups of logical operators?
| ; | ||
|
|
||
| // Expressions, predicates | ||
| // Mathemtical and logical expressions must be listed as different alternatives to the left-recursive |
There was a problem hiding this comment.
Can you please rewrite the comment to better illustrate the intentions and guide future modifications to the rules below, something like:
put comparison and logical operators higher so they have less precedence than below bitwise and arithmetic operators. Order of rules matter to leverage inter-rule precedence rules as supported by Antlr4.
| - unorderedResult: [ { 20 } ] | ||
| - initialVersionAtLeast: !current_version | ||
| - unorderedResult: [ { 8 } ] |
There was a problem hiding this comment.
we should probably mark this PR as breaking change.
This PR changes our ANTLR4 parser rules to respect the precedence of mathematical, bitwise and logical operators while parsing expressions, leading to the correct results in cases where no explicit order of operations is enforced. The parser rules are changed so the different mathematical, bitwise and logical operators are listed as alternatives in the left-recursive expression rules, which makes ANTLR process them with the same precedence as the order of the alternate rules.
This approach is based on section "5.4 Dealing with Precedence, Left Recursion, and Associativity" from the The Definitive ANTLR 4 Reference.
This fixes #4219.