-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix: add parentheses to nested binary expressions in Display and SqlDisplay #19916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- deleted, was a progress meter for myself --
b4c1d63 to
9bba10b
Compare
|
Updated after rebase on main. Note: There are also 3 files in the
These contain error message patterns that now include the additional parentheses (e.g., I'll create a separate PR on |
Always wrap nested binary expressions in parentheses when formatting to avoid ambiguity. For example, `(1+2)*3` now displays as `(Int64(1) + Int64(2)) * Int64(3)` instead of the misleading `Int64(1) + Int64(2) * Int64(3)`. This follows DuckDB's approach of always adding parentheses around nested binary expressions. Fixes apache#16054
Updates all test snapshots affected by the binary expression Display parentheses change. The `test_pretty_roundtrip` test had a flawed assertion that checked structural equality after round-tripping through the pretty unparser. This was incorrect because the pretty unparser intentionally removes "unnecessary" parentheses, which can change expression associativity: - Input: `(id + 5) * (age * 8)` (right-grouped multiplication) - Pretty output: `(id + 5) * age * 8` - Re-parsed: `((id + 5) * age) * 8` (left-grouped due to associativity) These are semantically equivalent for associative operations but structurally different, so the round-trip assertion was removed.
Updated expected outputs for binary expression parentheses changes.
9bba10b to
b93952e
Compare
| @r#" | ||
| Projection: data.a, data.f | ||
| Filter: data.f = Utf8("a") OR data.f = Utf8("b") OR data.f = Utf8("c") OR data2.mark | ||
| Filter: (((data.f = Utf8("a")) OR (data.f = Utf8("b"))) OR (data.f = Utf8("c"))) OR data2.mark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to omit these parentheses in cases like this? Since it's all OR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to explain what i found but my current understanding might be limited.
Seems like DuckDB also always adds parentheses, see conjunction_expression.hpp#L42-L48
they represent a OR b OR c as a flat list of children (ConjunctionExpression), they get (a OR b OR c) while our nested BinaryExpr tree produces ((a OR b) OR c).
Its the reason why i think we should discuss this further as its kinda delicate, and more people should take a look
Thank you for your time mr.Vo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense; I suppose it's more a limitation on how we represent OR rather than on the display code in this PR 👍
|
I'm a little confused by the intent of this PR; the body seems to imply it fixes cases where parentheses aren't being added to preserve precedence, but I'm fairly sure we already have this code? Especially judging by the discussion on #16054 Moreover the example in the PR body is incorrect with the behaviour of this PR:
I tried out this PR with cli and the parentheses aren't added because this is a schema name. And there is a test added in this PR contradicting this statement. Could you help clarify what exactly this PR is achieving that we don't have on main? |
|
This originally contained changes to both 'Display' and 'SchemaDisplay', but then i had to revert schema display as It seems It Needed to be stabile for optimizer field lookup. Unfortunately this way It doesn't solve the original Issue pointed in #16054, the parentheses won't appear in CLI, which Is also part of why i asked for direct help. While It should still provide an improvement for SQL generation and debug.. its not shaping the way i initially thought/wanted. |
|
Would you be able to update the title & PR body to more accurately represent what changes are being made here? I'm still not exactly clear what the intended changes are which makes it harder to review |
|
Hi @Jefffrey, thanks for the feedback. I've updated the PR title and body to reflect the current state of the changes after the Given the scope creep of ~44 sqllogictest updates, 4 Happy to hear your thoughts on whether to proceed or hold off, and if its the second case, i apologize for wasting someone else time, the intent behind was genuine. |
Which issue does this PR close?
Closes #16054
Rationale for this change
Binary expressions like
(1+2)*3were displayed incorrectly asInt64(1) + Int64(2) * Int64(3)inDisplayandSqlDisplayoutput, without parentheses to preserve operator precedence. This is misleading because it does not reflect the actual expression tree structure.What changes are included in this PR?
DisplayforBinaryExprto always wrap nested binary expressions in parentheses (following DuckDB's approach). This simplifies the previous precedence-based logic and avoids ambiguity.SqlDisplayfor SQL generation output.SchemaDisplayclarifying that it intentionally does not add parentheses, because schema names must remain stable for field lookups in optimizers likecommon_subexpr_eliminate.What this PR does NOT change
SchemaDisplayis intentionally unchanged. Schema names must remain stable because they are used as field lookup keys in optimizers (e.g.common_subexpr_eliminate). Changing them would break field resolution. This was initially included and then reverted after discovering it caused optimizer failures.select (1+2)*3, the column header usesSchemaDisplay, so it will still showInt64(1) + Int64(2) * Int64(3)without parentheses. Fixing the CLI column headers would require changes toSchemaDisplay, which is out of scope for the reasons above.Known limitation: excessive parentheses for associative operators
Because our expression tree represents
a OR b OR cas nestedBinaryExprnodes ((a OR b) OR c), always-parenthesizing produces(a OR b) OR cinstead of the flata OR b OR c. DuckDB avoids this by using a flatConjunctionExpressionwith a list of children, so they get(a OR b OR c). This is a structural limitation of how DataFusion represents expressions, not something that can be fixed in theDisplaylayer alone.Pending:
datafusion-testingsubmodule updateThe "Run sqllogictests with the sqlite test suite" CI check fails because 4 files in the
datafusion-testingsubmodule contain error message regex patterns that need updating to match the new parenthesized output. The affected files are:data/sqlite/index/random/10/slt_good_0.slt(5 errors)data/sqlite/index/random/10/slt_good_13.slt(10 errors)data/sqlite/index/random/10/slt_good_4.slt(5 errors)data/sqlite/index/random/10/slt_good_6.slt(5 errors)A separate PR on
datafusion-testingwill be needed to update these regex patterns, and then this PR's submodule pointer will need to be updated to point to the new commit.How are these changes tested?
test_binary_expr_display_with_parenthesescovering:(1+2)*31+(2*3)(a OR b) AND cAre these changes safe?
Yes. The
Displaychange only affects human-readable plan output (e.g.EXPLAINplans, debug logging).SchemaDisplay(used for field lookups and schema stability) is intentionally unchanged.SqlDisplaygains parentheses which makes generated SQL unambiguous.