-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(dialects): convert Oracle-style (+) join markers to standard OUTER JOINs #6552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
this looks pretty expensive, eliminate_join_marks does an entire traverse_scope, i don't want this to be done on default for every single dialect when translating from oracle is very niche behavior. can we instead remove join marks at parse time? and avoid the overhead in the other languages? |
|
That's fine for my use case, but in the event that someone wants to parse but not mutate the AST, there would need to be some logic around defaults. Not super familiar with the codebase and principles yet, but maybe make a flag preserve_join_marks = true for parsing, but set it to false for transpiling by default, and raise an Exception if trying to write an AST with join marks to an unsupported language? It should break though, and provide guidance on what to do, not continue to parse. I submitted this because I was parsing some legacy Oracle SQL into Snowflake and it wasn't until I happened to be examining a query that had a join marker that I noticed the issue. |
|
Alternatively, just make preserve_join_marks false by default across the board, as it's recommended against by Oracle anyway (although apparently there are still differences that are likely bugs.) It doesn't matter to me, but what should the default behavior be? |
Remove the explicit SUPPORTS_COLUMN_JOIN_MARKS constant and add '(+)' to KEYWORDS instead. The flag is determined dynamically during dialect registration in dialects.py by checking if KEYWORDS contains '(+)'. If not present, it overrides to False. The constant in exasol.py wasn't actually setting the flag to True.
Automatically apply eliminate_join_marks transform when generating SQL for dialects that don't support column join marks. This ensures Oracle join marker syntax (e.g., ) is properly converted to standard LEFT JOIN syntax instead of being silently dropped or causing errors. Changes: - Add automatic eliminate_join_marks application in Generator.preprocess() when dialect doesn't support column join marks - Optimize eliminate_join_marks to return early if no join marks are present, avoiding unnecessary traversal and normalization operations - Update test expectations to reflect LEFT JOIN conversion behavior - Add test coverage for exasol dialect join marker handling - Remove test expecting UnsupportedError since transform now handles conversion automatically The early exit optimization in eliminate_join_marks improves performance by skipping expensive scope traversal and normalization when no join marks exist in the expression tree.
b8893fa to
7cfd29f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @dhawkins1234, thanks for the contribution, but I don't think we're going to merge this PR as is– please refer to the comment I left. The only change I see merging is the Exasol one (nice catch, btw).
| if not self.dialect.SUPPORTS_COLUMN_JOIN_MARKS: | ||
| from sqlglot.transforms import eliminate_join_marks | ||
|
|
||
| expression = eliminate_join_marks(expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check the original PR submission, but I believe this is what you were referring to @tobymao?
Let's definitely not do this. This transformation wasn't ever intended to be applied to other dialects, but rather to be there in case some folks want to manually use it.
Join marks are a pretty niche feature and it's out of scope to support transpiling it by default for every other dialect. The reason, as Toby said, is mostly that it's costly.
| # Early exit if no join marks exist | ||
| if not any( | ||
| c.args.get("join_mark") | ||
| for where in expression.find_all(exp.Where) | ||
| for c in where.find_all(exp.Column) | ||
| ): | ||
| return expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unrelated to the PR's original scope and I'm unsure whether it's worth having without benchmarking, as it also incurs a full AST walk.
|
I'm closing this for the time being. Feel free to submit a separate PR for the Exasol patch and can discuss the AST walk in Slack. |
Outer join marks, such as in
are not supported by a majority of languages, the exception being Oracle and Exasol. Current behavior simply strips them out, which effectively converts the LEFT JOIN to an INNER JOIN.
These two commits add the existing
transforms.eliminate_join_markersas a preprocessing transform for most languages, or in the case of Exasol, fixed a bug where SUPPORTS_COLUMN_JOIN_MARKS was overridden to false because the (+) was not in the list of KEYWORDS.With the change, the affected languages return
but Exasol and Oracle return the original syntax.
I did not include
['clickhouse','duckdb','prql','solr'],either because there were tests giving warning messages (although they passed), or because the dialect had no Generator class, and I wanted to avoid larger structural changes.