Skip to content

Conversation

@dhawkins1234
Copy link
Contributor

@dhawkins1234 dhawkins1234 commented Dec 13, 2025

Outer join marks, such as in

sql = "select a.foo, b.bar, a.baz from a, b where a.baz = b.baz(+)"

are not supported by a majority of languages, the exception being Oracle and Exasol. Current behavior simply strips them out, which effectively converts the LEFT JOIN to an INNER JOIN.

>>> print(sqlglot.transpile(sql, read="oracle", write="snowflake"))
Outer join syntax using the (+) operator is not supported.
['SELECT a.foo, b.bar, a.baz FROM a, b WHERE a.baz = b.baz']

These two commits add the existing transforms.eliminate_join_markers as a preprocessing transform for most languages, or in the case of Exasol, fixed a bug where SUPPORTS_COLUMN_JOIN_MARKS was overridden to false because the (+) was not in the list of KEYWORDS.

With the change, the affected languages return

['SELECT a.foo, b.bar, a.baz FROM a LEFT JOIN b ON a.baz = b.baz']

but Exasol and Oracle return the original syntax.

I did not include ['clickhouse','duckdb','prql','solr'], either because there were tests giving warning messages (although they passed), or because the dialect had no Generator class, and I wanted to avoid larger structural changes.

@tobymao
Copy link
Owner

tobymao commented Dec 13, 2025

this looks pretty expensive, eliminate_join_marks does an entire traverse_scope, i don't want this to be done on default for every single dialect when translating from oracle is very niche behavior.

can we instead remove join marks at parse time? and avoid the overhead in the other languages?

@dhawkins1234
Copy link
Contributor Author

dhawkins1234 commented Dec 13, 2025

That's fine for my use case, but in the event that someone wants to parse but not mutate the AST, there would need to be some logic around defaults. Not super familiar with the codebase and principles yet, but maybe make a flag preserve_join_marks = true for parsing, but set it to false for transpiling by default, and raise an Exception if trying to write an AST with join marks to an unsupported language?

It should break though, and provide guidance on what to do, not continue to parse. I submitted this because I was parsing some legacy Oracle SQL into Snowflake and it wasn't until I happened to be examining a query that had a join marker that I noticed the issue.

@dhawkins1234
Copy link
Contributor Author

Alternatively, just make preserve_join_marks false by default across the board, as it's recommended against by Oracle anyway (although apparently there are still differences that are likely bugs.) It doesn't matter to me, but what should the default behavior be?

Remove the explicit SUPPORTS_COLUMN_JOIN_MARKS constant and add '(+)' to
KEYWORDS instead. The flag is determined dynamically during dialect
registration in dialects.py by checking if KEYWORDS contains '(+)'. If
not present, it overrides to False. The constant in exasol.py wasn't actually
setting the flag to True.
Automatically apply eliminate_join_marks transform when generating SQL
for dialects that don't support column join marks. This ensures Oracle
join marker syntax (e.g., ) is properly converted
to standard LEFT JOIN syntax instead of being silently dropped or causing
errors.

Changes:
- Add automatic eliminate_join_marks application in Generator.preprocess()
  when dialect doesn't support column join marks
- Optimize eliminate_join_marks to return early if no join marks are
  present, avoiding unnecessary traversal and normalization operations
- Update test expectations to reflect LEFT JOIN conversion behavior
- Add test coverage for exasol dialect join marker handling
- Remove test expecting UnsupportedError since transform now handles
  conversion automatically

The early exit optimization in eliminate_join_marks improves performance
by skipping expensive scope traversal and normalization when no join marks
exist in the expression tree.
Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dhawkins1234, thanks for the contribution, but I don't think we're going to merge this PR as is– please refer to the comment I left. The only change I see merging is the Exasol one (nice catch, btw).

Comment on lines +836 to +839
if not self.dialect.SUPPORTS_COLUMN_JOIN_MARKS:
from sqlglot.transforms import eliminate_join_marks

expression = eliminate_join_marks(expression)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the original PR submission, but I believe this is what you were referring to @tobymao?

Let's definitely not do this. This transformation wasn't ever intended to be applied to other dialects, but rather to be there in case some folks want to manually use it.

Join marks are a pretty niche feature and it's out of scope to support transpiling it by default for every other dialect. The reason, as Toby said, is mostly that it's costly.

Comment on lines +873 to +879
# Early exit if no join marks exist
if not any(
c.args.get("join_mark")
for where in expression.find_all(exp.Where)
for c in where.find_all(exp.Column)
):
return expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated to the PR's original scope and I'm unsure whether it's worth having without benchmarking, as it also incurs a full AST walk.

@georgesittas
Copy link
Collaborator

I'm closing this for the time being. Feel free to submit a separate PR for the Exasol patch and can discuss the AST walk in Slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants