Skip to content

chore: Mark expressions with known correctness issues as incompatible#3675

Merged
andygrove merged 8 commits intoapache:mainfrom
andygrove:mark-incompatible-expressions
Mar 12, 2026
Merged

chore: Mark expressions with known correctness issues as incompatible#3675
andygrove merged 8 commits intoapache:mainfrom
andygrove:mark-incompatible-expressions

Conversation

@andygrove
Copy link
Member

Which issue does this PR close?

Related to: #3645, #3644, #3346, #3332, #3330, #3180, #3173, #3016, #2649, #2646, #1897, #1729, #1630

Rationale for this change

Several expressions are currently marked as Compatible (Spark-compatible) but have open correctness issues that can produce incorrect results. These expressions should fall back to Spark by default to prevent silent data corruption, and can be explicitly enabled by users who understand the trade-offs via allowIncompatible=true.

What changes are included in this PR?

Expressions marked as Incompatible (9 expressions across 6 serde files):

Expression Issue(s) Condition
ArrayContains #3346 Always (empty array with literal)
GetArrayItem #3330, #3332 Always (index handling bugs)
ArrayRemove #3173 Always (null element removal)
Hour, Minute, Second #3180 Only for TimestampNTZ inputs
TruncTimestamp #2649 Only for non-UTC timezones
Ceil, Floor #1729 Only for Decimal type inputs
Tan #1897 Always (negative zero)
Corr #2646 Always (null vs NaN)
StructsToJson #3016 Always (Infinity values)

Where possible, the incompatibility is conditional on the specific input type that triggers the bug (e.g., Hour/Minute/Second are only incompatible for TimestampNTZ, Ceil/Floor only for Decimal, TruncTimestamp only for non-UTC timezones).

Documentation updates:

  • expressions.md: Updated Spark-Compatible status from "Yes" to "No" for all affected expressions, with compatibility notes linking to tracking issues
  • compatibility.md: Added detailed "Incompatible Expressions" subsections organized by category (Array, Date/Time, Math, Aggregate, Struct) with descriptions and issue links

How are these changes tested?

These changes only add getSupportLevel overrides (which cause expressions to fall back to Spark by default) and update documentation. The existing test suite covers the fallback mechanism. No new behavioral logic is introduced.

Review all open correctness issues and mark affected expressions as
Incompatible so they fall back to Spark by default. Update the
compatibility guide with detailed documentation of each incompatibility
and links to tracking issues.

Expressions marked Incompatible:
- ArrayContains (apache#3346), GetArrayItem (apache#3330, apache#3332), ArrayRemove (apache#3173)
- Hour, Minute, Second for TimestampNTZ inputs (apache#3180)
- TruncTimestamp for non-UTC timezones (apache#2649)
- Ceil, Floor for Decimal inputs (apache#1729)
- Tan (apache#1897), Corr (apache#2646), StructsToJson (apache#3016)
@andygrove andygrove changed the title Mark expressions with known correctness issues as incompatible chore: Mark expressions with known correctness issues as incompatible Mar 12, 2026
@andygrove andygrove marked this pull request as draft March 12, 2026 15:28
@andygrove
Copy link
Member Author

I am working on updating some tests to enable allowIncomp for these expressions

@andygrove andygrove added this to the 0.14.0 milestone Mar 12, 2026
@andygrove andygrove marked this pull request as ready for review March 12, 2026 19:02
Copy link
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove!

@comphead
Copy link
Contributor

Thanks @andygrove maybe we can add some SKILLS.md that check if PR related to builtin functions then also check if compat flag can be removed 🤔

@andygrove
Copy link
Member Author

Thanks @andygrove maybe we can add some SKILLS.md that check if PR related to builtin functions then also check if compat flag can be removed 🤔

Personally, I am very much in favor of adding skills to help with code reviews, documentation audit, release process, etc. I am not sure how others feel about it though.

@andygrove andygrove merged commit f49e4b6 into apache:main Mar 12, 2026
113 checks passed
@andygrove andygrove deleted the mark-incompatible-expressions branch March 12, 2026 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants