Skip to content

[SPARK-54731][SQL] Fix Reverse expression to handle BinaryType by performing raw byte-level reversal#55051

Open
xiaoxuandev wants to merge 1 commit intoapache:masterfrom
xiaoxuandev:fix-54731
Open

[SPARK-54731][SQL] Fix Reverse expression to handle BinaryType by performing raw byte-level reversal#55051
xiaoxuandev wants to merge 1 commit intoapache:masterfrom
xiaoxuandev:fix-54731

Conversation

@xiaoxuandev
Copy link
Contributor

What changes were proposed in this pull request?

When Spark's reverse function is called on a BinaryType column, it previously threw a type mismatch error because BinaryType was not in the accepted input types. This patch adds BinaryType support to the Reverse expression.

The changes:

  1. Add BinaryType to the inputTypes TypeCollection in Reverse.
  2. Add a BinaryType match case in the interpreted execution path (doReverse) using Scala's Array.reverse.
  3. Add a BinaryType match case in the code generation path (doGenCode) with a dedicated binaryCodeGen method that performs byte-level reversal via a for loop.
  4. Update @ExpressionDescription to document binary support (since 4.2.0).
  5. Update data type mismatch error test expectations to include BINARY in the accepted types.
  6. Add end-to-end SQL test for reverse on BinaryType in DataFrameFunctionsSuite covering both interpreted and codegen paths.

Why are the changes needed?

Without this fix, calling reverse on a BinaryType column fails with a type error. This is inconsistent with other Spark functions like length and concat that already support BinaryType.

Does this PR introduce any user-facing change?

Yes. The reverse function now accepts BinaryType input and returns the byte-reversed binary value. Previously this would throw an AnalysisException.

How was this patch tested?

  • Unit tests in CollectionExpressionsSuite: multi-byte reversal, empty byte array, single byte, byte array containing 0x00, null input.
  • End-to-end test in DataFrameFunctionsSuite: reverse on BinaryType column through both interpreted and codegen (cached) paths.
  • Updated data type mismatch tests to reflect the new accepted type set.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

…forming raw byte-level reversal

### What changes were proposed in this pull request?
When Spark's `reverse` function is called on a BinaryType column, it previously threw a type mismatch error because BinaryType was not in the accepted input types. This patch adds BinaryType support to the Reverse expression.

The changes:
1. Add BinaryType to the inputTypes TypeCollection in Reverse.
2. Add a BinaryType match case in the interpreted execution path (doReverse) using Scala's Array.reverse.
3. Add a BinaryType match case in the code generation path (doGenCode) with a dedicated binaryCodeGen method that performs byte-level reversal via a for loop.
4. Update @ExpressionDescription to document binary support (since 4.2.0), fix parameter name from `array` to `expr`.
5. Update data type mismatch error test expectations to include BINARY in the accepted types.
6. Add end-to-end SQL test for reverse on BinaryType in DataFrameFunctionsSuite covering both interpreted and codegen paths.

### Why are the changes needed?
Without this fix, calling `reverse` on a BinaryType column fails with a type error. This is inconsistent with other Spark functions like `length` and `concat` that already support BinaryType.

### Does this PR introduce _any_ user-facing change?
Yes. The `reverse` function now accepts BinaryType input and returns the byte-reversed binary value. Previously this would throw an AnalysisException.

### How was this patch tested?
- Unit tests in CollectionExpressionsSuite: multi-byte reversal, empty byte array, single byte, byte array containing 0x00, null input.
- End-to-end test in DataFrameFunctionsSuite: reverse on BinaryType column through both interpreted and codegen (cached) paths.
- Updated data type mismatch tests to reflect the new accepted type set.
- Scalastyle and ExpressionsSchemaSuite pass.

### Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with Kiro.
@HyukjinKwon
Copy link
Member

Are there any reference that supports binary with reverse?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants