Fix sort merge interleave overflow by xudong963 · Pull Request #20922 · apache/datafusion

xudong963 · 2026-03-13T09:14:57Z

Which issue does this PR close?

Closes #.

Rationale for this change

When SortPreservingMergeStream merges batches containing large string/binary columns whose combined offsets exceed i32::MAX, Arrow's interleave panics with .expect("overflow"). This PR catches that panic and retries with progressively fewer rows, producing smaller output batches that fit within i32 offset limits.

What changes are included in this PR?

Are these changes tested?

Yes, with unit tests.

Are there any user-facing changes?

xudong963 · 2026-03-16T06:37:09Z

~~I'll continue the PR after apache/arrow-rs#9549 gets into DataFusion~~

I can catch the panic directly and fix it in this PR.

kosiew

👋 @xudong963,
Thanks for working on this.

kosiew · 2026-03-19T11:44:28Z

datafusion/physical-plan/src/sorts/builder.rs

+                match catch_unwind(AssertUnwindSafe(|| interleave(&arrays, indices))) {
+                    Ok(result) => Ok(result?),
+                    Err(panic_payload) => {
+                        if is_overflow_panic(&panic_payload) {


Catching any panic whose message merely contains "overflow" is too broad for a recovery path in the merge operator.

This now converts unrelated bugs such as Rust arithmetic overflows ("attempt to multiply with overflow") or allocation failures like "capacity overflow" into a synthetic OffsetOverflowError, causing the stream to silently split batches instead of surfacing the real defect.
Since this code is on the hot path and intentionally swallows panics, I think we need a tighter discriminator before merging. Ideally the overflow detection should match the specific Arrow panic we expect, or be isolated behind a smaller helper/API so we are not turning arbitrary panics into data-dependent control flow.
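For illustration, a tighter discriminator could compare the whole panic message against the exact string Arrow is expected to use, instead of searching for a substring. This is only a std-only sketch: it assumes the payload is a `&str` or a `String` and that the expected message is exactly "overflow". `OffsetOverflow`, `panic_message`, and `recover_offset_overflow` are hypothetical names, not code from this PR.

```rust
use std::any::Any;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

/// Hypothetical marker error standing in for an offset-overflow error.
#[derive(Debug, PartialEq)]
pub struct OffsetOverflow;

/// Extracts the panic message if the payload is one of the two string
/// types that `panic!` and `expect` normally produce.
fn panic_message(payload: &(dyn Any + Send)) -> Option<&str> {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        Some(*s)
    } else if let Some(s) = payload.downcast_ref::<String>() {
        Some(s.as_str())
    } else {
        None
    }
}

/// Exact match only: "capacity overflow" or
/// "attempt to multiply with overflow" are NOT accepted.
fn is_offset_overflow_panic(payload: &(dyn Any + Send)) -> bool {
    matches!(panic_message(payload), Some("overflow"))
}

/// Runs `f`, converting only the expected panic into an error.
/// Any other panic keeps unwinding so the real defect surfaces.
pub fn recover_offset_overflow<T>(f: impl FnOnce() -> T) -> Result<T, OffsetOverflow> {
    match catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => Ok(value),
        Err(payload) if is_offset_overflow_panic(&*payload) => Err(OffsetOverflow),
        Err(payload) => resume_unwind(payload),
    }
}
```

With this shape, a closure that panics with "capacity overflow" still unwinds past the helper, while one that panics with exactly "overflow" comes back as `Err(OffsetOverflow)`.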

good point, totally agree

kosiew · 2026-03-19T11:48:28Z

datafusion/physical-plan/src/sorts/builder.rs

+    /// panic and retries with fewer rows until the output fits in i32
+    /// offsets.
+    #[test]
+    fn test_interleave_overflow_is_caught() {


This test and test_sort_merge_fetch_interleave_overflow allocate enormous strings (768 * 1024 * 1024 bytes each) and then materialize them into multiple StringArrays.

In practice that means several gigabytes of heap allocation per test, which is likely to make CI flaky or OOM outright.

The coverage is important, but I think these tests would be better replaced with a lower-memory reproduction, for example by constructing the overflow condition with a purpose-built array fixture/helper instead of copying multi-GB payloads into StringArrays.
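The arithmetic behind the overflow can be checked from value lengths alone, without allocating any payload. The sketch below is not the fixture itself; it only shows how a helper could decide how many rows fit in i32 offsets. `rows_that_fit_i32_offsets` is a hypothetical name.

```rust
/// Returns how many leading rows fit before the running end offset
/// would exceed `i32::MAX`. Only lengths are inspected; no string
/// data is allocated.
pub fn rows_that_fit_i32_offsets(value_lengths: &[usize]) -> usize {
    let mut end_offset: u64 = 0;
    for (row, &len) in value_lengths.iter().enumerate() {
        end_offset = end_offset.saturating_add(len as u64);
        if end_offset > i32::MAX as u64 {
            // Including this row would overflow a 32-bit offset.
            return row;
        }
    }
    value_lengths.len()
}
```

For three values of 768 * 1024 * 1024 bytes each, the running totals are 805,306,368, then 1,610,612,736, then 2,415,919,104. The third exceeds i32::MAX (2,147,483,647), so only two rows fit in one output array.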

kosiew · 2026-03-19T11:50:13Z

datafusion/physical-plan/src/sorts/merge.rs

        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<RecordBatch>>> {
        if self.done {
+            // When `build_record_batch()` hits an i32 offset overflow (e.g.


The done branch and the normal emit path both repeat the same before = len(); build_record_batch(); produced += ... bookkeeping.

This feels like it wants a small helper on SortPreservingMergeStream or BatchBuilder so the overflow/drain behavior stays in one place.

kosiew · 2026-03-19T11:52:20Z

datafusion/physical-plan/src/sorts/builder.rs

-            } else {
-                self.batches_mem_used -= get_record_batch_memory_size(batch);
+        // Try interleaving all indices. On offset overflow, halve and retry.
+        let mut end = self.indices.len();


The retry loop is clear, but I think end is really "rows_to_emit".
Renaming that variable or extracting a helper like build_partial_record_batch would make the control flow a bit easier to scan now that build_record_batch has to coordinate retry, draining, and delayed cleanup.
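As a rough sketch of what such a helper could look like, with the retry isolated from draining and cleanup: `build_partial` and `OffsetOverflow` are hypothetical names, and `try_build` stands in for whatever attempts to interleave the first `rows_to_emit` buffered rows.

```rust
/// Hypothetical marker error standing in for an offset-overflow error.
#[derive(Debug, PartialEq)]
pub struct OffsetOverflow;

/// Tries to build an output from the first `total_rows` buffered rows.
/// On overflow the row count is halved and the build retried.
/// Returns the output together with the number of rows it covers, so
/// the caller knows how many indices to drain.
pub fn build_partial<T>(
    total_rows: usize,
    mut try_build: impl FnMut(usize) -> Result<T, OffsetOverflow>,
) -> Result<(T, usize), OffsetOverflow> {
    let mut rows_to_emit = total_rows;
    loop {
        match try_build(rows_to_emit) {
            Ok(output) => return Ok((output, rows_to_emit)),
            // Still room to shrink: halve and try again.
            Err(OffsetOverflow) if rows_to_emit > 1 => rows_to_emit /= 2,
            // A single row (or an empty input) that still overflows
            // cannot be split further.
            Err(err) => return Err(err),
        }
    }
}
```

Returning the emitted row count alongside the output keeps the halving logic in one place and leaves the caller with only the drain and cleanup steps.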

xudong963 · 2026-03-20T06:06:15Z

@kosiew thanks for the review; the comments are addressed in commit 19fec33.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

xudong963 · 2026-03-20T09:10:45Z

Some logs in our env:

thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{"timestamp":"2026-03-20T09:01:11.233319Z","level":"WARN","fields":{"message":"Interleave offset overflow with 2043 rows, retrying with 1021","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:15.856713Z","level":"WARN","fields":{"message":"Interleave offset overflow with 2043 rows, retrying with 510","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:18.259290Z","level":"WARN","fields":{"message":"Interleave offset overflow with 2043 rows, retrying with 255","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:25.359553Z","level":"WARN","fields":{"message":"Interleave offset overflow with 1788 rows, retrying with 894","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:29.011789Z","level":"WARN","fields":{"message":"Interleave offset overflow with 1788 rows, retrying with 447","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:31.266372Z","level":"WARN","fields":{"message":"Interleave offset overflow with 1788 rows, retrying with 223","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:37.146961Z","level":"WARN","fields":{"message":"Interleave offset overflow with 1565 rows, retrying with 782","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:39.659320Z","level":"WARN","fields":{"message":"Interleave offset overflow with 1565 rows, retrying with 391","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:45.441247Z","level":"WARN","fields":{"message":"Interleave offset overflow with 1174 rows, retrying with 587","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
thread 'main' (32) panicked at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.2.0/src/interleave.rs:180:41:
overflow
{"timestamp":"2026-03-20T09:01:47.648284Z","level":"WARN","fields":{"message":"Interleave offset overflow with 1174 rows, retrying with 293","log.target":"datafusion_physical_plan::sorts::builder","log.module_path":"datafusion_physical_plan::sorts::builder","log.file":"/usr/local/cargo/git/checkouts/arrow-datafusion-ea123ae062956126/8dcb444/datafusion/physical-plan/src/sorts/builder.rs","log.line":282},"target":"datafusion_physical_plan::sorts::builder"}
Finished writing metadata, stats, and bitmaps files. Metadata rows: 2043, Stats rows: 2043.

xudong963 · 2026-03-20T09:15:35Z

datafusion/physical-plan/src/sorts/builder.rs

+    // `.expect("overflow")` / `.expect("offset overflow")`.
+    // Catch only those specific panics so the caller can retry
+    // with fewer rows while unrelated defects still unwind.
+    match catch_unwind(AssertUnwindSafe(f)) {


This can be avoided once apache/arrow-rs#9549 lands in DataFusion.

kosiew

@xudong963

Thanks for the update - this looks good overall, and I don’t have any blocking concerns. I left a couple of non-blocking suggestions around memory retention during partial draining and test coverage for the overflow retry path.

kosiew · 2026-03-20T10:10:47Z

datafusion/physical-plan/src/sorts/builder.rs

+        // Remove consumed indices, keeping any remaining for the next call.
+        self.indices.drain(..rows_to_emit);
+
+        // Only clean up fully-consumed batches when all indices are drained,


Nice change. One thing that stood out to me here: now that build_record_batch() can emit a prefix and leave the remainder buffered, this branch seems to keep every fully-consumed input batch alive until self.indices is empty.

That seems functionally correct, but it also means overflow cases could retain quite a bit of memory across several follow-up polls - especially for FETCH-limited queries where we stop pulling new input and just drain leftovers.

Would it make sense to either release batches that are no longer referenced by the remaining indices, or at least leave a quick comment here calling out that this retention is intentional? I think that would help future readers understand the tradeoff.
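To make the first option concrete, here is a minimal sketch. It assumes each buffered index is a `(batch_idx, row_idx)` pair and that batches live in slots that can be emptied in place. `release_unreferenced` is a hypothetical name, and a real implementation would presumably also need to keep any batch that a stream cursor is still reading.

```rust
use std::collections::HashSet;

/// Drops every buffered batch that no remaining index refers to.
/// Slots are set to `None` rather than removed so that batch indices
/// stored in `remaining` stay valid. Returns the number released.
pub fn release_unreferenced<B>(
    batches: &mut [Option<B>],
    remaining: &[(usize, usize)],
) -> usize {
    // Batch ids still needed by rows that have not been emitted yet.
    let live: HashSet<usize> = remaining.iter().map(|&(batch_idx, _)| batch_idx).collect();

    let mut released = 0;
    for (batch_idx, slot) in batches.iter_mut().enumerate() {
        if slot.is_some() && !live.contains(&batch_idx) {
            *slot = None; // the batch's memory can be reclaimed here
            released += 1;
        }
    }
    released
}
```

The cost is one pass over the remaining indices per partial emit, which may or may not be worth it compared with simply documenting the retention.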

kosiew · 2026-03-20T10:10:47Z

datafusion/physical-plan/src/sorts/builder.rs

+    DataFusionError::ArrowError(Box::new(ArrowError::OffsetOverflowError(0)), None)
+}
+
+fn recover_offset_overflow_from_panic<T, F>(f: F) -> Result<T>


The retry behavior looks good, but right now it seems like it’s only covered through synthetic helper failures.

Since the production path depends on matching Arrow’s panic payload pretty closely, I think it’d be great to add one higher-level regression test closer to BatchBuilder::build_record_batch() or SortPreservingMergeStream that exercises the retry/drain flow end-to-end through an injectable interleave hook.

That would make it a lot easier to catch future Arrow-side panic-message changes, or refactors in this file, before they slip through.
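Such a test might be shaped like the std-only sketch below. Everything here is a hypothetical stand-in: `FakeBuilder` plays the role of the builder, the injected `interleave` closure replaces the Arrow call, and the hook panics with "overflow" once more than `max_rows` rows are requested.

```rust
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

/// Stand-in builder: `indices` are buffered row ids, `interleave` is
/// the injected hook that may panic on "too much data".
pub struct FakeBuilder<F: Fn(&[usize]) -> Vec<usize>> {
    indices: Vec<usize>,
    interleave: F,
}

impl<F: Fn(&[usize]) -> Vec<usize>> FakeBuilder<F> {
    pub fn new(indices: Vec<usize>, interleave: F) -> Self {
        Self { indices, interleave }
    }

    /// Emits the largest prefix, found by halving, that the hook
    /// accepts, then drains it. Returns `None` once nothing is left.
    pub fn build_next(&mut self) -> Option<Vec<usize>> {
        if self.indices.is_empty() {
            return None;
        }
        let mut rows_to_emit = self.indices.len();
        loop {
            let prefix = &self.indices[..rows_to_emit];
            match catch_unwind(AssertUnwindSafe(|| (self.interleave)(prefix))) {
                Ok(output) => {
                    self.indices.drain(..rows_to_emit);
                    return Some(output);
                }
                Err(payload) => {
                    // Retry only on the exact expected message.
                    let is_overflow = payload
                        .downcast_ref::<&'static str>()
                        .map_or(false, |msg| *msg == "overflow");
                    if is_overflow && rows_to_emit > 1 {
                        rows_to_emit /= 2;
                    } else {
                        resume_unwind(payload);
                    }
                }
            }
        }
    }
}

/// Test hook: behaves like a copy, but panics with "overflow" when
/// asked for more than `max_rows` rows at once.
pub fn overflowing_hook(max_rows: usize) -> impl Fn(&[usize]) -> Vec<usize> {
    move |rows| {
        if rows.len() > max_rows {
            panic!("overflow");
        }
        rows.to_vec()
    }
}
```

A test built on this can drain the builder in a loop and assert that the concatenated output contains every row exactly once and in order, regardless of how the batches were split.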

xudong963 marked this pull request as draft March 13, 2026 09:15

github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 13, 2026

xudong963 mentioned this pull request Mar 18, 2026

fix(topk): avoid overflow panic in interleave emission #20494

Open

xudong963 added 5 commits March 19, 2026 16:37

Fix sort merge interleave overflow

3da2ff8

add tests

61455f6

add log

714a5a9

fix

57d5120

resolve conflicts

5ab4676

xudong963 force-pushed the fix_overflow branch from a03bdca to 5ab4676 Compare March 19, 2026 09:07

fix

f62bdee

xudong963 marked this pull request as ready for review March 19, 2026 09:39

xudong963 requested review from 2010YOUY01 and kosiew March 19, 2026 09:40

kosiew requested changes Mar 19, 2026

address comments

19fec33

xudong963 added a commit to massive-com/arrow-datafusion that referenced this pull request Mar 20, 2026

Cherry-pick: Fix sort merge interleave overflow (apache#20922)

21ae9c9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

xudong963 added a commit to massive-com/arrow-datafusion that referenced this pull request Mar 20, 2026

Cherry-pick: Fix sort merge interleave overflow (apache#20922)

8dcb444

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

xudong963 commented Mar 20, 2026

kosiew reviewed Mar 20, 2026

kosiew approved these changes Mar 20, 2026
