[branch-52] Fix Arrow Spill Underrun (#20159) by hareshkh · Pull Request #20684 · apache/datafusion

hareshkh · 2026-03-03T21:59:42Z

Which issue does this PR close?

Related to Release DataFusion 52.3.0 (minor/) Release (Mar 2026) #20681
Backport of Fix Arrow Spill Underrun #20159

Rationale for this change

This adjusts the way that the spill channel works. Currently we have a spill writer & reader pairing which uses a mutex to coordindate when a file is ready to be read.

What happens is, that because we were using a spawn_buffered call, the read task would race ahead trying to read a file which is yet to be written out completely.

Alongside this, we need to flush each write to the file, as there is a chance that another thread may see stale data.

What changes are included in this PR?

Adds a flush on write, and converts the read task to not buffer reads.

Are these changes tested?

I haven't written a test, but I have been running the example in the attached issue. While it now fails with allocation errors, the original error goes away.

Are there any user-facing changes?

Nope

## Which issue does this PR close? - Closes apache#19425 ## Rationale for this change This adjusts the way that the spill channel works. Currently we have a spill writer & reader pairing which uses a mutex to coordindate when a file is ready to be read. What happens is, that because we were using a `spawn_buffered` call, the read task would race ahead trying to read a file which is yet to be written out completely. Alongside this, we need to flush each write to the file, as there is a chance that another thread may see stale data. ## What changes are included in this PR? Adds a flush on write, and converts the read task to not buffer reads. ## Are these changes tested? I haven't written a test, but I have been running the example in the attached issue. While it now fails with allocation errors, the original error goes away. ## Are there any user-facing changes? Nope

hareshkh · 2026-03-04T18:18:40Z

@alamb @adriangb : Can you please take a look at this? (You reviewed the original PR)

adriangb · 2026-03-04T18:28:34Z

Could you link in the comment description (so it ends up in the commit) to the original PR, and how you created this diff (e.g. "cherry picked abc123")?

alamb

Looks good to me -- thank you @hareshkh and @adriangb

I linked the PR that is being backported and I verified the diff is the same, and removed the duplicated template in the PR description

I also ran the problem query and verified that that before this PR , it fails like this:

andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ ~/Software/datafusion-cli/datafusion-cli-52.2.0  -m 1G -c "SELECT \"UserID\", extract(minute FROM to_timestamp_seconds(\"EventTime\")) AS m, \"SearchPhrase\", COUNT(*) FROM '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned' GROUP BY \"UserID\", m, \"SearchPhrase\" ORDER BY COUNT(*) DESC LIMIT 10;"
DataFusion CLI v52.2.0
Error: Arrow error: Io error: failed to fill whole buffer

And after this PR it succeeds

andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ ./target/profiling/datafusion-cli  -m 1G -c "SELECT \"UserID\", extract(minute FROM to_timestamp_seconds(\"EventTime\")) AS m, \"SearchPhrase\", COUNT(*) FROM '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned' GROUP BY \"UserID\", m, \"SearchPhrase\" ORDER BY COUNT(*) DESC LIMIT 10;"
DataFusion CLI v52.2.0
+---------------------+----+--------------+----------+
| UserID              | m  | SearchPhrase | count(*) |
+---------------------+----+--------------+----------+
| 1313338681122956954 | 31 |              | 589      |
| 1313338681122956954 | 28 |              | 578      |
| 1313338681122956954 | 29 |              | 572      |
| 1313338681122956954 | 33 |              | 567      |
| 1313338681122956954 | 27 |              | 557      |
| 1313338681122956954 | 32 |              | 554      |
| 1313338681122956954 | 30 |              | 552      |
| 1313338681122956954 | 34 |              | 546      |
| 1313338681122956954 | 26 |              | 540      |
| 1313338681122956954 | 10 |              | 539      |
+---------------------+----+--------------+----------+
10 row(s) fetched.
Elapsed 184.374 seconds.

alamb · 2026-03-04T20:13:40Z

Thanks again @hareshkh

github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 3, 2026

hareshkh mentioned this pull request Mar 3, 2026

Release DataFusion 52.3.0 (minor/) Release (Mar 2026) #20681

Open

12 tasks

hareshkh changed the title ~~Fix Arrow Spill Underrun (#20159)~~ [branch-52] Fix Arrow Spill Underrun (#20159) Mar 4, 2026

alamb approved these changes Mar 4, 2026

View reviewed changes

alamb merged commit 9a67de5 into apache:branch-52 Mar 4, 2026
38 of 39 checks passed

hareshkh deleted the repartition_spill_buffering branch March 4, 2026 21:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[branch-52] Fix Arrow Spill Underrun (#20159)#20684

[branch-52] Fix Arrow Spill Underrun (#20159)#20684
alamb merged 1 commit intoapache:branch-52from
hareshkh:repartition_spill_buffering

hareshkh commented Mar 3, 2026 •

edited by alamb

Loading

Uh oh!

hareshkh commented Mar 4, 2026

Uh oh!

adriangb commented Mar 4, 2026

Uh oh!

alamb left a comment

Uh oh!

Uh oh!

alamb commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

hareshkh commented Mar 3, 2026 • edited by alamb Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

hareshkh commented Mar 4, 2026

Uh oh!

adriangb commented Mar 4, 2026

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alamb commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hareshkh commented Mar 3, 2026 •

edited by alamb

Loading