Subtraction overflow in max_distinct_count when hash join has a pushed-down limit #20779

@gabotechs

Describe the bug

max_distinct_count in datafusion/physical-plan/src/joins/utils.rs panics with "attempt to subtract with overflow" in the Precision::Exact branch (line 725):

Precision::Exact(count) => {
    let count = count - stats.null_count.get_value().unwrap_or(&0); // <-- panic

This happens when num_rows (Exact) is smaller than null_count. That state became possible after #20228 added fetch support to HashJoinExec: when a limit is pushed down, HashJoinExec::partition_statistics() calls stats.with_fetch(self.fetch, 0, 1), which reduces num_rows to Exact(fetch_value) but leaves null_count in the column statistics unchanged.
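To illustrate the arithmetic, here is a minimal, self-contained sketch of one way the subtraction could be made safe. It is not the actual DataFusion code or a proposed patch: the `Precision` enum below is a simplified stand-in for DataFusion's type, and `non_null_rows` is a hypothetical helper name invented for this example. It assumes that clamping to zero is an acceptable result when null_count exceeds num_rows.

```rust
// Simplified stand-in for DataFusion's `Precision<usize>` (assumption:
// only the parts needed for this illustration are modeled).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    // Returns the wrapped value, if any, mirroring the `get_value()` call
    // quoted in the issue.
    fn get_value(&self) -> Option<&usize> {
        match self {
            Precision::Exact(v) | Precision::Inexact(v) => Some(v),
            Precision::Absent => None,
        }
    }
}

// Hypothetical helper: number of non-null rows, clamped at zero instead of
// panicking when `null_count` is larger than `num_rows`.
#[allow(dead_code)]
fn non_null_rows(num_rows: usize, null_count: &Precision) -> usize {
    let nulls = *null_count.get_value().unwrap_or(&0);
    // `saturating_sub` returns 0 where plain `-` would overflow.
    num_rows.saturating_sub(nulls)
}
```

With a fetch of 10 applied to an input whose column has 25 nulls, `non_null_rows(10, &Precision::Exact(25))` yields 0, where the plain `count - null_count` expression panics in debug builds. Whether the real fix should clamp here or instead scale null_count inside with_fetch is a design question this sketch does not settle.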

Example failing pipeline:
https://github.com/datafusion-contrib/datafusion-distributed/actions/runs/22798285744/job/66136064932?pr=366

To Reproduce

git clone https://github.com/datafusion-contrib/datafusion-distributed
cd datafusion-distributed
git checkout branch-53
cargo test --test tpcds_plans_test tests::test_tpcds_19 --all-features

Expected behavior

No subtraction overflow: max_distinct_count should not panic when null_count exceeds num_rows.

Additional context

No response

Labels

bug
