Use BooleanBufferBuilder rather than Vec<bool> in ArrowBytesViewMap#20064
Conversation
|
Thanks for the review! |
|
run benchmarks |
|
🤖 |
|
Thanks for the review and suggestions! |
Co-authored-by: Daniël Heres <danielheres@gmail.com>
|
🤖: Benchmark completed Details
|
alamb
left a comment
There was a problem hiding this comment.
Thanks @etk18 and @Dandandan -- this looks cool -- Using NullBufferBuilder will be even better. Thank you
|
Thanks for the reviews! I’ve updated the PR to use Happy to rebase or make further adjustments if needed. |
|
Thanks for the review! I’ve pushed updates to:
CI is rerunning now. Please let me know if anything else is needed. |
|
run benchmarks |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Thank you @etk18 ! |
| } else { | ||
| None | ||
| }; | ||
| let null_buffer = self.nulls.finish(); |
| let in_progress_size = self.in_progress.capacity(); | ||
| let completed_size: usize = self.completed.iter().map(|b| b.len()).sum(); | ||
| let nulls_size = self.nulls.len(); | ||
| let nulls_size = self.nulls.len() / 8; |
There was a problem hiding this comment.
I think this should use https://docs.rs/arrow/latest/arrow/array/struct.NullBufferBuilder.html#method.allocated_size as it is meant to track the allocated size (not the used size)
There was a problem hiding this comment.
Which issue does this PR close?
Closes #20053
Rationale for this change
ArrowBytesViewMappreviously usedVec<bool>to track null values.This PR replaces it with
BooleanBufferBuilder, which is significantly morememory-efficient and faster, and aligns with Apache Arrow best practices for
building validity bitmaps.
This change improves performance and memory usage without changing behavior.
What changed
Vec<bool>withBooleanBufferBuilderfor null trackingTests
cargo test -p datafusion-physical-expr-commonScreen.Recording.2026-01-29.at.8.34.41.PM.mov