Add MIN_BY and MAX_BY aggregations for groupby and reduction #20947
Implements native `MIN_BY` and `MAX_BY` aggregations to support Apache Spark's `min_by`/`max_by` operations. These aggregations return the value from one column at the row where another column has its minimum/maximum value, avoiding the slower struct-based min/max comparison path that forces sort-based groupby.

Changes

Aggregation Infrastructure
- Added `MIN_BY` and `MAX_BY` to the `aggregation::Kind` enum
- Added `min_by_aggregation` and `max_by_aggregation` classes with visitor pattern support
- Added `make_min_by_aggregation()` and `make_max_by_aggregation()`

Implementation
- Uses an `argmin`/`argmax` + gather pattern
- Uses `argmin`/`argmax` to find the index, then extracts the value at that index

Tests
- Added `min_by_tests.cpp` and `max_by_tests.cpp` covering basic functionality and null handling

Example Usage
Files Modified
Original prompt
- `min_by` and `max_by` aggregations in reduction and groupby #20946