-
Notifications
You must be signed in to change notification settings - Fork 128
Fix Scalar de/serialization with proto and outline nulls from ScalarValue
#6309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
04d214e to
a321ca9
Compare
ScalarValue and cast on deserialize
ScalarValue and cast on deserializeScalarValue and fix de/serialization with Proto
f352246 to
dded7b4
Compare
gatesn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments on the main Scalar API.
0ee15dc to
cd8ea9d
Compare
ScalarValue and fix de/serialization with ProtoScalarValue and fix de/serialization with Proto
cd8ea9d to
323dfb5
Compare
Merging this PR will degrade performance by 15.87%
Performance Changes
Comparing Footnotes
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: PolarSignals ProfilingSummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
|
Github's web UI broke while reviewing this PR, I'll pick it up later today/early tomorrow |
c08ca56 to
041a192
Compare
|
looks like a semantic merge conflict with #6388 |
062a387 to
679b3f6
Compare
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
679b3f6 to
5411fde
Compare
|
yolo If anyone runs into issues with scalars, please ping me and Ill take a look at it ASAP. There's known inconsistencies in this PR (though I think I ironed out most of the bugs) |
This PR is generally just a refactoring of the
vortex-scalarcrate and all of its dependents.Currently, we store an opaque
InnerScalarValueinside aScalarValue, and theInnerScalarValueis allowed to beNull.This PR both removes outlines the
InnerScalarValue::Nullvariant, where a nullable scalar is now anOption<ScalarValue>(instead of just aScalarValue. This also completely removesInnerScalarValuein favor of a publicScalarValueenum:(IMPORTANT CHANGE) Additionally, all
Scalars are verified to be sound on construction by checking that theDTypeof theScalaris_compatiblewith theOption<&ScalarValue>that is passed.The stricter construction changes then mean that we need to change how deserialization of scalars work. The protobuf format is not exactly 1-1 with our
Scalars (notably, it only supports 64-bit integers). This means that we might write 8-bit integers and the round trip returns a 64-bit integer. So this PR also changes some APIs to allow us to pass aDTypeto the statistics deserializers. TBD on if this needs to happen in more places (FileStatistics?).For reviewers: try to look over all of the diffs since a large majority is not just renaming variables, they are semantic changes that I am not super confidant is right.
Breaks
Breaks the old
file_statsmethod onVortexFileto returnFileStatisticsinstead of the array ofStatsSet. We needed to do this in order to correctly read statistics from DataFusion, specifically in the case of schema evolution.TODO
Some benchmarks are failing, and also I still need to review everything myself to make sure everything is correct. I also want to add more tests in certain places where I'm very scared things are wrong.