Improve visitor performance by reducing Vec and String allocations #6337
Improve visitor performance by reducing Vec and String allocations #6337
Vec and String allocations #6337Conversation
Benchmarks: Random AccessSummary
|
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨Benchmark |
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
Benchmarks: CompressionSummary
Detailed Results Table
|
Merging this PR will degrade performance by 13.38%
Performance Changes
Comparing Footnotes
|
c235d20 to
531a071
Compare
Benchmarks: PolarSignals ProfilingSummary
Detailed Results Table
|
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
0fda89b to
015b5d7
Compare
|
moving the filter piece into a different PR so I can try and measure a bigger thing |
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
1d9c33f to
8172fcb
Compare
|
The codspeed improvement here is real 🥳, like 20% of those benchmarks was just formatting the names of the chunks. |
|
@gatesn following our conversation on Friday, are you going to break these APIs and I should just close this PR? |
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Vec and String allocations
|
I will take a look. Very keen to take these performance wins, just want to make sure it moves in the direction we were going to avoid extra work. |
Using
dhat, identified these two code paths as heavy allocators, running the full tpch benchmark.This PR fills out many default implementations in
VisitorVTable, making them much more efficient for many cases, both by reducing recursive function calls and by not formatting names. As far as I can tell - many of the improvements shown by codspeed are completely real here.Using the existing iterator when filtering instead of allocating a Vec saves about 2GB of memory allocation (around 1.6% of all memory allocated), which happens in a tiny amount of allocations (0.02%).Moving to a separate PR, I want to try and make a bigger change and be able to measure it.nth_child(should just bechild?) was just another offender that was easy to experiment with, it doesn't allocate a lot of data but it does save ~0.6% of all allocations.