-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add config for dimension table upsert #17606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add config for dimension table upsert #17606
Conversation
0faf6e1 to
06edb73
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17606 +/- ##
============================================
- Coverage 63.16% 63.15% -0.02%
- Complexity 1479 1500 +21
============================================
Files 3173 3174 +1
Lines 189917 190366 +449
Branches 29064 29096 +32
============================================
+ Hits 119970 120234 +264
- Misses 60621 60796 +175
- Partials 9326 9336 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds configuration support for enabling upsert/deduplication logic on dimension tables. The DimensionTableConfig now includes an enableUpsert boolean flag that gates upsert-related processing, including queryable doc ID filtering and per-segment bitmap management. When enabled, the dimension table manager computes and applies queryable doc ID snapshots to ensure only the latest records for each primary key are visible to queries.
Changes:
- Added
enableUpsertconfiguration field toDimensionTableConfigwith JSON property and alias support - Implemented queryable doc ID filtering in
DimensionTableDataManagerto support dimension table upsert behavior - Updated all test and benchmark call sites to include the new configuration parameter
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/NetUtils.java | Improved network fallback handling with better exception catching and loopback address fallback |
| pinot-spi/src/main/java/org/apache/pinot/spi/config/table/DimensionTableConfig.java | Added enableUpsert field with JSON serialization support |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkDimensionTableOverhead.java | Updated benchmark to pass new enableUpsert parameter (false) |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java | Added integration test for dimension table upsert and utility method for DataSketches version checking |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/DimensionTableIntegrationTest.java | Updated test to include new enableUpsert parameter (false) |
| pinot-core/src/test/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManagerTest.java | Added unit test for queryable doc ID filtering and updated existing tests with new parameter |
| pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java | Implemented upsert logic with queryable doc ID management and per-segment bitmap application |
pinot-spi/src/main/java/org/apache/pinot/spi/utils/NetUtils.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/NetUtils.java
Outdated
Show resolved
Hide resolved
...-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java
Outdated
Show resolved
Hide resolved
...-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
03d84ca to
4d92e78
Compare
4d92e78 to
bf05350
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/DimensionTableConfig.java
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Show resolved
Hide resolved
...-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
bf05350 to
7a87c3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Show resolved
Hide resolved
7a87c3a to
8cd4ac9
Compare
Motivation
Description
enableUpsertboolean toDimensionTableConfig(JSON propertyenableUpsert) and exposedisUpsertEnabled()inpinot-spi.DimensionTableDataManagerand gate upsert-related logic behind_enableUpsert, including using queryable-doc-id snapshots when sizing/iterating segments and applying per-segment bitmaps.RecordLocationtype and helper methodsapplyQueryableDocIdsForRecordLocations,applyQueryableDocIdsForLookupTable,applyQueryableDocIdsToSegments, andgetQueryableDocIdsSnapshotinDimensionTableDataManagerto compute and apply per-segmentMutableRoaringBitmapsets and callImmutableSegmentImpl.enableUpsert(...)when appropriate.DimensionTableConfigto pass the new flag, and added integration coverage that creates a small OFFLINE upsert dimension table and asserts deduplicated selection/count results (testDimensionTableUpsertSelection), as well as a unit testtestLookupRespectsQueryableDocIdsthat verifies lookup respects queryable doc ids when upsert is enabled.Testing
mvn/CI) were executed as part of this change.MultiStageEngineIntegrationTest.testDimensionTableUpsertSelection(integration) andDimensionTableDataManagerTest.testLookupRespectsQueryableDocIds(unit), but these tests were added and not run in this rollout.Validation
Quickstart dim table


dimBaseballTeamshas 3 identical segments, and upsert is enabled.Default query with upsert enabled:
SkipUpsert will show 3 times rows:
Codex Task