@xiangfu0 xiangfu0 commented Feb 1, 2026

Motivation

  • Provide a config toggle to enable or disable dimension-table upsert/dedup logic so clusters can opt into queryable-doc-id filtering and upsert behavior for dimension tables.
  • Ensure upsert-related processing (computing/applying per-segment queryable doc id bitmaps and enabling segment upsert state) is only performed when the feature is explicitly enabled.

Description

  • Added an enableUpsert boolean to DimensionTableConfig (JSON property enableUpsert) and exposed isUpsertEnabled() in pinot-spi.
  • Read the new flag in DimensionTableDataManager and gate upsert-related logic behind _enableUpsert, including using queryable-doc-id snapshots when sizing/iterating segments and applying per-segment bitmaps.
  • Introduced a small RecordLocation type and helper methods applyQueryableDocIdsForRecordLocations, applyQueryableDocIdsForLookupTable, applyQueryableDocIdsToSegments, and getQueryableDocIdsSnapshot in DimensionTableDataManager to compute and apply per-segment MutableRoaringBitmap sets and call ImmutableSegmentImpl.enableUpsert(...) when appropriate.
  • Updated all test and helper call sites that construct DimensionTableConfig to pass the new flag, and added integration coverage that creates a small OFFLINE upsert dimension table and asserts deduplicated selection/count results (testDimensionTableUpsertSelection), as well as a unit test testLookupRespectsQueryableDocIds that verifies lookup respects queryable doc ids when upsert is enabled.
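
The per-segment bitmap computation described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: it uses java.util.BitSet in place of MutableRoaringBitmap, and the class QueryableDocIdsSketch, the fields of its RecordLocation, and both method names are hypothetical. It also assumes that when primary keys collide, the row in the later segment (or later doc id) wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and ordering rules are assumptions, not Pinot APIs.
final class QueryableDocIdsSketch {

  // Hypothetical stand-in for the PR's RecordLocation: where a key's winning row lives.
  static final class RecordLocation {
    final int _segmentIndex;
    final int _docId;

    RecordLocation(int segmentIndex, int docId) {
      _segmentIndex = segmentIndex;
      _docId = docId;
    }
  }

  // Returns one bitmap per segment; a set bit means the doc is visible to queries.
  static List<BitSet> computeQueryableDocIds(List<List<String>> primaryKeysPerSegment, boolean enableUpsert) {
    List<BitSet> queryableDocIds = new ArrayList<>();
    for (List<String> keys : primaryKeysPerSegment) {
      BitSet bits = new BitSet(keys.size());
      if (!enableUpsert) {
        bits.set(0, keys.size()); // feature off: every doc stays queryable
      }
      queryableDocIds.add(bits);
    }
    if (!enableUpsert) {
      return queryableDocIds;
    }
    // Feature on: remember only the last location seen for each primary key.
    Map<String, RecordLocation> latestByKey = new HashMap<>();
    for (int segment = 0; segment < primaryKeysPerSegment.size(); segment++) {
      List<String> keys = primaryKeysPerSegment.get(segment);
      for (int docId = 0; docId < keys.size(); docId++) {
        latestByKey.put(keys.get(docId), new RecordLocation(segment, docId));
      }
    }
    // Mark the winning doc of each key in its segment's bitmap.
    for (RecordLocation location : latestByKey.values()) {
      queryableDocIds.get(location._segmentIndex).set(location._docId);
    }
    return queryableDocIds;
  }

  // Total number of visible docs across all segments.
  static int countQueryable(List<BitSet> queryableDocIds) {
    int total = 0;
    for (BitSet bits : queryableDocIds) {
      total += bits.cardinality();
    }
    return total;
  }

  public static void main(String[] args) {
    // Three identical segments with two made-up primary keys each.
    List<String> segment = Arrays.asList("teamA", "teamB");
    List<List<String>> segments = Arrays.asList(segment, segment, segment);
    System.out.println(countQueryable(computeQueryableDocIds(segments, true)));  // prints 2
    System.out.println(countQueryable(computeQueryableDocIds(segments, false))); // prints 6
  }
}
```

In the actual change the resulting bitmaps are handed to ImmutableSegmentImpl.enableUpsert(...), and only when the flag is on; the sketch models the disabled case by leaving every doc visible.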

Testing

  • No automated test suites (mvn/CI) were executed as part of this change.
  • Two tests were added: MultiStageEngineIntegrationTest.testDimensionTableUpsertSelection (integration) and DimensionTableDataManagerTest.testLookupRespectsQueryableDocIds (unit). Neither was run as part of this change.
  • Existing tests and benchmark helpers were updated to pass the new config parameter where needed, and imports were adjusted accordingly.

Validation

The quickstart dimension table dimBaseballTeams has 3 identical segments, and upsert is enabled.
Default query with upsert enabled:
[image]
With SkipUpsert, the query returns three times as many rows:
[image]
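
The difference between the two results can be illustrated with a small sketch. It is only a model of the behavior described here, not Pinot's query path: the class SkipUpsertCountSketch, the method countRows, and the row count of 4 per segment are made up for illustration, and java.util.BitSet stands in for the per-segment queryable doc id bitmaps.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative model only; not how Pinot executes queries.
final class SkipUpsertCountSketch {

  // Counts rows across segments, either honoring or ignoring the queryable doc id bitmaps.
  static int countRows(int[] docsPerSegment, List<BitSet> queryableDocIds, boolean skipUpsert) {
    int total = 0;
    for (int i = 0; i < docsPerSegment.length; i++) {
      total += skipUpsert ? docsPerSegment[i] : queryableDocIds.get(i).cardinality();
    }
    return total;
  }

  public static void main(String[] args) {
    // Three identical segments with 4 rows each (hypothetical size).
    int[] docsPerSegment = {4, 4, 4};
    BitSet lastSegment = new BitSet(4);
    lastSegment.set(0, 4); // assume only the last segment's copies are queryable
    List<BitSet> queryableDocIds = Arrays.asList(new BitSet(4), new BitSet(4), lastSegment);

    System.out.println(countRows(docsPerSegment, queryableDocIds, false)); // prints 4
    System.out.println(countRows(docsPerSegment, queryableDocIds, true));  // prints 12
  }
}
```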



xiangfu0 force-pushed the codex/add-delete-vector-for-upsert-feature branch from 0faf6e1 to 06edb73 on February 2, 2026 12:09
codecov-commenter commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 32.35294% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.15%. Comparing base (7305eec) to head (8cd4ac9).
⚠️ Report is 21 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ata/manager/offline/DimensionTableDataManager.java | 32.06% | 79 Missing and 10 partials ⚠️ |
| ...e/pinot/spi/config/table/DimensionTableConfig.java | 40.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17606      +/-   ##
============================================
- Coverage     63.16%   63.15%   -0.02%     
- Complexity     1479     1500      +21     
============================================
  Files          3173     3174       +1     
  Lines        189917   190366     +449     
  Branches      29064    29096      +32     
============================================
+ Hits         119970   120234     +264     
- Misses        60621    60796     +175     
- Partials       9326     9336      +10     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.14% <32.35%> (+<0.01%) ⬆️ |
| java-21 | 63.11% <32.35%> (-0.01%) ⬇️ |
| temurin | 63.15% <32.35%> (-0.02%) ⬇️ |
| unittests | 63.15% <32.35%> (-0.02%) ⬇️ |
| unittests1 | 55.59% <32.35%> (+0.06%) ⬆️ |
| unittests2 | 33.96% <0.00%> (-0.07%) ⬇️ |



Copilot AI left a comment


Pull request overview

This PR adds configuration support for enabling upsert/deduplication logic on dimension tables. The DimensionTableConfig now includes an enableUpsert boolean flag that gates upsert-related processing, including queryable doc ID filtering and per-segment bitmap management. When enabled, the dimension table manager computes and applies queryable doc ID snapshots to ensure only the latest records for each primary key are visible to queries.

Changes:

  • Added enableUpsert configuration field to DimensionTableConfig with JSON property and alias support
  • Implemented queryable doc ID filtering in DimensionTableDataManager to support dimension table upsert behavior
  • Updated all test and benchmark call sites to include the new configuration parameter

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/NetUtils.java | Improved network fallback handling with better exception catching and loopback address fallback |
| pinot-spi/src/main/java/org/apache/pinot/spi/config/table/DimensionTableConfig.java | Added enableUpsert field with JSON serialization support |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkDimensionTableOverhead.java | Updated benchmark to pass new enableUpsert parameter (false) |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java | Added integration test for dimension table upsert and utility method for DataSketches version checking |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/DimensionTableIntegrationTest.java | Updated test to include new enableUpsert parameter (false) |
| pinot-core/src/test/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManagerTest.java | Added unit test for queryable doc ID filtering and updated existing tests with new parameter |
| pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java | Implemented upsert logic with queryable doc ID management and per-segment bitmap application |

xiangfu0 force-pushed the codex/add-delete-vector-for-upsert-feature branch 5 times, most recently from 03d84ca to 4d92e78 on February 3, 2026 05:08
xiangfu0 force-pushed the codex/add-delete-vector-for-upsert-feature branch from 4d92e78 to bf05350 on February 3, 2026 05:36
xiangfu0 requested a review from Copilot on February 4, 2026 14:45

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

xiangfu0 requested a review from Copilot on February 5, 2026 11:02

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

xiangfu0 force-pushed the codex/add-delete-vector-for-upsert-feature branch from bf05350 to 7a87c3a on February 5, 2026 12:06
xiangfu0 requested a review from Copilot on February 5, 2026 12:09

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

xiangfu0 force-pushed the codex/add-delete-vector-for-upsert-feature branch from 7a87c3a to 8cd4ac9 on February 5, 2026 17:46