Skip to content

Comments

HIVE-29420: Extra test coverage for Cleaner with minOpenWriteId flag#6286

Open
deniskuzZ wants to merge 1 commit intoapache:masterfrom
deniskuzZ:cleaner-test
Open

HIVE-29420: Extra test coverage for Cleaner with minOpenWriteId flag#6286
deniskuzZ wants to merge 1 commit intoapache:masterfrom
deniskuzZ:cleaner-test

Conversation

@deniskuzZ
Copy link
Member

@deniskuzZ deniskuzZ commented Feb 1, 2026

What changes were proposed in this pull request?

Added new test

Why are the changes needed?

Improve test coverage

Does this PR introduce any user-facing change?

No

How was this patch tested?

Test

@deniskuzZ deniskuzZ force-pushed the cleaner-test branch 2 times, most recently from 57409f2 to 34ec13f Compare February 2, 2026 17:26
@deniskuzZ deniskuzZ force-pushed the cleaner-test branch 2 times, most recently from ed654b1 to 8968220 Compare February 2, 2026 17:54
@deniskuzZ deniskuzZ changed the title Extra test coverage for Cleaner with minOpenWriteId flag HIVE-29420: Extra test coverage for Cleaner with minOpenWriteId flag Feb 2, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

long compactorTxnId = compactInTxn(rqst, CommitAction.COMMIT);

// Wait for the cooldown period so the Cleaner can see the last committed txn as the highest committed watermark
// TODO: doesn't belong here, should probably be moved to CompactorTest#startCleaner()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I wonder, how many effort required to actually move that code to CompactorTest#startCleaner()?

Copy link
Member Author

@deniskuzZ deniskuzZ Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought about that and then noticed that startCleaner is used in plenty of tests, so adding 1s sleep there probably not the best idea, however, that could have been a cleaner change

i can try moving there and see how long the build takes after the change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored by adding @SleepBeforeCleaner annotation

Copy link
Member Author

@deniskuzZ deniskuzZ Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped the refactor commit since many other tests failed as they didn't invoke startCleaner, but instead initiated cleaner directly.
it's a complete mess and out-of-scope of this PR

// Wait for the cooldown period so the Cleaner can see the last committed txn as the highest committed watermark
// TODO: doesn't belong here, should probably be moved to CompactorTest#startCleaner()
Thread.sleep(MetastoreConf.getTimeVar(
conf, ConfVars.TXN_OPENTXN_TIMEOUT, TimeUnit.MILLISECONDS));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long compactInTxn(CompactionRequest rqst, CommitAction commitAction) already contains this sleep. What is the reason of the extra wait?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind. I just saw the next change.


CompactionRequest rqst = new CompactionRequest("default", "camtc", CompactionType.MAJOR);
long compactTxn = compactInTxn(rqst, CommitAction.MARK_COMPACTED);
addBaseFile(t, null, 25L, 25, 26);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain why this addBaseFile moved here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i moved the compacted dir creation closer to the actual compaction txn that is supposed to create it

Comment on lines +740 to +742
Thread.sleep(MetastoreConf.getTimeVar(
conf, ConfVars.TXN_OPENTXN_TIMEOUT, TimeUnit.MILLISECONDS));
return compactorTxnId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a status stored somewhere (maybe in backend db) which indicates whether the cooldown period is timed out or not? If yes IMHO checking the status would be more consistent.

Copy link
Member Author

@deniskuzZ deniskuzZ Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically that sleep makes sure Cleaner sees txn in proper state and this hack currently used in multiple places UPSTREAM-ONLY :

  static void runCleaner(HiveConf hConf) throws Exception {
    // Wait for the cooldown period so the Cleaner can see last committed txn as the highest committed watermark
    Thread.sleep(MetastoreConf.getTimeVar(hConf, MetastoreConf.ConfVars.TXN_OPENTXN_TIMEOUT, TimeUnit.MILLISECONDS));

we could probably query TXNS table and wait for 'c' status. c - committed.

however, as i understand the txn won't be considered committed sooner that this TXN_OPENTXN_TIMEOUT config

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

however, as i understand the txn won't be considered committed sooner that this TXN_OPENTXN_TIMEOUT config

That makes sense. Is it possible that while the test is running on one thread and waiting for the timeout, the commit—running on another thread—gets blocked by a lack of resources? If the commit actually happens later than the timeout, would that cause any issues?

If we have the chance, perhaps in a follow-up PR, I’m leaning toward

we could probably query TXNS table and wait for 'c' status. c - committed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i think that would require significant refactor

@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants