HIVE-29420: Extra test coverage for Cleaner with minOpenWriteId flag#6286
HIVE-29420: Extra test coverage for Cleaner with minOpenWriteId flag#6286deniskuzZ wants to merge 1 commit intoapache:masterfrom
Conversation
f2adf9c to
1f6fb58
Compare
1f6fb58 to
a65cc64
Compare
e0c4db8 to
104ed32
Compare
57409f2 to
34ec13f
Compare
ed654b1 to
8968220
Compare
8968220 to
fb1d440
Compare
fb1d440 to
a4e231b
Compare
| long compactorTxnId = compactInTxn(rqst, CommitAction.COMMIT); | ||
|
|
||
| // Wait for the cooldown period so the Cleaner can see the last committed txn as the highest committed watermark | ||
| // TODO: doesn't belong here, should probably be moved to CompactorTest#startCleaner() |
There was a problem hiding this comment.
Hi, I wonder, how many effort required to actually move that code to CompactorTest#startCleaner()?
There was a problem hiding this comment.
i thought about that and then noticed that startCleaner is used in plenty of tests, so adding 1s sleep there probably not the best idea, however, that could have been a cleaner change
i can try moving there and see how long the build takes after the change
There was a problem hiding this comment.
refactored by adding @SleepBeforeCleaner annotation
There was a problem hiding this comment.
dropped the refactor commit since many other tests failed as they didn't invoke startCleaner, but instead initiated cleaner directly.
it's a complete mess and out-of-scope of this PR
| // Wait for the cooldown period so the Cleaner can see the last committed txn as the highest committed watermark | ||
| // TODO: doesn't belong here, should probably be moved to CompactorTest#startCleaner() | ||
| Thread.sleep(MetastoreConf.getTimeVar( | ||
| conf, ConfVars.TXN_OPENTXN_TIMEOUT, TimeUnit.MILLISECONDS)); |
There was a problem hiding this comment.
long compactInTxn(CompactionRequest rqst, CommitAction commitAction) already contains this sleep. What is the reason of the extra wait?
There was a problem hiding this comment.
Never mind. I just saw the next change.
|
|
||
| CompactionRequest rqst = new CompactionRequest("default", "camtc", CompactionType.MAJOR); | ||
| long compactTxn = compactInTxn(rqst, CommitAction.MARK_COMPACTED); | ||
| addBaseFile(t, null, 25L, 25, 26); |
There was a problem hiding this comment.
Could you please explain why this addBaseFile moved here?
There was a problem hiding this comment.
i moved the compacted dir creation closer to the actual compaction txn that is supposed to create it
| Thread.sleep(MetastoreConf.getTimeVar( | ||
| conf, ConfVars.TXN_OPENTXN_TIMEOUT, TimeUnit.MILLISECONDS)); | ||
| return compactorTxnId; |
There was a problem hiding this comment.
Is there a status stored somewhere (maybe in backend db) which indicates whether the cooldown period is timed out or not? If yes IMHO checking the status would be more consistent.
There was a problem hiding this comment.
basically that sleep makes sure Cleaner sees txn in proper state and this hack currently used in multiple places UPSTREAM-ONLY :
static void runCleaner(HiveConf hConf) throws Exception {
// Wait for the cooldown period so the Cleaner can see last committed txn as the highest committed watermark
Thread.sleep(MetastoreConf.getTimeVar(hConf, MetastoreConf.ConfVars.TXN_OPENTXN_TIMEOUT, TimeUnit.MILLISECONDS));
we could probably query TXNS table and wait for 'c' status. c - committed.
however, as i understand the txn won't be considered committed sooner that this TXN_OPENTXN_TIMEOUT config
There was a problem hiding this comment.
Thanks for the explanation.
however, as i understand the txn won't be considered committed sooner that this TXN_OPENTXN_TIMEOUT config
That makes sense. Is it possible that while the test is running on one thread and waiting for the timeout, the commit—running on another thread—gets blocked by a lack of resources? If the commit actually happens later than the timeout, would that cause any issues?
If we have the chance, perhaps in a follow-up PR, I’m leaning toward
we could probably query TXNS table and wait for 'c' status. c - committed.
There was a problem hiding this comment.
yea, i think that would require significant refactor
b958153 to
a4e231b
Compare
|



What changes were proposed in this pull request?
Added new test
Why are the changes needed?
Improve test coverage
Does this PR introduce any user-facing change?
No
How was this patch tested?
Test