POC to avoid usage of ReplicationResult #7528
ankitsol wants to merge 2 commits into apache:HBASE-28957
Conversation
🎊 +1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
```java
replicateContext.entries.stream().map(WAL.Entry::getEdit).flatMap(e -> e.getCells().stream())
  .forEach(this::checkCell);
return ReplicationResult.COMMITTED;
getReplicationSource().cleanupHFileRefsAndPersistOffsets(replicateContext.getEntries());
```
Why do we need to call this here?
```java
// replicate the batches to sink side.
parallelReplicate(replicateContext, batches);
return ReplicationResult.COMMITTED;
getReplicationSource().cleanupHFileRefsAndPersistOffsets(replicateContext.getEntries());
```
OK, I think I get why you call this method here: at this point we can be sure the WAL entries have been persisted, so it is OK for us to persist the offset. But for me, I prefer we follow the old way and call this in ReplicationSourceShipper.
> But for me, I prefer we follow the old way and call this in ReplicationSourceShipper.

I think it's doable. @ankitsol?
```diff
@@ -216,7 +216,7 @@ public int getTimeout() {
  * the context are assumed to be persisted in the target cluster.
```
What I mean is that we should add a method, maybe called beforePersistingReplicationOffset, and call it before we call updateLogPosition in the ReplicationSourceShipper method. For the old implementation we just do nothing, since we can be sure that everything is already persisted; for an S3-based endpoint, we close the writer to persist the data on S3.
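To make the proposal concrete, here is a minimal sketch of what such a hook could look like; the method name, the default-method placement, and the writer handle are all assumptions drawn from this thread, not the actual HBase API:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch only: beforePersistingReplicationOffset is the method proposed in
// this thread, not a committed HBase API.
interface ReplicationEndpointOffsetHook {
  /**
   * Invoked by ReplicationSourceShipper just before updateLogPosition().
   * When this returns, every entry shipped so far must be durable.
   */
  default void beforePersistingReplicationOffset() throws IOException {
    // Default: no-op. An endpoint that persists synchronously on every
    // replicate() call (e.g. HBaseInterClusterReplicationEndpoint) has
    // nothing extra to do here.
  }
}

// An S3-based endpoint would close its current output file so the shipped
// data becomes durable, then open a new file on the next shipment.
class S3EndpointSketch implements ReplicationEndpointOffsetHook {
  private Closeable currentWalFile; // hypothetical handle to the open file

  @Override
  public void beforePersistingReplicationOffset() throws IOException {
    if (currentWalFile != null) {
      currentWalFile.close(); // closing finalizes the object on S3
      currentWalFile = null;
    }
  }
}
```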
You cannot close the writer every time something is shipped, because closing and re-opening the same stream is a costly operation, if it is supported at all. We have to wait until enough data has been shipped (file size limit) or enough time has elapsed (time limit) before closing the current stream and opening a new one. This is controlled by the replication endpoint itself.
So you also need to change the logic in ReplicationSourceShipper to not always record the offset after shipping. And I do not think this can only be controlled by the replication endpoint: in ReplicationSourceShipper you know the size of the WALEntries, and you also know how much time has elapsed since the last recording, so the logic should be easy to implement there.
That's an interesting idea. @vinayakphegde @ankitsol wdyt?
@Apache9 Let's say the ReplicationSourceShipper controls when to record the offset. How would it know which kind of replication endpoint it is working with? Does it need to record the offset after every shipment, or use a time/size limit? Shall we make it a new attribute of the endpoints?
I prefer we control it by a time/size limit.

Even if the endpoint can persist the data after every shipment, we do not need to record the offset every time, right? We just need to make sure that once the ReplicationSourceShipper wants to record the offset, all the data before this offset has been persisted. So we can introduce a beforePersistingReplicationOffset method on the replication endpoint: if you persist the data after every shipment, you just do nothing; if it is an S3-based endpoint, we close the output file to persist the data.

In this way, the ReplicationSourceShipper does not need to know whether the endpoint persists the data after every shipment. And in the future, for HBaseInterClusterReplicationEndpoint, we could also introduce some asynchronous mechanism to increase performance.
@Apache9 This seems like a good approach. We also want it to be both time- and size-based.

I have two questions regarding the time-based approach:

- Should this time-based count run on a separate thread? Currently in `ContinuousBackupReplicationEndpoint` we implemented it as a separate thread.
- Where should we save the time/size-based context? `ReplicationSourceShipper` or `ReplicationSource`, considering `ReplicationSourceShipper` is itself a thread.
I think changing the default behavior to be size/time-based is OK. We can treat size = 0 as no size limit and size = -1 as persisting after every flush; this way we can make the default size limit -1 to keep the old behavior.
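If I am reading this right, the size check would look something like the sketch below; the parameter names are illustrative, not actual HBase configuration keys:

```java
// Proposed size-limit semantics, as I understand them:
//   sizeLimit == -1 : persist the offset after every flush (old behavior)
//   sizeLimit ==  0 : no size limit; only the time limit triggers persistence
//   sizeLimit  >  0 : persist once this many bytes have been shipped
static boolean shouldPersistBySize(long sizeLimit, long bytesSinceLastPersist) {
  if (sizeLimit == -1) {
    return true;   // default, keeps the old persist-every-time behavior
  }
  if (sizeLimit == 0) {
    return false;  // size never triggers; rely on the time limit
  }
  return bytesSinceLastPersist >= sizeLimit;
}
```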
> @Apache9 This seems like a good approach. We also want it to be both time- and size-based.
>
> I have two questions regarding the time-based approach:
>
> - Should this time-based count run on a separate thread? Currently in `ContinuousBackupReplicationEndpoint` we implemented it as a separate thread.
> - Where should we save the time/size-based context? `ReplicationSourceShipper` or `ReplicationSource`, considering `ReplicationSourceShipper` is itself a thread.
I think we can put the logic in ReplicationSourceShipper. There is a while loop in the thread: after every shipment, we calculate the size and time and determine whether we should persist the offset.
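A rough sketch of that loop, reusing the shouldPersistBySize helper from the earlier sketch; pollNextBatch, shipBatch, and persistOffset are placeholders, not the real ReplicationSourceShipper methods:

```java
long bytesSinceLastPersist = 0;
long lastPersistTime = System.currentTimeMillis();

while (isActive()) {
  WALEntryBatch batch = pollNextBatch();   // placeholder: read the next batch
  shipBatch(batch);                        // placeholder: endpoint.replicate(...)
  bytesSinceLastPersist += batch.getHeapSize(); // assumes the batch exposes its size

  long elapsed = System.currentTimeMillis() - lastPersistTime;
  if (shouldPersistBySize(sizeLimit, bytesSinceLastPersist) || elapsed >= timeLimitMs) {
    // Let the endpoint make the shipped data durable first: a no-op for the
    // default endpoint, close-the-current-S3-file for the S3-based one.
    endpoint.beforePersistingReplicationOffset();
    persistOffset(batch);                  // placeholder: updateLogPosition(...)
    bytesSinceLastPersist = 0;
    lastPersistTime = System.currentTimeMillis();
  }
}
```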
```java
  }
}

private ReplicationResult getReplicationResult() {
```
As said above, we should keep these methods here, and before calling them, we call the method on the ReplicationEndpoint to flush the data out.