Skip to content

HBASE-29958 Improve log messages#7857

Open
sharmaar12 wants to merge 8 commits intoapache:HBASE-29081from
sharmaar12:log-messages
Open

HBASE-29958 Improve log messages#7857
sharmaar12 wants to merge 8 commits intoapache:HBASE-29081from
sharmaar12:log-messages

Conversation

@sharmaar12
Copy link

No description provided.

@sharmaar12 sharmaar12 marked this pull request as ready for review March 5, 2026 15:27
+ "(read-write) mode on this storage location. Active Cluster Id: {} " + acs
+ " This cluster Id: " + getClusterId());
+ "(read-write) mode on this storage location. Active Cluster Id: " + acs
+ ", This cluster Id: " + getClusterId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getClusterId() alone is not enough. We need to log the cluster id + suffix like this:

dcf9e176-8437-49a9-9e91-b221b94e8fe3:<blank>

or

dcf9e176-8437-49a9-9e91-b221b94e8fe3:replica1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code accordingly. Sorry missed that part.

Copy link
Author

@sharmaar12 sharmaar12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code please review.

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it looks like with this patch:

2026-03-09T15:02:52,121 INFO  [main] hbase.TableName: Read Replica Cluster suffix value:
2026-03-09T15:02:53,260 INFO  [master/hbase-docker:16000:becomeActiveMaster] master.MasterFileSystem: Read Replica Cluster: This is the active cluster on this storage location with cluster id: 0115d976-1306-44c7-b8b6-f2644f8867c9:

I suggest to use "<blank>" value in the log messages if the suffix is empty, because log messages currently are very confusing to me.

Second, what does this log message mean?

2026-03-09T15:02:53,259 WARN  [master/hbase-docker:16000:becomeActiveMaster] util.FSUtils: Read Replica Cluster id file file:/data-store/hbase/active.cluster.suffix.id is empty

What's empty? Why is it empty?
I restarted the active cluster and the active cluster suffix file was present with the cluster id + blank suffix value: 0115d976-1306-44c7-b8b6-f2644f8867c9:<blank>

@sharmaar12
Copy link
Author

This is how it looks like with this patch:

2026-03-09T15:02:52,121 INFO  [main] hbase.TableName: Read Replica Cluster suffix value:
2026-03-09T15:02:53,260 INFO  [master/hbase-docker:16000:becomeActiveMaster] master.MasterFileSystem: Read Replica Cluster: This is the active cluster on this storage location with cluster id: 0115d976-1306-44c7-b8b6-f2644f8867c9:

I suggest to use "" value in the log messages if the suffix is empty, because log messages currently are very confusing to me.

Will do the changes accordingly.

Second, what does this log message mean?

2026-03-09T15:02:53,259 WARN  [master/hbase-docker:16000:becomeActiveMaster] util.FSUtils: Read Replica Cluster id file file:/data-store/hbase/active.cluster.suffix.id is empty

What's empty? Why is it empty? I restarted the active cluster and the active cluster suffix file was present with the cluster id + blank suffix value: 0115d976-1306-44c7-b8b6-f2644f8867c9:<blank>

@anmolnar I am not much familiar with the nitty-gritty of this file based implementation as some other engineer worked on this but IIRC this may means that active.cluster.id.file may have been created without any data in it. I am not sure in which scenario this will happen.

new String(getSuffixFileDataToWrite(), StandardCharsets.UTF_8));
} catch (FileNotFoundException fnfe) {
// this is the active cluster, create active cluster suffix file if it does not exist
FSUtils.setActiveClusterSuffix(fs, rootdir, getSuffixFileDataToWrite(), wait);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging of this branch is also inaccurate. This is the point where we first create the suffix file when file system is initialized in the active cluster and no logs are emitted. The FSUtil method setActiveClusterSuffix will log, but only at debug level. Please add some meaningful log message here too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@anmolnar
Copy link
Contributor

anmolnar commented Mar 9, 2026

@anmolnar I am not much familiar with the nitty-gritty of this file based implementation as some other engineer worked on this but IIRC this may means that active.cluster.id.file may have been created without any data in it. I am not sure in which scenario this will happen.

I mentioned already:

I restarted the active cluster when the active cluster suffix file was present already with the cluster id + blank suffix value: 0115d976-1306-44c7-b8b6-f2644f8867c9:<blank>

@sharmaar12
Copy link
Author

@anmolnar Let me change accordingly.

String suffix_val = conf.get(HConstants.HBASE_META_TABLE_SUFFIX,
HConstants.HBASE_META_TABLE_SUFFIX_DEFAULT_VALUE);
LOG.info("Meta table suffix value: {}", suffix_val);
LOG.info("Read Replica Cluster suffix value: {}", suffix_val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called by every cluster, active/replica. Labeling it "Read Replica Cluster" in the log will cause confusion for standard (non-replica) deployments.

Copy link
Contributor

@anmolnar anmolnar Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to connect the message to the feature somehow. What do you recommend?
We're not implementing a big feature flag for the feature, so once this is merged every HBase cluster will be either an active or a read-only cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could move this log into the else block (line 98)? That way it only fires when the suffix exists. If not, adding a prefix like "[Read-replica feature]" to the log line would also work well.

Since we’re initializing the meta table name here, I think having "meta table name suffix" in the log would be a bit more informative, wdyt?

Copy link
Contributor

@anmolnar anmolnar Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to remove this log line completely. Meta table name is already logged as:

2026-03-09T16:12:48,322 INFO  [main] hbase.TableName: Meta table name: hbase:meta

Which should be enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...or move it to debug level with the "[Read-replica feature]" prefix that you suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might be redundant at info level, we can move it to debug.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this to debug log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detail about which file and which config in this log line would be more helpful while debugging.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kota-SH Which config you are referring here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line#402, File Suffix {} : Configured suffix {}, adding some details about which file and which configured suffix would be helpful

} else {
// throw error
LOG.info("rootdir {} : Active Cluster File Suffix {} ", rootdir, acs);
LOG.info("rootdir {} : Read replica active cluster file suffix {} ", rootdir, acs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mismatch scenario, logging something like "Active cluster suffix mismatch -- rootdir: {} but the file contains: {}, another cluster is running in active" would be more helpful imo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to:
LOG.info("[Read-replica feature] Another cluster is running in active (read-write) mode on this storage location. Active cluster ID: {}, This cluster ID {}. Rootdir location {} ", acs, getSuffixFromConfig(), rootdir);
Hope this helps.

sharmaar12 and others added 3 commits March 9, 2026 23:57
…s.java

Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>
…s.java

Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>
…erFileSystem.java

Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>
Copy link
Author

@sharmaar12 sharmaar12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated few log messages.

  • Remaining things are appending when suffix is empty
  • Why log message "LOG.warn("[Read-replica feature] id file {} is empty ", idPath);" is present?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants