HBASE-29958 Improve log messages#7857
Conversation
| + "(read-write) mode on this storage location. Active Cluster Id: {} " + acs | ||
| + " This cluster Id: " + getClusterId()); | ||
| + "(read-write) mode on this storage location. Active Cluster Id: " + acs | ||
| + ", This cluster Id: " + getClusterId()); |
There was a problem hiding this comment.
getClusterId() alone is not enough. We need to log the cluster id + suffix like this:
dcf9e176-8437-49a9-9e91-b221b94e8fe3:<blank>
or
dcf9e176-8437-49a9-9e91-b221b94e8fe3:replica1
There was a problem hiding this comment.
Updated the code accordingly. Sorry missed that part.
sharmaar12
left a comment
There was a problem hiding this comment.
Updated the code please review.
There was a problem hiding this comment.
This is how it looks like with this patch:
2026-03-09T15:02:52,121 INFO [main] hbase.TableName: Read Replica Cluster suffix value:
2026-03-09T15:02:53,260 INFO [master/hbase-docker:16000:becomeActiveMaster] master.MasterFileSystem: Read Replica Cluster: This is the active cluster on this storage location with cluster id: 0115d976-1306-44c7-b8b6-f2644f8867c9:
I suggest to use "<blank>" value in the log messages if the suffix is empty, because log messages currently are very confusing to me.
Second, what does this log message mean?
2026-03-09T15:02:53,259 WARN [master/hbase-docker:16000:becomeActiveMaster] util.FSUtils: Read Replica Cluster id file file:/data-store/hbase/active.cluster.suffix.id is empty
What's empty? Why is it empty?
I restarted the active cluster and the active cluster suffix file was present with the cluster id + blank suffix value: 0115d976-1306-44c7-b8b6-f2644f8867c9:<blank>
Will do the changes accordingly.
@anmolnar I am not much familiar with the nitty-gritty of this file based implementation as some other engineer worked on this but IIRC this may means that active.cluster.id.file may have been created without any data in it. I am not sure in which scenario this will happen. |
| new String(getSuffixFileDataToWrite(), StandardCharsets.UTF_8)); | ||
| } catch (FileNotFoundException fnfe) { | ||
| // this is the active cluster, create active cluster suffix file if it does not exist | ||
| FSUtils.setActiveClusterSuffix(fs, rootdir, getSuffixFileDataToWrite(), wait); |
There was a problem hiding this comment.
Logging of this branch is also inaccurate. This is the point where we first create the suffix file when file system is initialized in the active cluster and no logs are emitted. The FSUtil method setActiveClusterSuffix will log, but only at debug level. Please add some meaningful log message here too.
I mentioned already:
|
|
@anmolnar Let me change accordingly. |
| String suffix_val = conf.get(HConstants.HBASE_META_TABLE_SUFFIX, | ||
| HConstants.HBASE_META_TABLE_SUFFIX_DEFAULT_VALUE); | ||
| LOG.info("Meta table suffix value: {}", suffix_val); | ||
| LOG.info("Read Replica Cluster suffix value: {}", suffix_val); |
There was a problem hiding this comment.
This method is called by every cluster, active/replica. Labeling it "Read Replica Cluster" in the log will cause confusion for standard (non-replica) deployments.
There was a problem hiding this comment.
I'd like to connect the message to the feature somehow. What do you recommend?
We're not implementing a big feature flag for the feature, so once this is merged every HBase cluster will be either an active or a read-only cluster.
There was a problem hiding this comment.
Perhaps we could move this log into the else block (line 98)? That way it only fires when the suffix exists. If not, adding a prefix like "[Read-replica feature]" to the log line would also work well.
Since we’re initializing the meta table name here, I think having "meta table name suffix" in the log would be a bit more informative, wdyt?
There was a problem hiding this comment.
We might want to remove this log line completely. Meta table name is already logged as:
2026-03-09T16:12:48,322 INFO [main] hbase.TableName: Meta table name: hbase:meta
Which should be enough.
There was a problem hiding this comment.
...or move it to debug level with the "[Read-replica feature]" prefix that you suggested.
There was a problem hiding this comment.
Yeah, it might be redundant at info level, we can move it to debug.
There was a problem hiding this comment.
Detail about which file and which config in this log line would be more helpful while debugging.
There was a problem hiding this comment.
In line#402, File Suffix {} : Configured suffix {}, adding some details about which file and which configured suffix would be helpful
| } else { | ||
| // throw error | ||
| LOG.info("rootdir {} : Active Cluster File Suffix {} ", rootdir, acs); | ||
| LOG.info("rootdir {} : Read replica active cluster file suffix {} ", rootdir, acs); |
There was a problem hiding this comment.
This is a mismatch scenario, logging something like "Active cluster suffix mismatch -- rootdir: {} but the file contains: {}, another cluster is running in active" would be more helpful imo.
There was a problem hiding this comment.
Change this to:
LOG.info("[Read-replica feature] Another cluster is running in active (read-write) mode on this storage location. Active cluster ID: {}, This cluster ID {}. Rootdir location {} ", acs, getSuffixFromConfig(), rootdir);
Hope this helps.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
Outdated
Show resolved
Hide resolved
…s.java Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>
…s.java Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>
…erFileSystem.java Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>
sharmaar12
left a comment
There was a problem hiding this comment.
Updated few log messages.
- Remaining things are appending when suffix is empty
- Why log message "
LOG.warn("[Read-replica feature] id file {} is empty ", idPath);" is present?
No description provided.