
HBASE-29891: Multi-table continuous incremental backup is failing because output directory already exists #7891

Open
kgeisz wants to merge 3 commits into apache:HBASE-28957_rebased from kgeisz:HBASE-29891-multi-table-incr-backup-failure

Conversation

@kgeisz
Contributor

@kgeisz kgeisz commented Mar 9, 2026

https://issues.apache.org/jira/browse/HBASE-29891

Key Changes

  • For continuous incremental backups, the bulk load output directory for WALs-to-HFiles conversions is now a separate directory for each table.
    • Before: backupRoot/.tmp/backup_X -> After: backupRoot/.tmp/backup_X/namespace/table
  • walToHFiles() in IncrementalTableBackupClient.java now sets hbase.mapreduce.use.multi.table.hfileoutputformat to false when configuring WALPlayer
  • This same hbase.mapreduce.use.multi.table.hfileoutputformat config is also set to false when replaying WALs for continuous backups.
  • Added logic to WALPlayer so it does not always use a multi-table HFile output format (regardless of the value of hbase.mapreduce.use.multi.table.hfileoutputformat)
  • Added a unit test for multi-table incremental backup and restore. The test also verifies the integrity of the data after the restore.

Background

This pull request fixes an issue where running an incremental backup on multiple tables at once results in a failure. When continuous backup is enabled, an incremental backup first converts the WALs to HFiles. These HFiles are output to a .tmp/backup_X directory (where X is the backup ID), known as the "bulk load output directory". Afterwards, a distcp is performed to copy the temporary backup directory to the actual backup directory.

Here is an example file system after the WALs to HFiles conversion and before the distcp. The distcp is supposed to copy the contents of backupRoot/.tmp/backup_INCR02 into backupRoot/backup_INCR02:

backupRoot
├── .tmp
│   └── backup_INCR02
│       ├── default
│       │   ├── table1
│       │   │   └── cf
│       │   └── table2
│       │       └── cf
│       └── namespace1
│           ├── table3
│           │   └── cf
│           └── table4
│               └── cf
├── backup_FULL01
│   ├── .backup.manifest
│   ├── default
│   │   ├── table1
│   │   │   └── .hbase-snapshot
│   │   └── table2
│   │       └── .hbase-snapshot
│   └── namespace1
│       ├── table3
│       │   └── .hbase-snapshot
│       └── table4
│           └── .hbase-snapshot
└── backup_INCR02
    ├── default
    │   ├── table1
    │   │   ├── .tabledesc
    │   │   └── 8d01b
    │   └── table2
    │       ├── .tabledesc
    │       └── 5g03w
    └── namespace1
        ├── table3
        │   ├── .tabledesc
        │   └── 1d42g
        └── table4
            ├── .tabledesc
            └── g49j7

Incremental backups convert WALs to HFiles one table at a time, even if a backup set contains more than one table. When WALs are converted to HFiles, the WALPlayer runs a map-reduce job, and the HFiles are sent to a newly created backupRoot/.tmp/backup_X directory. The MR job for the first table runs without any issues. The problem occurs during the second MR job: backupRoot/.tmp/backup_X already exists at that point, which causes the MR job to fail with something like:

2026-02-11T13:54:17,945 ERROR [Time-limited test {}] impl.TableBackupClient(232): Unexpected exception in incremental-backup: incremental copy backup_1770846846624Output directory hdfs://localhost:64120/backupUT/.tmp/backup_1770846846624 already exists

Solution

Summary

This fix changes the bulk load output directory for continuous incremental backups. Since the WALPlayer is run individually for each table, each WALs-to-HFiles conversion can be sent to a directory for that specific table. An example bulk load output directory for table1 in the default namespace would be backupRoot/.tmp/backup_X/default/table1. Then, table2 would get its own bulk load output directory, etc.
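The per-table directory layout can be sketched as follows. Note that the helper name and its string-based layout are illustrative assumptions for this writeup, not the exact code in the patch:

```java
// Sketch of deriving a per-table bulk load output directory for
// continuous incremental backups: backupRoot/.tmp/backup_X/namespace/table.
// The helper name is hypothetical; the real patch works with Hadoop Path objects.
public class BulkOutputDirs {

    static String bulkOutputDirForTable(String backupRoot, String backupId,
                                        String namespace, String table) {
        return String.join("/", backupRoot, ".tmp", backupId, namespace, table);
    }

    public static void main(String[] args) {
        // Each table in the backup set gets its own output directory,
        // so the second table's MR job no longer collides with the first.
        System.out.println(
            bulkOutputDirForTable("backupRoot", "backup_INCR02", "default", "table1"));
        System.out.println(
            bulkOutputDirForTable("backupRoot", "backup_INCR02", "namespace1", "table3"));
    }
}
```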

Issues while working on the fix

Getting the proper bulk load output structure and getting the distcp to run successfully took more effort than expected. Changing the bulk load output directory for each table was simple. The real challenge was getting the HFiles to be output in the proper directory structure. Since backupRoot/.tmp/backup_X/namespace/table is already the output directory, we only want the HFiles' columnFamily directory to be placed inside table. We don't want the typical namespace/table/columnFamily output structure.

  • If we set the output directory for table1 to be backupRoot/.tmp/backup_X/default/table1, then the HFiles would instead be output to backupRoot/.tmp/backup_X/default/table1/default/table1, where the namespace and table name directories are repeated. This caused the .tmp directory structure to look like the following after running WALPlayer:
backupRootDir
├── .tmp
│   └── backup_02INCR
│       └── default
│           ├── table1
│           │   ├── _SUCCESS
│           │   └── default
│           │       └── table1
│           │           └── cf
│           └── table2
│               ├── _SUCCESS
│               └── default
│                   └── table2
│                       └── cf
├── backup_01FULL
│   ├── .backup.manifest
│   └── default
│       ├── table1
│       │   └── .hbase-snapshot
│       └── table2
│           └── .hbase-snapshot
└── backup_02INCR
    └── default
        ├── table1
        │   ├── .tabledesc
        │   └── 8d01b
        └── table2
            ├── .tabledesc
            └── 5g03w
  • Telling distcp to copy only the deeper default/table directories resulted in a failure from distcp due to conflicting source directory names. This works if there is only one table in each namespace, but fails if a namespace contains multiple tables, because the distcp looks as follows:
distcp backupRoot/.tmp/backup_X/default/table1/default backupRoot/.tmp/backup_X/default/table2/default <destination>

Resulting in an error like:

2026-03-03T09:20:01,847 ERROR [Time-limited test {}] mapreduce.MapReduceBackupCopyJob$BackupDistCp(235): org.apache.hadoop.tools.CopyListing$DuplicateFileException: File hdfs://localhost:60356/backupUT/.tmp/backup_1772558388312/default/table1/default and hdfs://localhost:60356/backupUT/.tmp/backup_1772558388312/default/table2/default would cause duplicates. Aborting
  • Copying just the deeper table name directories results in an improper directory structure in the destination. A single distcp command can have multiple source directories, but only one destination directory:
distcp backupRoot/.tmp/backup_X/default/table1/default/table1 backupRoot/.tmp/backup_X/default/table2/default/table2 backup_INCR02

backup_INCR02
├── default
│   ├── table1
│   │   ├── .tabledesc
│   │   └── 8d01b
│   └── table2
│       ├── .tabledesc
│       └── 5g03w
├── table1
└── table2
  • Using -update in the distcp command did not produce the desired result either.

  • Using IncrementalTableBackupClient.getBulkOutputDirForTable() to create the bulk load directory caused similar issues. The only difference is that the "doubled up" directories had a data directory in between, like: backupRoot/.tmp/backup_X/default/table/data/default/table

Potential Workaround

A workaround for the issues mentioned above would be to run the distcp for each namespace. Then, the source directories would be unique table names, and they could all have the same destination directory (the namespace dir). However, this means a different distcp would need to be performed for each namespace in the backup set.
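The grouping step of that workaround can be sketched as below, splitting a backup set into one table list per namespace so each namespace-level distcp sees unique source names. The table names and the "namespace:table" parsing are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the per-namespace distcp workaround described above: group the
// backup set's tables by namespace, then run one distcp per namespace with
// that namespace directory as the single destination. Names are illustrative.
public class PerNamespaceDistcp {

    static Map<String, List<String>> groupByNamespace(List<String> fullTableNames) {
        Map<String, List<String>> byNs = new TreeMap<>();
        for (String full : fullTableNames) {
            // Assumes the "namespace:table" form; no separator means the default namespace.
            int i = full.indexOf(':');
            String ns = i < 0 ? "default" : full.substring(0, i);
            String table = i < 0 ? full : full.substring(i + 1);
            byNs.computeIfAbsent(ns, k -> new ArrayList<>()).add(table);
        }
        return byNs;
    }

    public static void main(String[] args) {
        System.out.println(groupByNamespace(
            List.of("table1", "table2", "namespace1:table3", "namespace1:table4")));
    }
}
```

Each map entry would then drive one distcp invocation, which is exactly the extra per-namespace cost the paragraph above calls out.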

The Actual Solution

We want the WALs-to-HFiles to output to something like this in .tmp:

backupRoot
└── .tmp
    └── backup_INCR02
        ├── default
        │   ├── table1
        │   │   └── cf
        │   └── table2
        │       └── cf
        └── namespace1
            ├── table3
            │   └── cf
            └── table4
                └── cf

In order to get rid of the "double namespace/tableName" directory structure, we have to change how the HFiles are output. We want to keep our bulk load output directory as backupRoot/.tmp/backup_X/namespace/table and have just the cf column family directory sent there, not namespace/table/cf.

This is done by setting the hbase.mapreduce.use.multi.table.hfileoutputformat config key to false for continuous incremental backups. The problem is that WALPlayer.java always used MultiTableHFileOutputFormat, which implicitly sets hbase.mapreduce.use.multi.table.hfileoutputformat to true. That is why some changes were made to the logic in WALPlayer.java.
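The shape of that WALPlayer change can be sketched as follows. The config is modeled here as a plain Map and the formats as strings; the real code reads a Hadoop Configuration and configures the actual output format classes on the job:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the WALPlayer change: choose the HFile output format from
// hbase.mapreduce.use.multi.table.hfileoutputformat instead of always
// using the multi-table format. Map-based config is an illustrative stand-in.
public class OutputFormatChoice {

    static final String MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY =
        "hbase.mapreduce.use.multi.table.hfileoutputformat";

    static String chooseOutputFormat(Map<String, String> conf) {
        // Default stays true, so existing WALPlayer behavior is unchanged
        // unless a caller (like the continuous backup path) opts out.
        boolean multiTable = Boolean.parseBoolean(
            conf.getOrDefault(MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY, "true"));
        return multiTable ? "MultiTableHFileOutputFormat" : "HFileOutputFormat2";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseOutputFormat(conf)); // multi-table by default
        conf.put(MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY, "false");
        System.out.println(chooseOutputFormat(conf)); // single-table, per-table output dir
    }
}
```

Keeping the default at true matches the "added a default value for MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY" note later in the thread: only the continuous-backup paths flip it to false.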

Also, this hfileoutputformat config key needs to be false when replaying the WALs during a restore. Otherwise, a failure occurs like the following:

2026-03-05T18:32:55,042 WARN  [Thread-1018 {}] mapred.LocalJobRunner$Job(590): job_local1580221296_0005
java.lang.Exception: java.lang.IllegalArgumentException: Invalid format for composite key [rowLoad0]. Cannot extract tablename and suffix from key
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492) ~[hadoop-mapreduce-client-common-3.4.2.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559) ~[hadoop-mapreduce-client-common-3.4.2.jar:?]
Caused by: java.lang.IllegalArgumentException: Invalid format for composite key [rowLoad0]. Cannot extract tablename and suffix from key

HBASE-29891: Multi-table continuous incremental backup is failing because output directory already exists
Change-Id: I710cc8d0d87a299b7782a19d93f28bf6283c2436
@kgeisz kgeisz force-pushed the HBASE-29891-multi-table-incr-backup-failure branch from c4a70b9 to 8ed360e on March 9, 2026 at 23:29
@kgeisz
Contributor Author

kgeisz commented Mar 10, 2026

@vinayakphegde Here is the fix for HBASE-29891

@kgeisz
Contributor Author

kgeisz commented Mar 10, 2026

I successfully created a multi-table continuous incremental backup in the hbase-docker container setup. I was able to take the backup when each table had 1,000 rows. After the incremental backup, I added 1,000 more rows to each table, and then I did a point-in-time restore and verified the target tables had just 1,000 rows instead of 2,000.

Contributor

@anmolnar anmolnar left a comment


Patch looks good to me. Just a nitpick.
Have you considered adding unit tests for the WALPlayer changes?

kgeisz added 2 commits March 11, 2026 14:58
Change-Id: Ia3eebdfc8c2061a512bc5a448da9f79e09d57759
Change-Id: I58691d0d0de91b102ee6774a213dadf5b6207929
@kgeisz kgeisz force-pushed the HBASE-29891-multi-table-incr-backup-failure branch from 19c1531 to de3b79f on March 12, 2026 at 17:47
@kgeisz
Copy link
Contributor Author

kgeisz commented Mar 12, 2026

@anmolnar, I have added some unit tests that cover the changes I made to the WALPlayer. I also added a default value for MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY.

@kgeisz kgeisz requested a review from anmolnar March 12, 2026 18:05
