
HBASE-29891: Multi-table continuous incremental backup is failing because output directory already exists #7891

Open
kgeisz wants to merge 3 commits into apache:HBASE-28957_rebased from kgeisz:HBASE-29891-multi-table-incr-backup-failure

Conversation

@kgeisz
Contributor

@kgeisz kgeisz commented Mar 9, 2026

https://issues.apache.org/jira/browse/HBASE-29891

Key Changes

  • For continuous incremental backups, the bulk load output directory for WALs-to-HFiles conversions is now a separate directory for each table.
    • Before: backupRoot/.tmp/backup_X -> After: backupRoot/.tmp/backup_X/namespace/table
  • walToHFiles() in IncrementalTableBackupClient.java now sets hbase.mapreduce.use.multi.table.hfileoutputformat to false when configuring WALPlayer
  • This same hbase.mapreduce.use.multi.table.hfileoutputformat config is also set to false when replaying WALs for continuous backups.
  • Added logic to WALPlayer so it does not always use a multi-table HFile output format (regardless of the value of hbase.mapreduce.use.multi.table.hfileoutputformat)
  • Added a unit test for multi-table incremental backup and restore. The test also verifies the integrity of the data after the restore.

Background

This pull request fixes an issue where running an incremental backup on multiple tables at once results in a failure. When continuous backup is enabled, an incremental backup first converts the WALs to HFiles. These HFiles are output to a .tmp/backup_X directory (where X is the backup ID), known as the "bulk load output directory". Afterwards, a distcp is performed to copy the temporary backup directory to the actual backup directory.

Here is an example file system after the WALs to HFiles conversion and before the distcp. The distcp is supposed to copy the contents of backupRoot/.tmp/backup_INCR02 into backupRoot/backup_INCR02:

backupRoot
├── .tmp
│   └── backup_INCR02
│       ├── default
│       │   ├── table1
│       │   │   └── cf
│       │   └── table2
│       │       └── cf
│       └── namespace1
│           ├── table3
│           │   └── cf
│           └── table4
│               └── cf
├── backup_FULL01
│   ├── .backup.manifest
│   ├── default
│   │   ├── table1
│   │   │   └── .hbase-snapshot
│   │   └── table2
│   │       └── .hbase-snapshot
│   └── namespace1
│       ├── table3
│       │   └── .hbase-snapshot
│       └── table4
│           └── .hbase-snapshot
└── backup_INCR02
    ├── default
    │   ├── table1
    │   │   ├── .tabledesc
    │   │   └── 8d01b
    │   └── table2
    │       ├── .tabledesc
    │       └── 5g03w
    └── namespace1
        ├── table3
        │   ├── .tabledesc
        │   └── 1d42g
        └── table4
            ├── .tabledesc
            └── g49j7

Incremental backups convert WALs to HFiles one table at a time, even if a backup set contains more than one table. When WALs are converted to HFiles, the WALPlayer runs a map-reduce job, and the HFiles are sent to a newly created backupRoot/.tmp/backup_X directory. The MR job for the first table runs without any issues. The problem occurs during the second MR job: backupRoot/.tmp/backup_X already exists at that point, which causes the MR job to fail with something like:

2026-02-11T13:54:17,945 ERROR [Time-limited test {}] impl.TableBackupClient(232): Unexpected exception in incremental-backup: incremental copy backup_1770846846624Output directory hdfs://localhost:64120/backupUT/.tmp/backup_1770846846624 already exists

Solution

Summary

This fix changes the bulk load output directory for continuous incremental backups. Since the WALPlayer is run individually for each table, each WALs-to-HFiles conversion can be sent to a directory for that specific table. An example bulk load output directory for table1 in the default namespace would be backupRoot/.tmp/backup_X/default/table1. Then, table2 would get its own bulk load output directory, etc.
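The per-table directory layout can be sketched as follows. Note that the helper name and its string-based layout are illustrative assumptions for this writeup, not the exact code in the patch:

```java
// Sketch of deriving a per-table bulk load output directory for
// continuous incremental backups: backupRoot/.tmp/backup_X/namespace/table.
// The helper name is hypothetical; the real patch works with Hadoop Path objects.
public class BulkOutputDirs {

    static String bulkOutputDirForTable(String backupRoot, String backupId,
                                        String namespace, String table) {
        return String.join("/", backupRoot, ".tmp", backupId, namespace, table);
    }

    public static void main(String[] args) {
        // Each table in the backup set gets its own output directory,
        // so the second table's MR job no longer collides with the first.
        System.out.println(
            bulkOutputDirForTable("backupRoot", "backup_INCR02", "default", "table1"));
        System.out.println(
            bulkOutputDirForTable("backupRoot", "backup_INCR02", "namespace1", "table3"));
    }
}
```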

Issues while working on the fix

Getting the proper bulk load output structure and getting the distcp to run successfully took more effort than expected. Changing the bulk load output directory for each table was simple. The real challenge was getting the HFiles to be output in the proper directory structure. Since backupRoot/.tmp/backup_X/namespace/table is already the output directory, we only want the HFiles' columnFamily directory to be placed inside table. We don't want the typical namespace/table/columnFamily output structure.

  • If we set the output directory for table1 to be backupRoot/.tmp/backup_X/default/table1, then the HFiles would instead be output to backupRoot/.tmp/backup_X/default/table1/default/table1, where the namespace and table name directories are repeated. This caused the .tmp directory structure to look like the following after running WALPlayer:
backupRootDir
├── .tmp
│   └── backup_02INCR
│       └── default
│           ├── table1
│           │   ├── _SUCCESS
│           │   └── default
│           │       └── table1
│           │           └── cf
│           └── table2
│               ├── _SUCCESS
│               └── default
│                   └── table2
│                       └── cf
├── backup_01FULL
│   ├── .backup.manifest
│   └── default
│       ├── table1
│       │   └── .hbase-snapshot
│       └── table2
│           └── .hbase-snapshot
└── backup_02INCR
    └── default
        ├── table1
        │   ├── .tabledesc
        │   └── 8d01b
        └── table2
            ├── .tabledesc
            └── 5g03w
  • Telling distcp to copy only the deeper default/table directories resulted in a failure from distcp due to conflicting source directory names. This works if there is only one table in each namespace, but fails if a namespace contains multiple tables, because the distcp looks as follows:
distcp backupRoot/.tmp/backup_X/default/table1/default backupRoot/.tmp/backup_X/default/table2/default <destination>

Resulting in an error like:

2026-03-03T09:20:01,847 ERROR [Time-limited test {}] mapreduce.MapReduceBackupCopyJob$BackupDistCp(235): org.apache.hadoop.tools.CopyListing$DuplicateFileException: File hdfs://localhost:60356/backupUT/.tmp/backup_1772558388312/default/table1/default and hdfs://localhost:60356/backupUT/.tmp/backup_1772558388312/default/table2/default would cause duplicates. Aborting
  • Copying just the deeper table name directories results in an improper directory structure in the destination. A single distcp command can have multiple source directories, but only one destination directory:
distcp backupRoot/.tmp/backup_X/default/table1/default/table1 backupRoot/.tmp/backup_X/default/table2/default/table2 backup_INCR02

backup_INCR02
├── default
│   ├── table1
│   │   ├── .tabledesc
│   │   └── 8d01b
│   └── table2
│       ├── .tabledesc
│       └── 5g03w
├── table1
└── table2
  • Using -update in the distcp command did not produce the desired result either.

  • Using IncrementalTableBackupClient.getBulkOutputDirForTable() to create the bulk load directory caused similar issues. The only difference is that the "doubled up" directories had a data directory in between, like: backupRoot/.tmp/backup_X/default/table/data/default/table

Potential Workaround

A workaround for the issues mentioned above would be to run the distcp for each namespace. Then, the source directories would be unique table names, and they could all have the same destination directory (the namespace dir). However, this means a different distcp would need to be performed for each namespace in the backup set.
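The grouping step of that workaround can be sketched as below, splitting a backup set into one table list per namespace so each namespace-level distcp sees unique source names. The table names and the "namespace:table" parsing are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the per-namespace distcp workaround described above: group the
// backup set's tables by namespace, then run one distcp per namespace with
// that namespace directory as the single destination. Names are illustrative.
public class PerNamespaceDistcp {

    static Map<String, List<String>> groupByNamespace(List<String> fullTableNames) {
        Map<String, List<String>> byNs = new TreeMap<>();
        for (String full : fullTableNames) {
            // Assumes the "namespace:table" form; no separator means the default namespace.
            int i = full.indexOf(':');
            String ns = i < 0 ? "default" : full.substring(0, i);
            String table = i < 0 ? full : full.substring(i + 1);
            byNs.computeIfAbsent(ns, k -> new ArrayList<>()).add(table);
        }
        return byNs;
    }

    public static void main(String[] args) {
        System.out.println(groupByNamespace(
            List.of("table1", "table2", "namespace1:table3", "namespace1:table4")));
    }
}
```

Each map entry would then drive one distcp invocation, which is exactly the extra per-namespace cost the paragraph above calls out.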

The Actual Solution

We want the WALs-to-HFiles to output to something like this in .tmp:

backupRoot
└── .tmp
    └── backup_INCR02
        ├── default
        │   ├── table1
        │   │   └── cf
        │   └── table2
        │       └── cf
        └── namespace1
            ├── table3
            │   └── cf
            └── table4
                └── cf

In order to get rid of the "double namespace/tableName" directory structure, we have to change how the HFiles are output. We want to keep our bulk load output directory as backupRoot/.tmp/backup_X/namespace/table and have just the cf column family directory sent there, not namespace/table/cf.

This is done by setting the hbase.mapreduce.use.multi.table.hfileoutputformat config key to false for continuous incremental backups. The problem is that WALPlayer.java always used MultiTableHFileOutputFormat, which implicitly sets hbase.mapreduce.use.multi.table.hfileoutputformat to true. That is why some changes were made to the logic in WALPlayer.java.
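The shape of that WALPlayer change can be sketched as follows. The config is modeled here as a plain Map and the formats as strings; the real code reads a Hadoop Configuration and configures the actual output format classes on the job:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the WALPlayer change: choose the HFile output format from
// hbase.mapreduce.use.multi.table.hfileoutputformat instead of always
// using the multi-table format. Map-based config is an illustrative stand-in.
public class OutputFormatChoice {

    static final String MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY =
        "hbase.mapreduce.use.multi.table.hfileoutputformat";

    static String chooseOutputFormat(Map<String, String> conf) {
        // Default stays true, so existing WALPlayer behavior is unchanged
        // unless a caller (like the continuous backup path) opts out.
        boolean multiTable = Boolean.parseBoolean(
            conf.getOrDefault(MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY, "true"));
        return multiTable ? "MultiTableHFileOutputFormat" : "HFileOutputFormat2";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseOutputFormat(conf)); // multi-table by default
        conf.put(MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY, "false");
        System.out.println(chooseOutputFormat(conf)); // single-table, per-table output dir
    }
}
```

Keeping the default at true matches the "added a default value for MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY" note later in the thread: only the continuous-backup paths flip it to false.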

Also, this hfileoutputformat config key needs to be false when replaying the WALs during a restore. Otherwise, a failure occurs like the following:

2026-03-05T18:32:55,042 WARN  [Thread-1018 {}] mapred.LocalJobRunner$Job(590): job_local1580221296_0005
java.lang.Exception: java.lang.IllegalArgumentException: Invalid format for composite key [rowLoad0]. Cannot extract tablename and suffix from key
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492) ~[hadoop-mapreduce-client-common-3.4.2.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559) ~[hadoop-mapreduce-client-common-3.4.2.jar:?]
Caused by: java.lang.IllegalArgumentException: Invalid format for composite key [rowLoad0]. Cannot extract tablename and suffix from key

HBASE-29891: Multi-table continuous incremental backup is failing because output directory already exists
Change-Id: I710cc8d0d87a299b7782a19d93f28bf6283c2436
@kgeisz kgeisz force-pushed the HBASE-29891-multi-table-incr-backup-failure branch from c4a70b9 to 8ed360e on March 9, 2026 at 23:29
@kgeisz
Contributor Author

kgeisz commented Mar 10, 2026

@vinayakphegde Here is the fix for HBASE-29891

@kgeisz
Contributor Author

kgeisz commented Mar 10, 2026

I successfully created a multi-table continuous incremental backup in the hbase-docker container setup. I was able to take the backup when each table had 1,000 rows. After the incremental backup, I added 1,000 more rows to each table, and then I did a point-in-time restore and verified the target tables had just 1,000 rows instead of 2,000.

Contributor

@anmolnar anmolnar left a comment


Patch looks good to me. Just a nitpick.
Have you considered adding unit tests for the WALPlayer changes?

kgeisz added 2 commits March 11, 2026 14:58
Change-Id: Ia3eebdfc8c2061a512bc5a448da9f79e09d57759
Change-Id: I58691d0d0de91b102ee6774a213dadf5b6207929
@kgeisz kgeisz force-pushed the HBASE-29891-multi-table-incr-backup-failure branch from 19c1531 to de3b79f on March 12, 2026 at 17:47
@kgeisz
Copy link
Contributor Author

kgeisz commented Mar 12, 2026

@anmolnar, I have added some unit tests that cover the changes I made to the WALPlayer. I also added a default value for MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY.

@kgeisz kgeisz requested a review from anmolnar March 12, 2026 18:05
