Skip to content
/ server Public

MDEV-37977 InnoDB deadlock report incorrectly reports rolled back transaction number#4679

Open
arcivanov wants to merge 1 commit intoMariaDB:10.11from
arcivanov:MDEV-37977
Open

MDEV-37977 InnoDB deadlock report incorrectly reports rolled back transaction number#4679
arcivanov wants to merge 1 commit intoMariaDB:10.11from
arcivanov:MDEV-37977

Conversation

@arcivanov
Copy link
Contributor

The "WE ROLL BACK TRANSACTION (N)" message in the deadlock report referred to the wrong transaction number. The victim selection loop and the display loop in Deadlock::report() traversed the cycle in the same order (cycle->wait_trx, ..., cycle) but used misaligned position numbering:

  • Victim selection initialized victim = cycle at position 1 before the loop, then started iterating from cycle->wait_trx at position 2.
  • Display loop started from cycle->wait_trx at label (1), with cycle displayed last at label (N).

This caused victim_pos to be off by one relative to the displayed transaction labels.

Fix: restructure the victim selection loop to start with l=0 and victim=nullptr, letting the loop handle all transactions uniformly. The first iteration unconditionally picks cycle->wait_trx as the initial victim at position 1, matching the display loop. The thd_deadlock_victim_preference() call is guarded with a victim != nullptr check to skip it on the first iteration (where no prior victim exists to compare against).

@arcivanov
Copy link
Contributor Author

@dr-m

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 23, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. This is a preliminary review. Please stand by for the final review.

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer descriptive test case names. Could one of the existing tests innodb.deadlock_detect or innodb.deadlock_victim_race be extended with this?

Comment on lines +47 to +59
--connection con1
--reap
ROLLBACK;

--connection default
let SEARCH_FILE= $MYSQLTEST_VARDIR/log/mysqld.1.err;
let SEARCH_PATTERN= WE ROLL BACK TRANSACTION \(1\);
--source include/search_pattern_in_file.inc

--disconnect con1
SET GLOBAL innodb_print_all_deadlocks= @save_print_all_deadlocks;
DROP TABLE t1;
--source include/wait_until_count_sessions.inc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move the --disconnect con1 right after the ROLLBACK, to reduce the wait time at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would you prefer a descriptive name with a standalone test or integration into an existing one?

Comment on lines 6999 to 7013
if ((next_weight|next_not_pref) < (victim_weight|victim_not_pref))
if (UNIV_UNLIKELY(!victim) ||
(next_weight|next_not_pref) < (victim_weight|victim_not_pref))
{
victim_weight= next_weight;
victim= next;
victim_pos= l;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t fully understand this. We would seem to have victim=nullptr only during the first iteration. In the general case, the path to the deadlock could be like the letter P, comprising an acyclic sequence of M steps and a cycle of N steps. If I understood it correctly, this code assumes M=1. The test case is for N=2; in a more general case we would have a deadlock among several transactions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally here the victim is always initialized to the first element returned by find_cycle:

https://github.com/MariaDB/server/pull/4679/changes#diff-99bcdacef9b359647344e5bf7fea6a5fc32847ac35d2ff9481e314a74abb6509L6970

But because we shifted our starting point to zero and victim to nullptr we need to populate the victim = next (line 7002/7011) and related weight and pos to the first available value unconditionally if the victim has not yet been initialized.

…nsaction number

The "WE ROLL BACK TRANSACTION (N)" message in the deadlock report
referred to the wrong transaction number. The victim selection loop
and the display loop in `Deadlock::report()` traversed the cycle in
the same order (`cycle->wait_trx, ..., cycle`) but used misaligned
position numbering:

- **Victim selection** initialized `victim = cycle` at position 1
  *before* the loop, then started iterating from `cycle->wait_trx`
  at position 2.
- **Display loop** started from `cycle->wait_trx` at label `(1)`,
  with `cycle` displayed last at label `(N)`.

This caused `victim_pos` to be off by one relative to the displayed
transaction labels.

Fix: restructure the victim selection loop to start with `l=0` and
`victim=nullptr`, letting the loop handle all transactions uniformly.
The first iteration unconditionally picks `cycle->wait_trx` as the
initial victim at position 1, matching the display loop. The
`thd_deadlock_victim_preference()` call is guarded with a
`victim != nullptr` check to skip it on the first iteration (where
no prior victim exists to compare against).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

3 participants