forked from git/git
-
Notifications
You must be signed in to change notification settings - Fork 164
merge-ort: fix corner case recursive submodule/directory conflict handling #2023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
newren
wants to merge
1
commit into
gitgitgadget:master
Choose a base branch
from
newren:fix-submodule-merge
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+120
−1
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…dling
At GitHub, a few repositories were triggering errors of the form:
git: merge-ort.c:3037: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
Aborted (core dumped)
While these may look similar to both
a562d90 (merge-ort: fix failing merges in special corner case,
2025-11-03)
and
f6ecb60 (merge-ort: fix directory rename on top of source of other
rename/delete, 2025-08-06)
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we did
not update all relevant state appropriately.
It sadly took me a really long time to figure out how to get a simple
reproducer for this one. It doesn't really have that many moving parts,
but there are multiple pieces of background information needed to
understand it.
First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files. If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name). But two-way
merging requires components of the same type. We can't merge the
contents of a regular file with a directory, or with a symlink, or with
a submodule. Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea. When
two paths have the same name but their types do not match, merge-ort is
forced to move one of them to an alternate filename (using the
unique_path() function).
Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base, and
use that as the base commit.
Third, one of the really important optimizations in merge-ort is trivial
tree-level resolution (roughly meaning merging trees without recursing
into them). This optimization has some nuance to it that is important
to the current bug, and to understand it, it helps to first look at the
high-level overview of how merge-ort runs; there are basically three
high-level functions that the work is divided between:
collect_merge_info() - walks the top-level trees getting individual
paths of interest
detect_renames() - detect renames between paths in order to match up
paths for three-way merging
process_entries() - does a few things of interest:
* three-way merging of files,
* other special handling (e.g. adjusting paths with conflicting
types to avoid path collisions)
* as it finishes handling all the files within a subdirectory,
writes out a new tree object for that directory
If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified. Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.
We would also be able to do tree-level merging if we somehow apriori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames. That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:
* run collect_merge_info()
* run detect_renames()
* cache the renames()
* restart -- rerun collect_merge_info(), using the cached renames to
only recurse into the needed trees
* we already have the renames cached so no need to re-detect
* run process_entries() on the reduced list of paths
which was implemented back in 7bee6c1 (merge-ort: avoid recursing
into directories when we don't need to, 2021-07-16) Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase for
several days and go down alternate wrong paths).
Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history has a directory at a given path. So long as the
tree on one side matches the tree in the base version, we can decide to
resolve in favor of whatever the other side of history has at that path
-- be it a directory, a file, a submodule, or a symlink. Unfortunately,
the code in question didn't fully realize this, and was written assuming
the base version and both sides would have a directory at the given
path, as can be seen by the "ci->filemask == 0" comment in
resolve_trivial_directory_merge() that was added as part of 7bee6c1
(merge-ort: avoid recursing into directories when we don't need to,
2021-07-16). A few additional lines of code are needed to handle cases
where we have something other than a directory on the other side of
history.
But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided above.
Here's a relevant testcase:
* branches A & B
* commit A1: adds "folder" as a directory with files tracked under it
* commit B1: adds "folder" as a submodule
* commit A2: merges B1 into A1, keeping "folder" as a directory
(and in fact, with no changes to "folder" since A1), discarding the
submodule
* commit B2: merges A1 into B1, keeping "folder" as a submodule
(and in fact, with no changes to "folder" since B1), discarding the
directory
Here, if we try to merge A2 & B2, the logic proceeds as follows:
* we have multiple merge-bases: A1 & B1. So we have to merge those
to get a virtual merge base.
* due to "folder" as a directory and "folder" as a submodule, the
path collision logic triggers and renames "folder" as a submodule
to "folder~Temporary merge branch 2" so we can keep it alongside
"folder" as a directory.
* we now have a virtual merge base (containing both "folder"
directory and a "folder~Temporary merge branch 2" submodule) and
can now do the outer merge
* in the first step of the outer merge, we attempt to defer recursing
into folder/ as a directory, but find we need to for rename
detection.
* in rename detection, we note that "folder~Temporary merge branch 2"
has the same hash as "folder" as a submodule in B2, which means we
have an exact rename.
* after rename detection, we discover no path in folder/ is needed
for renames, and so we can cache renames and restart.
* after restarting, we avoid recursing into "folder/" and realize we
can resolve it trivially since it hasn't been modified. The
resolution removes "folder/", leaving us only "folder" as a
submodule from commit B2.
* After this point, we should have a rename/delete conflict on
"folder~Temporary merge branch 2" -> "folder", but our marking of
the merge of "folder" as clean broke our ability to handle that and
in fact triggers an assertion in process_renames().
When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly. In particular:
* do not pre-emptively mark the path as cleanly merged if the
remaining path is a file; allow it to be processed in
process_entries() later to determine if it was clean
* clear the parts of dirmask or filemask corresponding to the matching
sides of history, since we are resolving those away
* clear the df_conflict bit afterwards; since we cleared away the two
matching sides and only have one side left, that one side can't
have a directory/file conflict with itself.
Also add the above minimal testcase showcasing this bug to t6422, **with
a sufficient number of paths under the folder/ directory to actually
trigger it**. (I wish I could have all those days back from all the
wrong paths I went down due to not having enough files under that
directory...)
I know this commit has a very high ratio of lines in the commit message
to lines of comments, and a relatively high ratio of comments to actual
code, but given how long it took me to track down, on the off chance
that we ever need to further modify this logic, I wanted it thoroughly
documented for future me and for whatever other poor soul might end up
needing to read this commit message.
Signed-off-by: Elijah Newren <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This bug dates back to ort's introduction, in particular with 7bee6c1 (merge-ort: avoid recursing into directories when we don't need to, 2021-07-16).