3144.3.6
by Aaron Bentley
Add documentation of LCA merge |
1 |
LCA Merge |
2 |
========= |
|
3 |
||
4 |
by Aaron Bentley |
|
5 |
||
6 |
Essential characteristics |
|
7 |
------------------------- |
|
8 |
||
9 |
In the general case (no criss-cross), it is a three-way merge. When |
|
10 |
there is a criss-cross at the tree level, but not for the particular |
|
11 |
file, it is still a three-way merge. When there's a file-level |
|
12 |
criss-cross, it's superior to a three-way merge. |
|
13 |
||
14 |
Algorithm |
|
15 |
--------- |
|
16 |
||
17 |
First, we compare the files we are trying to merge, and find the lines |
|
18 |
that differ. Next, we try to determine why they differ; this is |
|
19 |
essential to the merge operation, because it affects how we resolve the |
|
20 |
differences. In this merger, there are three possible outcomes: |
|
21 |
||
22 |
1. The line was added in this version: "new-this" |
|
23 |
2. The line was deleted in the other version: "killed-other" |
|
24 |
3. The line was preserved as part of merge resolution in this version, |
|
25 |
but deleted in the other version: "conflicted-this" |
|
26 |
||
27 |
Option 3 is new, but I believe it is essential. When each side has made |
|
28 |
a conflicting merge resolution, we should let the user decide how to |
|
29 |
combine the two resolutions, i.e. we should emit a conflict. We cannot |
|
30 |
silently drop the line, or silently keep the line, which can happen if |
|
31 |
we choose options 1 or 2. If we choose options 1 or 2, there's also a |
|
32 |
possibility that a conflict will be produced, but no guarantee. We need |
|
33 |
a guarantee, which is why we need a new possible outcome. |
|
34 |
||
35 |
To decide whether a line is "new-this", "killed-other" or |
|
36 |
"conflicted-this", we compare this version against the versions from |
|
37 |
each "least common ancestor" (LCA), in graph terminology. For each LCA |
|
38 |
version, if the line is not present in the LCA version, we add it to the |
|
39 |
"new" set. If the line is present in the LCA version, we add it to the |
|
40 |
"killed" set. |
|
41 |
||
42 |
When we are done going through each LCA version, each unique line will |
|
43 |
be in at least one of the sets. If it is only in the "new" set, it's |
|
44 |
handled as "new-this". If it is only in the "killed" set, it's handled |
|
45 |
as "killed-other". If it is in both sets, it's handled as |
|
46 |
"conflicted-this". |
|
47 |
||
48 |
The logic here is a bit tricky: first, we know that the line is present |
|
49 |
in some, but not all, LCAs. We can assume that all LCAs were produced |
|
50 |
by merges of the same sets of revisions. That means that in those LCAs, |
|
51 |
there were different merge resolutions. Since THIS and OTHER disagree |
|
4031.3.1
by Frank Aspell
Fixing various typos |
52 |
about whether the line is present, those differences have propagated |
3144.3.6
by Aaron Bentley
Add documentation of LCA merge |
53 |
into THIS and OTHER. Therefore, we should declare that the lines are in |
54 |
conflict, and let the user handle the issue. |
|
55 |
||
56 |
LCA merge and Three-way merge |
|
57 |
----------------------------- |
|
58 |
||
59 |
Now, in the common case, there's a single LCA, and LCA merge behaves as |
|
60 |
a three-way merge. Since there's only one LCA, we cannot get the |
|
61 |
"conflicted-this" outcome, only "new-this" or "killed-other. Let's look |
|
62 |
at the typical description of three-way merges: |
|
63 |
||
64 |
+-----+------+-------+------------+ |
|
65 |
|THIS | BASE | OTHER | OUT | |
|
66 |
+-----+------+-------+------------+ |
|
67 |
|A | A | A | A | |
|
68 |
+-----+------+-------+------------+ |
|
69 |
|A | B | A | A | |
|
70 |
+-----+------+-------+------------+ |
|
71 |
|A | B | B | A | |
|
72 |
+-----+------+-------+------------+ |
|
73 |
|A | A | B | B | |
|
74 |
+-----+------+-------+------------+ |
|
75 |
|A | B | C |\*conflict\*| |
|
76 |
+-----+------+-------+------------+ |
|
77 |
||
78 |
Now, let's assume that BASE is a common ancestor, as is typically the |
|
79 |
case. In fact, for best-case merges, BASE is the sole LCA. |
|
80 |
||
81 |
We always pick the version that represents a change from BASE, if there |
|
82 |
is one. For the AAAA line, there is no change, so the output is |
|
83 |
rightfully BASE/THIS/OTHER. For ABAA, the THIS and OTHER are changes |
|
84 |
from BASE, and they are the same change so they both win. (This case is |
|
85 |
sometimes called convergence.) For ABBA, THIS is a change from BASE, so |
|
86 |
THIS wins. For AABB, OTHER is a change from BASE, so OTHER wins. For |
|
87 |
ABC*, THIS and OTHER are both changes to BASE, but they are different |
|
88 |
changes, so they can't both win cleanly. Instead, we have a conflict. |
|
89 |
||
90 |
Now in three-way merging, we typically talk about regions of text. In |
|
91 |
weave/knit/newness/lca merge, we also have regions. Each contiguous |
|
92 |
group of "unchanged" lines is a region, and the areas between them are |
|
93 |
also regions. |
|
94 |
||
95 |
Let's assign a to THIS and b to OTHER. "unchanged" regions represent |
|
96 |
the AAAA or ABAA cases; it doesn't matter which, because the outcome is |
|
97 |
the same regardless. Regions which consist of only "new-a" or |
|
98 |
"killed-a" represent the ABBA case. Regions which consist of only |
|
99 |
"new-b" or "killed-b" represent the AABB case. Regions which have |
|
100 |
(new-a or killed-a) AND (new-b or killed-b) are the ABC* case-- both |
|
101 |
sides have made changes, and they are different changes, so a conflict |
|
102 |
must be emitted. |
|
103 |
||
104 |
This is what I mean when I say that it is a three-way merge in the |
|
105 |
common case; if there is only one LCA, then it is merely an alternative |
|
106 |
implementation of three-way. (One that happens to automatically do |
|
107 |
``--reprocess``, ftw). |
|
108 |
||
3144.6.1
by Aaron Bentley
Update documentation to reflect conflict-handling difference |
109 |
Exception to three-way behavior |
110 |
------------------------------- |
|
111 |
There is a special case of three-way merge which LCA merge handles differently |
|
112 |
from our default "merge3" algorithm: |
|
113 |
BASE has content X, THIS deletes the content, and OTHER changes X to Y. In |
|
114 |
this case, LCA merge emits Y in its output and does not indicate a conflict. |
|
115 |
merge3 would output Y, but would also indicate a conflict. (This is also the |
|
116 |
behavior in the inverse case where OTHER has nothing and THIS has Y.) |
|
117 |
||
118 |
This behavior is due the way LCA determines basic conflicts; they |
|
119 |
can only be emitted when THIS and OTHER each have unique lines between common |
|
120 |
lines. If THIS does not have unique lines in this position, conflicts will not |
|
121 |
be emitted, even if its (lack of) content is unique. |
|
122 |
||
123 |
This behavior difference is shared with "weave" merge. I hope that a future |
|
124 |
revision of LCA merge will handle this case as merge3 would. |
|
125 |
||
3144.3.6
by Aaron Bentley
Add documentation of LCA merge |
126 |
Why a new name |
127 |
-------------- |
|
128 |
||
129 |
1. It was time. Although knit / annotate merge and newness merge have |
|
130 |
tried to emulate the behavior of the original weave merge algorithm, |
|
131 |
``--merge-type=weave`` hasn't been based on weaves for a long time. |
|
132 |
2. Behavior differences. This algorithm should behave like a three-way |
|
133 |
merge in the common case, while its predecessors did not. It also has |
|
134 |
explicit support for handling conflicting merge resolutions, so it |
|
135 |
should behave better in criss-cross merge scenarios. |
|
136 |
||
137 |
Performance |
|
138 |
----------- |
|
139 |
||
140 |
Unlike the current "weave" merge implementation, lca merge does not |
|
141 |
perform any whole-history operations. LCA selection should scale with |
|
142 |
the number of uncommon revisions. Text comparison time should scale |
|
143 |
mO(n\ :sup:`2`\ ), where m is the number of LCAs, and n is the number of lines |
|
144 |
in the file. The current weave merge compares each uncommon ancestor, |
|
145 |
potentially several times, so it is >= kO(n\ :sup:`2`\ ), where k is the |
|
146 |
number of uncommon ancestors. So "lca" should beat "weave" both in history |
|
147 |
analysis time and in text comparison time. |
|
148 |
||
149 |
Possible flaws |
|
150 |
============== |
|
151 |
||
152 |
1. Inaccurate LCA selection. Our current LCA algorithm uses |
|
153 |
``Graph.heads()``, which is known to be flawed. It may occasionally give |
|
154 |
bad results. This risk is mitigated by the fact that the per-file graphs |
|
155 |
tend to be simpler than the revision graph. And since we're already using |
|
156 |
this LCA algorithm, this is not an additional risk. I hope that John Meinel |
|
157 |
will soon have a fixed version of ``Graph.heads`` for us. |
|
158 |
2. False matches. Weaves have a concept of line identity, but knits and |
|
159 |
later formats do not. So a line may appear to be common to two files, when |
|
160 |
in fact it was introduced separately into each for entirely different |
|
161 |
reasons. This risk is the same for three-way merging. It is mitigated by |
|
162 |
using Patience sequence matching, which a longest-common-subsequence match. |
|
163 |
||
164 |
Acknowledgements |
|
165 |
================ |
|
166 |
||
167 |
I think this could be a great merge algorithm, and a candidate to make |
|
168 |
our default, but this work would not have been possible without the work |
|
169 |
of others, especially: |
|
170 |
||
171 |
- Martin Pool's weave merge and knit/annotate merge algorithms. |
|
172 |
- Bram Cohen's discussions of merge algorithms |
|
173 |
- Andrew Tridgell's dissection of BitKeeper merge |
|
174 |
- Nathaniel Smith's analysis of why criss-cross histories necessarily |
|
175 |
produce poor three-way merges. |