1185.1.29
by Robert Collins
merge merge tweaks from aaron, which includes latest .dev |
1 |
Codeville |
2 |
********* |
|
3 |
||
4 |
Documentation on how this actually works is pretty scarce to say the |
|
5 |
least. |
|
6 |
||
7 |
I *think* I understand their merge algorithm though, and it's pretty |
|
8 |
clever. Basically we do a two-way merge between annotated forms of |
|
9 |
the two files: that is, with each line marked with the revision in |
|
10 |
which it last changed. (I am simplifying here by speaking of lines |
|
11 |
and changes, but I don't think it causes any essential problem.) |
|
12 |
||
13 |
Now we walk through each file, line by line. If the change that |
|
14 |
introduced the line state in branch A is already merged into branch B, |
|
15 |
then we can just take B. |
|
16 |
||
17 |
Now is this actually better? |
|
18 |
||
19 |
It may be better in several ways: |
|
20 |
||
21 |
* Do not need to choose just a single ancestor, but rather can |
|
22 |
take advantage of all possible previous changes. |
|
23 |
||
24 |
* Can handle OTHER containing changes which have been merged into |
|
25 |
THIS, but have then been overwritten. |
|
26 |
||
27 |
* Can handle cherrypicks(!) by remembering which lines came in from |
|
28 |
that cherrypick; then we don't need to merge them again. |
|
29 |
||
30 |
Some questions: |
|
31 |
||
32 |
* Do we actually need to store the annotations, or can we just infer |
|
33 |
it at the time we do the merge? |
|
34 |
||
35 |
* Can this be accomodated in something like an SCCS weave format? I |
|
36 |
think something like a weave may work, in as much as it is basically |
|
37 |
a concatenation of annotations, but I don't know if it represents |
|
38 |
merges well enough to cope. |
|
39 |
||
40 |
Can this handle binaries or type-specific merges, and if so how? |
|
41 |
Unmergeable binaries are easy: just get the user to pick one. Things |
|
42 |
like XML are harder; we probably need to punt out to a type-specific |
|
43 |
three-way merge. Of course this approach does not forbid also |
|
44 |
offering a 3-way merge. |
|
45 |
||
46 |
---- |
|
47 |
||
48 |
I suppose this could be accomodated by an annotation cache on top of |
|
49 |
plain history storage, or by using a storage format such as a weave |
|
50 |
that can efficiently produce annotation information. |
|
51 |
||
52 |
That is to say there is nothing inherently necessary about remembering |
|
53 |
the line history at the point when it is committed, except that it |
|
54 |
might be more efficient to do this once and remember it than to |
|
55 |
||
56 |
---- |
|
57 |
||
58 |
There is another interesting approach that can be used even in a tool |
|
59 |
that does not inherently remember annotations: |
|
60 |
||
61 |
Given two files to merge, find all regions of difference. For each |
|
62 |
such, try to find a common ancestor having the same content for the |
|
63 |
region. Subdivide the region if necessary. |
|
64 |
||
65 |
This naive approach is probably infeasible, since it would mean |
|
66 |
checking every possible predecessor. |
|
67 |
||
68 |
---- |
|
69 |
||
70 |
Rather than storing or calculating annotations, we could try using a |
|
71 |
complex weave, which allows one file version to be represented as a |
|
72 |
weave of multiple disjoint previous versions. It sounds complex but |
|
73 |
it might work. |
|
74 |
||
75 |
Essentially we store each file as a selection of lines that should be |
|
76 |
turned on in that file. These files might come from any of the |
|
77 |
predecessors that were merged into that file. Complex to get right |
|
78 |
but it might work. |
|
79 |
||
80 |
This is written in terms of lines, but it might make more sense to |
|
81 |
just use byte ranges: perhaps more efficient when handling long files, |
|
82 |
and makes binaries less of a special case. |
|
83 |
||
84 |
codeville in fact does *not* seem to do this, though to me it seems |
|
85 |
like a fairly natural corollary of their design. |
|
86 |
||
87 |
This seems to imply holding the file text and ancestry of every branch |
|
88 |
that ever merged into this one, rather than just finding them if we |
|
89 |
later want them. Hm. That is nice in terms of doing smart merges. |
|
90 |
That possibly causes trouble in terms of having a name for these |
|
91 |
branches floating around inside our space, and making sure we don't |
|
92 |
clash with them. It may make sense in terms of having a working |
|
93 |
directory be just a view into a shared database, looking at a |
|
94 |
particular line of development. |
|
95 |
||
96 |
Indeed the main difficulty seems to be of naming branches in this |
|
97 |
space. Perhaps we should move back to using repositories and named |
|
98 |
branches within them, but not rely on branch names being unique out of |
|
99 |
the context of a single repository. |
|
100 |
||
101 |
Wow, this seems to open a big can of worms. |
|
102 |
||
103 |
---- |
|
104 |
||
105 |
So the conclusion is that this is very cool, but it does not require a |
|
106 |
fundamental change of model and can be implemented later. |