2520.4.77
by Aaron Bentley
Describe bundles and merge directives |

============================================
Merge Directive format 2 and Bundle format 4
============================================

:Date: 2007-06-21

Motivation
----------
Merge Directive format 2 represents a request to perform a certain merge. It
provides access to all the data necessary to perform that merge, by including
a branch URL or a bundle payload. It typically will include a preview of
what applying the patch would do.

Bundle Format 4 is designed to be a compact format for storing revision
metadata that can be generated quickly and installed into a repository
efficiently. It is not intended to be human-readable.

Note
----
These two formats, taken together, can be viewed as the successor of Bundle
format 0.9, so their specifications are combined. It is expected that in the
future, bundle and merge-directive formats will vary independently.


Bundle Format Name
------------------
This is the fourth bundle format to see public use. Previous versions were
0.7, 0.8, and 0.9. Only 0.7's version number was aligned with a Bazaar
release.


Dependencies
------------
- Container format 1
- Multiparent diffs
- Bencode
- Patch-RIO


Description
-----------
Merge Directives fulfil the role previous bundle formats had of requesting a
merge to be performed, but are a more flexible way of doing so. With the
introduction of these two formats, there is a clear split between "directive",
which is a request to merge (and therefore signable), and "bundle", which is
just data.

Merge Directive format 2 may provide a patch preview of the change being
requested. If a preview is supplied, the receiving client will verify that
the actual change matches the preview.

Merge Directive format 2 also includes a testament hash, to ensure that if a
branch is used, the branch cannot be subverted to cause the wrong changes to be
applied.

Bundle format 4 is designed to trade human-readability for speed and
compactness. It does not contain a human-readable "prelude" patch.

Merge Directive 2 Contents
--------------------------
This format consists of three sections, in the following order.


Patch-RIO command section
~~~~~~~~~~~~~~~~~~~~~~~~~
This section is identical to the corresponding section in Format 1 merge
directives, except as noted below. It is mandatory. It is terminated by a
line reading ``#`` that is not preceded by a line ending with ``\``.
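
As an illustration of the termination rule, the following Python sketch scans
a list of lines (newlines already removed) for the terminator. The function
name ``split_command_section`` and the sample field values in the test data
are invented for this example and are not taken from Bazaar's implementation.

```python
def split_command_section(lines):
    """Split lines into (command_section, remaining_lines).

    The command section ends at the first line reading "#" whose
    predecessor does not end with a backslash. The terminator line
    itself is dropped from both halves.
    """
    previous = ""
    for index, line in enumerate(lines):
        if line == "#" and not previous.endswith("\\"):
            return lines[:index], lines[index + 1:]
        previous = line
    raise ValueError("command section terminator not found")


# Invented sample: the first "#" follows a continued line, so it is
# part of the section; the second "#" is the real terminator.
sample = [
    "# revision_id: example-rev-2",
    "# message: first line \\",
    "#",
    "# second line",
    "#",
    "# Begin patch",
]
command, remainder = split_command_section(sample)
```

Here ``command`` holds the first four lines and ``remainder`` starts at the
``# Begin patch`` line.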

In order to support cherry-picking and patch comparison, this format adds a new
piece of information, the ``base_revision_id``. This is a suggested base
revision for merging. It may be supplied by the user. If not, it is
calculated using the standard merge base algorithm, with the ``revision_id``
and target branch's ``last_revision`` as its inputs.

When merging, clients should use the ``base_revision_id`` when it is not
already present in the ancestry of the ``last_revision`` of the target branch.
If it is already present, clients should calculate a merge base in the normal
way.
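
The selection rule can be sketched in Python as follows. This is only an
illustration under stated assumptions: the revision graph is modelled as a
plain dict mapping each revision id to its parent ids, and
``find_merge_base`` is a hypothetical callable standing in for whatever
"normal" merge base calculation the client already has.

```python
def ancestry(revision_id, parents):
    """Return the set of revisions reachable from revision_id, inclusive."""
    seen = set()
    pending = [revision_id]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        pending.extend(parents.get(current, ()))
    return seen


def choose_merge_base(base_revision_id, target_last_revision, parents,
                      find_merge_base):
    """Pick the base revision for a merge requested by a directive."""
    if base_revision_id not in ancestry(target_last_revision, parents):
        # The suggested base is new to the target: honour it.
        return base_revision_id
    # The suggested base is already merged: fall back to the usual method.
    return find_merge_base()


# Toy graph: A <- B <- C on the target, and X branching off A.
graph = {"A": [], "B": ["A"], "C": ["B"], "X": ["A"]}
picked = choose_merge_base("X", "C", graph, lambda: "normal-base")
fallback = choose_merge_base("A", "C", graph, lambda: "normal-base")
```

In the toy graph, ``X`` is not an ancestor of ``C`` and is used directly,
while ``A`` already is, so the fallback calculation is used instead.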


Patch preview section
~~~~~~~~~~~~~~~~~~~~~
This section is optional. It begins with the line ``# Begin patch``. It is
terminated by the end-of-file or by the beginning of a bundle section.

Its contents are a unified diff, as per the ``bzr diff`` command. The FROM
revision is the ``base_revision_id`` specified in the Patch-RIO section.


Bundle section
~~~~~~~~~~~~~~
This section is optional, but if it is not supplied, a ``source_branch`` must
be supplied. It begins with the line ``# Begin bundle``, and is terminated by
the end-of-file.

The contents are a base-64 encoded bundle. This may be any bundle format, but
formats 4+ are strongly recommended. The base revision is the newest revision
in the source branch which is an ancestor of all revisions not present in
target which are ancestors of ``revision_id``.

This base revision may or may not be the same as the ``base_revision_id``. In
particular, the ``base_revision_id`` may specify a cherry-pick, but all the
ancestors of the ``base_revision_id`` should be installed in the target
repository before performing such a merge.
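
To make the overall layout concrete, here is a hedged Python sketch that
splits a directive into its three sections using the two marker lines. The
function name ``split_directive`` is invented for this example. The sketch
assumes lines have had their newlines removed and that no patch line is
exactly ``# Begin bundle``; it leaves the command section's ``#`` terminator
in place rather than interpreting it.

```python
import base64

PATCH_MARKER = "# Begin patch"
BUNDLE_MARKER = "# Begin bundle"


def split_directive(lines):
    """Return (command_lines, patch_lines_or_None, bundle_bytes_or_None)."""
    # The bundle section, if any, runs from its marker to end-of-file.
    bundle_at = lines.index(BUNDLE_MARKER) if BUNDLE_MARKER in lines else len(lines)
    head = lines[:bundle_at]
    # The patch section, if any, runs from its marker to the bundle section.
    patch_at = head.index(PATCH_MARKER) if PATCH_MARKER in head else len(head)
    command = head[:patch_at]
    patch = head[patch_at + 1:] if patch_at < len(head) else None
    bundle = None
    if bundle_at < len(lines):
        # The bundle payload is base-64 text; join the lines and decode.
        bundle = base64.b64decode("".join(lines[bundle_at + 1:]))
    return command, patch, bundle


payload = b"example bundle payload"
directive = [
    "# revision_id: example-rev-2",
    "#",
    "# Begin patch",
    "+an added line",
    "# Begin bundle",
    base64.b64encode(payload).decode("ascii"),
]
command, patch, bundle = split_directive(directive)
```

A directive with neither optional section yields ``None`` for both the patch
and the bundle, in which case a ``source_branch`` is needed to obtain the
data.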


Bundle 4 Contents
-----------------
Bazaar revision bundles begin with a format marker that reads
``# Bazaar revision bundle v4`` in plaintext. The remainder of the file is a
``Bazaar pack format 1`` container. The container is compressed using bzip2.

Putting the format marker in plaintext ensures that old clients will give good
diagnostics, but renders the file unreadable by standard bzip2 utilities.
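
The following Python sketch shows the general shape of such a file. It rests
on an assumption that this document does not confirm: that the marker is a
single line ending in a newline and that the bzip2 stream starts immediately
after it. Real bundles may carry additional header bytes. The helper names
``write_bundle`` and ``read_bundle`` are invented for this example, and the
container is treated as opaque bytes.

```python
import bz2

# Assumed exact byte form of the marker line (see the note above).
FORMAT_MARKER = b"# Bazaar revision bundle v4\n"


def write_bundle(container_bytes):
    """Prefix a bzip2-compressed container with the plaintext marker."""
    return FORMAT_MARKER + bz2.compress(container_bytes)


def read_bundle(data):
    """Check the marker and return the decompressed container bytes."""
    if not data.startswith(FORMAT_MARKER):
        raise ValueError("not a Bazaar revision bundle v4")
    return bz2.decompress(data[len(FORMAT_MARKER):])


blob = write_bundle(b"opaque container bytes")
container = read_bundle(blob)
```

Because the marker precedes the compressed stream, passing ``blob`` directly
to a bzip2 decompressor fails, which is the trade-off described above.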


Serialization
~~~~~~~~~~~~~
Format 4 stores revision and inventory records in their repository
serialization format. This minimizes translation and compression costs
in the common case, where the sender and receiver use the same serialization
format for their repository. Steps have been taken to ensure a faithful
conversion when serialization formats are mismatched.


Bundle Records
~~~~~~~~~~~~~~
The bundle format creates a single bundle-level record out of two container
records. The first container record contains metainfo as a Bencoded dict. The
second container record contains the body.

The bundle record name is associated with the metainfo record. The body record
is anonymous.


Record metainfo
~~~~~~~~~~~~~~~

:record_kind: The storage strategy of the record. May be ``fulltext`` (the
    record body contains the full text of the value), ``mpdiff`` (the record
    body contains a multi-parent diff of the value), or ``header`` (no record
    body).
:parents: Used in fulltext and mpdiff records. The revisions that should be
    noted as parents of this revision in the repository. For mpdiffs, this is
    also the list of build-parents.
:sha1: Used in mpdiff records. The sha-1 hash of the full-text value.
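
For readers unfamiliar with Bencode, the sketch below implements a minimal
encoder in Python and applies it to a metainfo-like dict. The field names
come from the list above, but the value types (byte strings, and a list of
byte strings for ``parents``) and the sample revision id are assumptions
made for illustration; this is not Bazaar's own encoder.

```python
def bencode(value):
    """Encode ints, bytes, lists and dicts using Bencode rules."""
    if isinstance(value, bool):
        raise TypeError("booleans have no Bencode representation")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Bencode requires dict keys to appear in sorted order.
        pairs = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


header_metainfo = bencode({b"record_kind": b"header"})
fulltext_metainfo = bencode({
    b"record_kind": b"fulltext",
    b"parents": [b"rev-1"],
})
```

The two results are ``d11:record_kind6:headere`` and
``d7:parentsl5:rev-1e11:record_kind8:fulltexte`` respectively.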


Bundle record naming
~~~~~~~~~~~~~~~~~~~~
All bundle records have a single name, which is associated with the metainfo
container record. Records are named according to the body's content-kind,
revision-id, and file-id.

Content-kind may be one of:

:file: a version of a user file
:inventory: the tree inventory
:revision: the revision metadata for a revision
:signature: the revision signature for a revision

Names are constructed like so: ``content-kind/revision-id/file-id``. Values
are interpreted left-to-right, so if two values are present, they are
content-kind and revision-id.
A record has a file-id if-and-only-if it is a file record.
Info records have no revision or file-id.
Inventory, revision and signature all have content-kind and revision-id, but
no file-id.
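
The naming rules can be illustrated with a small Python sketch. It makes a
simplifying assumption that is not guaranteed in practice: that no component
contains a ``/`` character. How ids containing slashes are escaped is not
covered by this sketch. The function names are invented for this example.

```python
def encode_name(content_kind, revision_id=None, file_id=None):
    """Build a record name from its components, left to right."""
    if (file_id is not None) != (content_kind == "file"):
        raise ValueError("a record has a file-id if and only if it is a file record")
    if file_id is not None and revision_id is None:
        raise ValueError("a file record also needs a revision-id")
    parts = [content_kind]
    if revision_id is not None:
        parts.append(revision_id)
    if file_id is not None:
        parts.append(file_id)
    return "/".join(parts)


def decode_name(name):
    """Return (content_kind, revision_id, file_id); absent parts are None."""
    parts = name.split("/")
    if len(parts) > 3:
        raise ValueError("too many components in %r" % name)
    parts += [None] * (3 - len(parts))
    return tuple(parts)


file_name = encode_name("file", "rev-1", "file-id-1")
inventory_parts = decode_name("inventory/rev-1")
info_parts = decode_name("info")
```

With two components, ``decode_name`` returns a content-kind and a
revision-id, matching the left-to-right rule above.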


Layout
~~~~~~
The first record is an info/header record.

The subsequent records are mpdiff file records. They are ordered first by file
id, then in topological order by revision-id.

The next records are mpdiff inventory records. They are topologically sorted.

The next records are revision and signature fulltexts. They are interleaved
and topologically sorted.

Info record
~~~~~~~~~~~
The info record has type ``header``. It has no revision_id or file_id.
Its metadata contains:

:serializer: A string describing the serialization format used for inventory
    and revision data. May be ``xml5``, ``xml6`` or ``xml7``.
:supports_rich_root: 1 if the source repository supports rich roots,
    0 otherwise.


Implementation notes
~~~~~~~~~~~~~~~~~~~~
- Knit deltas contain almost enough information to extract the original
  ``SequenceMatcher.get_matching_blocks()`` call used to produce them.
  Combining that information with the relevant fulltexts allows us to avoid
  performing sequence matching on any fulltexts for which we have deltas.

- MultiParent deltas contain ``get_matching_blocks`` output almost verbatim,
  but if there is more than one parent, the information about the leftmost
  parent may be incomplete. However, for single-parent multiparent diffs, we
  can extract the ``SequenceMatcher.get_matching_blocks`` output, and therefore
  the ``SequenceMatcher.get_opcodes`` output used to create knit deltas.


Installing data across serialization mismatches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In practice, there cannot be revision serialization mismatches, because the
serialization of revisions has been consistent in serializations 5-7.

If there is a mismatch in inventory serialization formats, the receiver can:

1. extract the inventory objects for the parents
2. serialize them using the bundle serializer
3. apply the mpdiff
4. calculate the fulltext sha1
5. compare the calculated sha1 to the expected sha1
6. deserialize using the bundle serializer
7. serialize using the repository serializer
8. add to the repository

This is much slower, of course. But since the fulltext is verified
at step 5, it should be just as safe as any other conversion.
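
The eight steps can be sketched in Python as below. Everything named here is
hypothetical: ``install_inventory``, the ``read``/``write`` serializer
interface, the ``apply_mpdiff`` callable and the JSON-based toy serializer
are stand-ins chosen so the example runs on its own, and none of them are
Bazaar APIs. Step 1 is assumed to have been done by the caller, who passes
in the parent inventory objects.

```python
import hashlib
import json


class JsonSerializer:
    """Toy stand-in for an inventory serializer (not a Bazaar class)."""

    def __init__(self, indent=None):
        self.indent = indent

    def write(self, inventory):
        text = json.dumps(inventory, sort_keys=True, indent=self.indent)
        return text.encode("utf-8")

    def read(self, text):
        return json.loads(text.decode("utf-8"))


def install_inventory(parent_inventories, mpdiff, expected_sha1,
                      bundle_serializer, repo_serializer,
                      apply_mpdiff, add_to_repository):
    # Step 2: serialize the parents using the bundle serializer.
    parent_texts = [bundle_serializer.write(inv) for inv in parent_inventories]
    # Step 3: apply the mpdiff against those parent texts.
    fulltext = apply_mpdiff(mpdiff, parent_texts)
    # Steps 4 and 5: calculate the sha1 and compare it to the expected one.
    if hashlib.sha1(fulltext).hexdigest() != expected_sha1:
        raise ValueError("inventory fulltext does not match expected sha1")
    # Step 6: deserialize using the bundle serializer.
    inventory = bundle_serializer.read(fulltext)
    # Step 7: serialize using the repository serializer.
    repo_text = repo_serializer.write(inventory)
    # Step 8: add to the repository.
    add_to_repository(repo_text)
    return repo_text
```

The check happens on the bundle-format fulltext, before any conversion, so a
corrupted or mismatched record is rejected without touching the repository.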


Model differences
~~~~~~~~~~~~~~~~~

Note that there may be model differences requiring additional changes. These
differences are described by the "supports_rich_root" value in the info record.

A subset of xml6 and xml7 records are compatible with xml5 (i.e. those that
were converted from xml5 originally).

When installing from a bundle whose serializer supports tree references to a
repository that does not support tree references, clients should halt if they
encounter a record containing a tree reference.

When installing from a supports_rich_root bundle to a repository that does not
support rich roots, clients should halt if they encounter an inventory record
whose root directory revision-id does not match the inventory revision id.

When installing from a bundle that does not support rich roots to a repository
that does, additional knits should be added for the root directory, with a
revision for each inventory revision.


Validating preview patches
~~~~~~~~~~~~~~~~~~~~~~~~~~
When applying a merge directive that includes a preview, clients should
verify that the preview matches the changes requested by the merge directive.

In order to do this, the client should generate a diff from the
``base_revision_id`` to the ``revision_id``. This diff should be compared
against the preview patch, making allowances for the fact that whitespace
munging may have occurred.

One form of whitespace munging that has been observed is line-ending
conversion. Certain mail clients such as Evolution do not respect the
line-endings of text attachments. Since line-ending conversion is unlikely to
alter the meaning of a patch, it seems safe to ignore line endings when
comparing the preview patch.

Another form of whitespace munging that has been observed is
trailing-whitespace stripping. Again, it seems unlikely that stripping
trailing whitespace could alter the meaning of a patch. Such a distinction
is also invisible to readers, so ignoring it does not create a new threat. So
it seems reasonable to ignore trailing whitespace when comparing the patches.

Other mungings are possible, but it is recommended not to implement support
for them until they have been observed. Each of these changes makes the
comparison more approximate, and the more approximate it becomes, the easier it
is to provide a preview patch that does not match the requested changes.
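
A comparison that tolerates exactly the two observed mungings, and nothing
else, might look like the following Python sketch. The function names are
invented for this example, and "trailing whitespace" is taken to mean
trailing spaces and tabs, which is an assumption.

```python
def normalize_patch(text):
    """Return patch lines with line endings and trailing blanks removed."""
    # Treat CRLF, lone CR and LF as the same line ending.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing spaces and tabs only; leading whitespace is significant.
    return [line.rstrip(" \t") for line in unified.split("\n")]


def preview_matches(preview_text, generated_text):
    """Compare a preview patch with a freshly generated diff."""
    return normalize_patch(preview_text) == normalize_patch(generated_text)


same = preview_matches("+foo  \r\n-bar\r\n", "+foo\n-bar\n")
different = preview_matches("+foo\n", "+fob\n")
```

Leading whitespace and all other characters are compared exactly, which
keeps the comparison no more approximate than the text above recommends.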