============================================
Merge Directive format 2 and Bundle format 4
============================================

:Author: Aaron Bentley
:Date: 2007-06-21

Motivation
----------
Merge Directive format 2 represents a request to perform a certain merge.  It
provides access to all the data necessary to perform that merge, by including
a branch URL or a bundle payload.  It typically will include a preview of
what applying the patch would do.

Bundle Format 4 is designed to be a compact format for storing revision
metadata that can be generated quickly and installed into a repository
efficiently.  It is not intended to be human-readable.

Note
----
These two formats, taken together, can be viewed as the successor of Bundle
format 0.9, so their specifications are combined.  It is expected that in the
future, bundle and merge-directive formats will vary independently.


Bundle Format Name
------------------
This is the fourth bundle format to see public use.  Previous versions were
0.7, 0.8, and 0.9.  Only 0.7's version number was aligned with a Bazaar
release.


Dependencies
------------
- Container format 1
- Multiparent diffs
- Bencode
- Patch-RIO


Description
-----------
Merge Directives fulfil the role previous bundle formats had of requesting a
merge to be performed, but are a more flexible way of doing so.  With the
introduction of these two formats, there is a clear split between "directive",
which is a request to merge (and therefore signable), and "bundle", which is
just data.

Merge Directive format 2 may provide a patch preview of the change being
requested.  If a preview is supplied, the receiving client will verify that
the actual change matches the preview.

Merge Directive format 2 also includes a testament hash, to ensure that if a
branch is used, the branch cannot be subverted to cause the wrong changes to be
applied.

Bundle format 4 is designed to trade human-readability for speed and
compactness.  It does not contain a human-readable "prelude" patch.

Merge Directive 2 Contents
--------------------------
This format consists of three sections, in the following order.


Patch-RIO command section
~~~~~~~~~~~~~~~~~~~~~~~~~
This section is identical to the corresponding section in Format 1 merge
directives, except as noted below.  It is mandatory.  It is terminated by a
line reading ``#`` that is not preceded by a line ending with ``\``.

In order to support cherry-picking and patch comparison, this format adds a new
piece of information, the ``base_revision_id``.  This is a suggested base
revision for merging.  It may be supplied by the user.  If not, it is
calculated using the standard merge base algorithm, with the ``revision_id``
and target branch's ``last_revision`` as its inputs.

When merging, clients should use the ``base_revision_id`` when it is not
already present in the ancestry of the ``last_revision`` of the target branch.
If it is already present, clients should calculate a merge base in the normal
way.
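
The base-selection rule can be sketched as follows.  This is only an
illustration under stated assumptions, not Bazaar's implementation:
``choose_merge_base``, ``target_ancestry`` and ``find_merge_base`` are
hypothetical names, and the target's ancestry is modelled as a plain set of
revision ids.

```python
def choose_merge_base(base_revision_id, revision_id, target_last_revision,
                      target_ancestry, find_merge_base):
    """Pick the base revision for a merge requested by a directive.

    target_ancestry: set of revision ids reachable from
        target_last_revision (hypothetical representation).
    find_merge_base: callable standing in for the normal merge-base
        algorithm (hypothetical).
    """
    if base_revision_id not in target_ancestry:
        # The suggested base is not yet in the target: use it as given.
        return base_revision_id
    # The suggested base is already merged: calculate a base the normal way.
    return find_merge_base(revision_id, target_last_revision)
```

Passing the merge-base algorithm in as a callable keeps the sketch independent
of any particular graph implementation.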


Patch preview section
~~~~~~~~~~~~~~~~~~~~~
This section is optional.  It begins with the line ``# Begin patch``.  It is
terminated by the end-of-file or by the beginning of a bundle section.

Its contents are a unified diff, as per the ``bzr diff`` command.  The FROM
revision is the ``base_revision_id`` specified in the Patch-RIO section.


Bundle section
~~~~~~~~~~~~~~
This section is optional, but if it is not supplied, a ``source_branch`` must
be supplied.  It begins with the line ``# Begin bundle``, and is terminated by
the end-of-file.

The contents are a base64-encoded bundle.  This may be any bundle format, but
formats 4+ are strongly recommended.  The base revision is the newest revision
in the source branch which is an ancestor of all revisions that are ancestors
of ``revision_id`` and are not present in the target.

This base revision may or may not be the same as the ``base_revision_id``.  In
particular, the ``base_revision_id`` may specify a cherry-pick, but all the
ancestors of the ``base_revision_id`` should be installed in the target
repository before performing such a merge.
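
As a rough sketch of how the three sections can be told apart, the following
splits a directive on the two marker lines given above.  The function name
``split_directive`` is hypothetical, and the sketch makes simplifying
assumptions: it does not parse the Patch-RIO command section or look for its
``#`` terminator, and it treats everything before the first marker as that
section.

```python
import base64

PATCH_MARKER = "# Begin patch"
BUNDLE_MARKER = "# Begin bundle"


def split_directive(text):
    """Return (command_lines, patch_lines, bundle_bytes).

    patch_lines is None when there is no preview section; bundle_bytes is
    None when there is no bundle section.
    """
    lines = text.splitlines()
    # The bundle section, if present, runs from its marker to end-of-file.
    bundle_at = lines.index(BUNDLE_MARKER) if BUNDLE_MARKER in lines else len(lines)
    head = lines[:bundle_at]
    # The preview, if present, runs from its marker to the bundle section
    # or to end-of-file.
    patch_at = head.index(PATCH_MARKER) if PATCH_MARKER in head else len(head)
    command = head[:patch_at]
    patch = head[patch_at + 1:] if patch_at < len(head) else None
    bundle = None
    if bundle_at < len(lines):
        # The bundle payload is base64 text; join the lines before decoding.
        bundle = base64.b64decode("".join(lines[bundle_at + 1:]))
    return command, patch, bundle
```

A directive with neither marker yields ``None`` for both optional sections, in
which case a ``source_branch`` is required.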


Bundle 4 Contents
-----------------
Bazaar revision bundles begin with a format marker that reads
``# Bazaar revision bundle v4`` in plaintext.  The remainder of the file is a
``Bazaar pack format 1`` container.  The container is compressed using bzip2.

Putting the format marker in plaintext ensures that old clients will give good
diagnostics, but renders the file unreadable by standard bzip2 utilities.
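
The outer framing can be illustrated with Python's standard ``bz2`` module.
This sketch assumes the marker is a single newline-terminated line followed
immediately by the compressed stream, which the text here does not spell out.
The container bytes are treated as opaque, and ``write_bundle`` and
``read_bundle`` are hypothetical names.

```python
import bz2

# Assumed exact framing: marker text plus one newline.
BUNDLE_HEADER = b"# Bazaar revision bundle v4\n"


def write_bundle(container_bytes):
    """Prefix the plaintext marker to the bzip2-compressed container."""
    return BUNDLE_HEADER + bz2.compress(container_bytes)


def read_bundle(data):
    """Check the marker, then return the decompressed container bytes."""
    if not data.startswith(BUNDLE_HEADER):
        raise ValueError("not a Bazaar revision bundle v4")
    return bz2.decompress(data[len(BUNDLE_HEADER):])
```

Because the marker precedes the compressed stream, the reader has to strip it
before decompressing, which is why standard bzip2 utilities cannot read the
file directly.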

Serialization
~~~~~~~~~~~~~
Format 4 records revision and inventory records in their repository
serialization format.  This minimizes translation and compression costs
in the common case, where the sender and receiver use the same serialization
format for their repository. Steps have been taken to ensure a faithful
conversion when serialization formats are mismatched.


Bundle Records
~~~~~~~~~~~~~~
The bundle format creates a single bundle-level record out of two container
records.  The first container record contains metainfo as a Bencoded dict.  The
second container record contains the body.

The bundle record name is associated with the metainfo record.  The body record
is anonymous.


Record metainfo
~~~~~~~~~~~~~~~

:record_kind: The storage strategy of the record.  May be ``fulltext`` (the
    record body contains the full text of the value), ``mpdiff`` (the record
    body contains a multi-parent diff of the value), or ``header`` (no record
    body).
:parents: Used in fulltext and mpdiff records.  The revisions that should be
    noted as parents of this revision in the repository.  For mpdiffs, this is
    also the list of build-parents.
:sha1: Used in mpdiff records.  The sha-1 hash of the full-text value.


Bundle record naming
~~~~~~~~~~~~~~~~~~~~~
All bundle records have a single name, which is associated with the metainfo
container record.  Records are named according to the body's content-kind,
revision-id, and file-id.

Content-kind may be one of:

:file: a version of a user file
:inventory: the tree inventory
:revision: the revision metadata for a revision
:signature: the revision signature for a revision

Names are constructed like so: ``content-kind/revision-id/file-id``.  Values
are interpreted left-to-right, so if two values are present, they are
content-kind and revision-id.
A record has a file-id if and only if it is a file record.
Info records have no revision-id or file-id.
Inventory, revision and signature records all have a content-kind and a
revision-id, but no file-id.
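
The naming rules can be sketched as a pair of helpers.  The text does not say
how a ``/`` inside a revision-id or file-id is escaped, so this sketch assumes
percent-encoding of each component purely for illustration; the escaping used
by real bundles may differ.  ``encode_name`` and ``decode_name`` are
hypothetical names.

```python
from urllib.parse import quote, unquote


def encode_name(content_kind, revision_id=None, file_id=None):
    """Build ``content-kind/revision-id/file-id``, omitting absent parts."""
    if (file_id is not None) != (content_kind == "file"):
        raise ValueError("a record has a file-id if and only if it is a file record")
    if file_id is not None and revision_id is None:
        raise ValueError("a file record also needs a revision-id")
    parts = [content_kind]
    if revision_id is not None:
        parts.append(revision_id)
    if file_id is not None:
        parts.append(file_id)
    # Assumption: escape each part so that "/" only ever separates parts.
    return "/".join(quote(part, safe="") for part in parts)


def decode_name(name):
    """Return (content_kind, revision_id, file_id), reading left to right."""
    parts = [unquote(part) for part in name.split("/")]
    # Missing trailing values are reported as None.
    parts += [None] * (3 - len(parts))
    content_kind, revision_id, file_id = parts
    return content_kind, revision_id, file_id
```

Padding with ``None`` on the right mirrors the left-to-right interpretation: a
two-part name can only be a content-kind and a revision-id.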

Layout
~~~~~~
The first record is an info/header record.

The subsequent records are mpdiff file records.  They are ordered first by file
id, then in topological order by revision-id.

The next records are mpdiff inventory records.  They are topologically sorted.

The next records are revision and signature fulltexts.  They are interleaved
and topologically sorted.

Info record
~~~~~~~~~~~
The info record has type ``header``.  It has no revision_id or file_id.
Its metadata contains:

:serializer: A string describing the serialization format used for inventory
    and revision data.  May be ``xml5``, ``xml6`` or ``xml7``.
:supports_rich_root: 1 if the source repository supports rich roots,
    0 otherwise.


Implementation notes
~~~~~~~~~~~~~~~~~~~~
- knit deltas contain almost enough information to extract the original
  SequenceMatcher.get_matching_blocks() call used to produce them.  Combining
  that information with the relevant fulltexts allows us to avoid performing
  sequence matching on any fulltexts for which we have deltas.

- MultiParent deltas contain ``get_matching_blocks`` output almost verbatim,
  but if there is more than one parent, the information about the leftmost
  parent may be incomplete.  However, for single-parent multiparent diffs, we
  can extract the ``SequenceMatcher.get_matching_blocks`` output, and therefore
  the ``SequenceMatcher.get_opcodes`` output used to create knit deltas.


Installing data across serialization mismatches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In practice, there cannot be revision serialization mismatches, because the
serialization of revisions has been consistent in serializations 5-7.

If there is a mismatch in inventory serialization formats, the receiver can:

  1. extract the inventory objects for the parents
  2. serialize them using the bundle serializer
  3. apply the mpdiff
  4. calculate the fulltext sha1
  5. compare the calculated sha1 to the expected sha1
  6. deserialize using the bundle serializer
  7. serialize using the repository serializer
  8. add to the repository

This is much slower, of course.  But since the fulltext is verified
at step 5, it should be just as safe as any other conversion.
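
The eight steps can be sketched as one function.  Every name here is a
hypothetical stand-in rather than a Bazaar API: the serializers are assumed to
expose ``write`` (object to bytes) and ``read`` (bytes to object),
``apply_mpdiff`` rebuilds a fulltext from a diff and its parent texts, and
``add`` stores the result.  The caller is assumed to have done step 1 and to
pass the parent inventory objects in.

```python
import hashlib


def install_inventory(parent_inventories, mpdiff, expected_sha1,
                      bundle_serializer, repo_serializer, apply_mpdiff, add):
    """Install one inventory whose bundle serialization differs from ours."""
    # Step 2: serialize the parents the way the bundle sender did.
    parent_texts = [bundle_serializer.write(inv) for inv in parent_inventories]
    # Step 3: rebuild the fulltext in the bundle's serialization.
    fulltext = apply_mpdiff(mpdiff, parent_texts)
    # Steps 4-5: verify the fulltext before trusting it.
    if hashlib.sha1(fulltext).hexdigest() != expected_sha1:
        raise ValueError("inventory fulltext does not match the expected sha1")
    # Steps 6-7: convert to the repository's own serialization.
    inventory = bundle_serializer.read(fulltext)
    converted = repo_serializer.write(inventory)
    # Step 8: store the converted text.
    add(converted)
    return converted
```

The hash check sits between reconstruction and conversion, so nothing is
deserialized or stored unless the reconstructed fulltext matches what the
sender recorded.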

Model differences
~~~~~~~~~~~~~~~~~

Note that there may be model differences requiring additional changes.  These
differences are described by the "supports_rich_root" value in the info record.

A subset of xml6 and xml7 records are compatible with xml5 (i.e. those that
were converted from xml5 originally).

When installing from a bundle whose serializer supports tree references to a
repository that does not support tree references, clients should halt if they
encounter a record containing a tree reference.

When installing from a supports_rich_root bundle to a repository that does not
support rich roots, clients should halt if they encounter an inventory record
whose root directory revision-id does not match the inventory revision id.

When installing from a bundle that does not support rich roots to a repository
that does, additional knits should be added for the root directory, with a
revision for each inventory revision.

Validating preview patches
~~~~~~~~~~~~~~~~~~~~~~~~~~
When applying a merge directive that includes a preview, clients should
verify that the preview matches the changes requested by the merge directive.

In order to do this, the client should generate a diff from the
``base_revision_id`` to the ``revision_id``.  This diff should be compared
against the preview patch, making allowances for the fact that whitespace
munging may have occurred.

One form of whitespace munging that has been observed is line-ending
conversion.  Certain mail clients such as Evolution do not respect the
line-endings of text attachments.  Since line-ending conversion is unlikely to
alter the meaning of a patch, it seems safe to ignore line endings when
comparing the preview patch.

Another form of whitespace munging that has been observed is
trailing-whitespace stripping.  Again, it seems unlikely that stripping
trailing whitespace could alter the meaning of a patch.  Such a distinction
is also invisible to readers, so ignoring it does not create a new threat.  So
it seems reasonable to ignore trailing whitespace when comparing the patches.
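
A minimal sketch of such a tolerant comparison, assuming both patches are
available as byte strings, is shown here.  ``normalize_patch`` and
``preview_matches`` are hypothetical names, and only the two allowances
described above are made.

```python
def normalize_patch(patch_bytes):
    """Return the patch lines with line endings and trailing whitespace removed."""
    # bytes.splitlines() splits on \n, \r\n and \r, which discards
    # line-ending differences; rstrip() discards trailing whitespace.
    return [line.rstrip() for line in patch_bytes.splitlines()]


def preview_matches(preview, generated_diff):
    """Compare a preview patch with the diff generated by the client."""
    return normalize_patch(preview) == normalize_patch(generated_diff)
```

Leading and interior whitespace are still compared exactly, so a change in
indentation still causes a mismatch.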

Other mungings are possible, but it is recommended not to implement support
for them until they have been observed.  Each of these changes makes the
comparison more approximate, and the more approximate it becomes, the easier it
is to provide a preview patch that does not match the requested changes.