~bzr-pqm/bzr/bzr.dev

Viewing changes to doc/developers/bundle-format4.txt

Committer: Canonical.com Patch Queue Manager
Date: 2007-07-19 16:09:34 UTC
mfrom: (2520.4.135 bzr.mpbundle)
Revision ID: pqm@pqm.ubuntu.com-20070719160934-d51fyijw69oto88p

Add new bundle and merge-directive formats

files added:
bzrlib/bundle/serializer/v4.py

bzrlib/multiparent.py

bzrlib/plugins/multiparent.py

bzrlib/tests/test_multiparent.py

doc/developers/bundle-format4.txt

files removed:
bzrlib/bundle/common.py

bzrlib/bundle/old

bzrlib/bundle/old/send_changeset.py

files renamed:
bzrlib/tests/blackbox/test_bundle.py => bzrlib/tests/blackbox/test_submit.py

files modified:
NEWS

bzrlib/builtins.py

bzrlib/bundle/apply_bundle.py

bzrlib/bundle/bundle_data.py

bzrlib/bundle/commands.py

bzrlib/bundle/serializer/__init__.py

bzrlib/bundle/serializer/v08.py

bzrlib/bundle/serializer/v09.py

bzrlib/errors.py

bzrlib/fetch.py

bzrlib/graph.py

bzrlib/knit.py

bzrlib/merge.py

bzrlib/merge_directive.py

bzrlib/remote.py

bzrlib/repository.py

bzrlib/tests/__init__.py

bzrlib/tests/blackbox/__init__.py

bzrlib/tests/blackbox/test_merge.py

bzrlib/tests/blackbox/test_merge_directive.py

bzrlib/tests/repository_implementations/test_repository.py

bzrlib/tests/test_bundle.py

bzrlib/tests/test_graph.py

bzrlib/tests/test_knit.py

bzrlib/tests/test_merge_directive.py

bzrlib/tests/test_read_bundle.py

bzrlib/tests/test_versionedfile.py

bzrlib/tests/test_xml.py

bzrlib/versionedfile.py

bzrlib/weave.py

bzrlib/xml5.py

bzrlib/xml_serializer.py

doc/developers/bundles.txt

doc/developers/bundle-format4.txt

============================================
Merge Directive format 2 and Bundle format 4
============================================

:Date: 2007-06-21

Motivation
----------
Merge Directive format 2 represents a request to perform a certain merge. It
provides access to all the data necessary to perform that merge, by including
a branch URL or a bundle payload. It typically will include a preview of
what applying the patch would do.

Bundle Format 4 is designed to be a compact format for storing revision
metadata that can be generated quickly and installed into a repository
efficiently. It is not intended to be human-readable.

Note
----
These two formats, taken together, can be viewed as the successor of Bundle
format 0.9, so their specifications are combined. It is expected that in the
future, bundle and merge-directive formats will vary independently.

Bundle Format Name
------------------
This is the fourth bundle format to see public use. Previous versions were
0.7, 0.8, and 0.9. Only 0.7's version number was aligned with a Bazaar
release.

Dependencies
------------
- Container format 1
- Multiparent diffs
- Bencode
- Patch-RIO

Description
-----------
Merge Directives fulfil the role previous bundle formats had of requesting a
merge to be performed, but are a more flexible way of doing so. With the
introduction of these two formats, there is a clear split between "directive",
which is a request to merge (and therefore signable), and "bundle", which is
just data.

Merge Directive format 2 may provide a patch preview of the change being
requested. If a preview is supplied, the receiving client will verify that
the actual change matches the preview.

Merge Directive format 2 also includes a testament hash, to ensure that if a
branch is used, the branch cannot be subverted to cause the wrong changes to be
applied.

Bundle format 4 is designed to trade human-readability for speed and
compactness. It does not contain a human-readable "prelude" patch.

Merge Directive 2 Contents
--------------------------
This format consists of three sections, in the following order.

Patch-RIO command section
~~~~~~~~~~~~~~~~~~~~~~~~~
This section is identical to the corresponding section in Format 1 merge
directives, except as noted below. It is mandatory. It is terminated by a
line reading ``#`` that is not preceded by a line ending with ``\``.

In order to support cherry-picking and patch comparison, this format adds a new
piece of information, the ``base_revision_id``. This is a suggested base
revision for merging. It may be supplied by the user. If not, it is
calculated using the standard merge base algorithm, with the ``revision_id``
and target branch's ``last_revision`` as its inputs.

When merging, clients should use the ``base_revision_id`` when it is not
already present in the ancestry of the ``last_revision`` of the target branch.
If it is already present, clients should calculate a merge base in the normal
way.
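
The base-selection rule can be sketched as follows. This is a minimal
illustration and not bzrlib code: the ``{revision: [parents]}`` graph,
``ancestry``, ``choose_merge_base`` and the ``find_merge_base`` callable
(standing in for the normal merge base calculation) are hypothetical names
introduced only for this example.

```python
def ancestry(graph, revision_id):
    """Return revision_id and all of its ancestors.

    graph is an assumed {revision_id: [parent_ids]} mapping.
    """
    seen = set()
    pending = [revision_id]
    while pending:
        rev = pending.pop()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(graph.get(rev, ()))
    return seen


def choose_merge_base(graph, base_revision_id, target_last_revision,
                      find_merge_base):
    """Pick the base revision a client would merge against."""
    if base_revision_id not in ancestry(graph, target_last_revision):
        # The suggested base is new to the target: honour it.
        return base_revision_id
    # The suggested base is already merged: fall back to the normal algorithm.
    return find_merge_base()
```

With a target at ``C`` (``A -> B -> C``), a suggested base ``X`` that branches
off ``A`` is used directly, while a suggested base of ``A`` triggers the
normal calculation.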

Patch preview section
~~~~~~~~~~~~~~~~~~~~~
This section is optional. It begins with the line ``# Begin patch``. It is
terminated by the end-of-file or by the beginning of a bundle section.

Its contents are a unified diff, as per the ``bzr diff`` command. The FROM
revision is the ``base_revision_id`` specified in the Patch-RIO section.

Bundle section
~~~~~~~~~~~~~~
This section is optional, but if it is not supplied, a source_branch must be
supplied. It begins with the line ``# Begin bundle``, and is terminated by the
end-of-file.

The contents are a base-64 encoded bundle. This may be any bundle format, but
formats 4+ are strongly recommended. The base revision is the newest revision
in the source branch which is an ancestor of all revisions not present in
target which are ancestors of revision_id.

This base revision may or may not be the same as the ``base_revision_id``. In
particular, the ``base_revision_id`` may specify a cherry-pick, but all the
ancestors of the ``base_revision_id`` should be installed in the target
repository before performing such a merge.
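
The three-section layout described above can be illustrated with a small
splitter. This is only a sketch under stated assumptions: it takes a list of
lines with the newlines already removed, ``split_directive`` is a hypothetical
name, and it performs no validation of the section contents.

```python
def split_directive(lines):
    """Split merge directive lines into (command, patch, bundle) sections.

    patch and bundle are None when the corresponding section is absent.
    """
    end = None
    for index, line in enumerate(lines):
        continued = index > 0 and lines[index - 1].endswith("\\")
        if line == "#" and not continued:
            end = index
            break
    if end is None:
        raise ValueError("command section is not terminated")

    command = lines[:end]
    rest = lines[end + 1:]
    patch = bundle = None
    if "# Begin bundle" in rest:
        # The bundle section runs to the end of the file.
        start = rest.index("# Begin bundle")
        bundle = rest[start + 1:]
        rest = rest[:start]
    if "# Begin patch" in rest:
        # The patch runs to end-of-file or to the start of the bundle.
        patch = rest[rest.index("# Begin patch") + 1:]
    return command, patch, bundle
```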

Bundle 4 Contents
-----------------
Bazaar revision bundles begin with a format marker that reads
``# Bazaar revision bundle v4`` in plaintext. The remainder of the file is a
``Bazaar pack format 1`` container. The container is compressed using bzip2.

Putting the format marker in plaintext ensures that old clients will give good
diagnostics, but renders the file unreadable by standard bzip2 utilities.
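
The plaintext-marker-then-bzip2 layout can be sketched with the standard
``bz2`` module. This is an illustration, not the bzrlib reader: it assumes the
marker occupies exactly the first line and that the compressed stream starts
immediately after that line's newline, which this document does not specify.
``write_bundle`` and ``read_bundle`` are hypothetical names.

```python
import bz2
import io

MARKER = b"# Bazaar revision bundle v4"


def write_bundle(container_bytes):
    """Prefix a bzip2-compressed container with the plaintext marker."""
    # Assumption: a single newline separates marker and compressed data.
    return MARKER + b"\n" + bz2.compress(container_bytes)


def read_bundle(data):
    """Check the marker and return the decompressed container bytes."""
    stream = io.BytesIO(data)
    first_line = stream.readline().rstrip(b"\r\n")
    if first_line != MARKER:
        raise ValueError("not a v4 bundle: %r" % first_line)
    return bz2.decompress(stream.read())
```

Because the file does not start with the bzip2 magic bytes, a stock
``bunzip2`` rejects it, which is the trade-off noted above.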

Serialization
~~~~~~~~~~~~~
Format 4 records revision and inventory records in their repository
serialization format. This minimizes translation and compression costs
in the common case, where the sender and receiver use the same serialization
format for their repository. Steps have been taken to ensure a faithful
conversion when serialization formats are mismatched.

Bundle Records
~~~~~~~~~~~~~~
The bundle format creates a single bundle-level record out of two container
records. The first container record contains metainfo as a Bencoded dict. The
second container record contains the body.

The bundle record name is associated with the metainfo record. The body record
is anonymous.

Record metainfo
~~~~~~~~~~~~~~~

:record_kind: The storage strategy of the record. May be ``fulltext`` (the
  record body contains the full text of the value), ``mpdiff`` (the record
  body contains a multi-parent diff of the value), or ``header`` (no record
  body).
:parents: Used in fulltext and mpdiff records. The revisions that should be
  noted as parents of this revision in the repository. For mpdiffs, this is
  also the list of build-parents.
:sha1: Used in mpdiff records. The sha-1 hash of the full-text value.
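
To make the "Bencoded dict" concrete, here is a minimal Bencode encoder
applied to a metainfo dictionary. It is a sketch for illustration, not the
encoder bzrlib uses, and the revision id and hash values in the usage example
are placeholders.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts using Bencode rules."""
    if isinstance(value, bool):
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


# Placeholder values; only the key names come from the field list above.
metainfo = {
    b"record_kind": b"mpdiff",
    b"parents": [b"rev-1"],
    b"sha1": b"abc",
}
encoded = bencode(metainfo)
```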

Bundle record naming
~~~~~~~~~~~~~~~~~~~~
All bundle records have a single name, which is associated with the metainfo
container record. Records are named according to the body's content-kind,
revision-id, and file-id.

Content-kind may be one of:

:file: a version of a user file
:inventory: the tree inventory
:revision: the revision metadata for a revision
:signature: the revision signature for a revision

Names are constructed like so: ``content-kind/revision-id/file-id``. Values
are interpreted left-to-right, so if two values are present, they are
content-kind and revision-id.
A record has a file-id if-and-only-if it is a file record.
Info records have no revision or file-id.
Inventory, revision and signature all have content-kind and revision-id, but
no file-id.
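
The naming rules can be sketched as a pair of helpers. The names
``encode_name`` and ``decode_name`` are hypothetical, and the sketch assumes
that ids never contain ``/``; a real implementation would need some escaping
scheme, which this document does not describe.

```python
def encode_name(content_kind, revision_id=None, file_id=None):
    """Build a record name from left to right, omitting absent parts."""
    if file_id is not None and revision_id is None:
        raise ValueError("a file-id requires a revision-id")
    if (file_id is not None) != (content_kind == "file"):
        raise ValueError("file records, and only file records, have a file-id")
    parts = [content_kind]
    if revision_id is not None:
        parts.append(revision_id)
    if file_id is not None:
        parts.append(file_id)
    return "/".join(parts)


def decode_name(name):
    """Return (content_kind, revision_id, file_id), with None for gaps."""
    parts = name.split("/")
    if not 1 <= len(parts) <= 3:
        raise ValueError("malformed record name: %r" % name)
    parts += [None] * (3 - len(parts))
    return tuple(parts)
```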

Layout
~~~~~~
The first record is an info/header record.

The subsequent records are mpdiff file records. They are ordered first by file
id, then in topological order by revision-id.

The next records are mpdiff inventory records. They are topologically sorted.

The next records are revision and signature fulltexts. They are interleaved
and topologically sorted.
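
The ordering of the file records can be sketched with the standard
``graphlib`` module (Python 3.9 or later). The input shape, a mapping of
file-id to a per-file ``{revision-id: [parent revision-ids]}`` graph, and the
name ``order_file_records`` are assumptions made for this example.

```python
from graphlib import TopologicalSorter


def order_file_records(file_graphs):
    """Return (file_id, revision_id) pairs in bundle order."""
    ordered = []
    for file_id in sorted(file_graphs):
        graph = file_graphs[file_id]
        # static_order() yields every parent before its children.
        for revision_id in TopologicalSorter(graph).static_order():
            # Skip parents that are referenced but not part of this bundle.
            if revision_id in graph:
                ordered.append((file_id, revision_id))
    return ordered
```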

Info record
~~~~~~~~~~~
The info record has type ``header``. It has no revision_id or file_id.
Its metadata contains:

:serializer: A string describing the serialization format used for inventory
  and revision data. May be ``xml5``, ``xml6`` or ``xml7``.
:supports_rich_root: 1 if the source repository supports rich roots,
  0 otherwise.

Implementation notes
~~~~~~~~~~~~~~~~~~~~
- knit deltas contain almost enough information to extract the original
  SequenceMatcher.get_matching_blocks() call used to produce them. Combining
  that information with the relevant fulltexts allows us to avoid performing
  sequence matching on any fulltexts for which we have deltas.

- MultiParent deltas contain ``get_matching_blocks`` output almost verbatim,
  but if there is more than one parent, the information about the leftmost
  parent may be incomplete. However, for single-parent multiparent diffs, we
  can extract the ``SequenceMatcher.get_matching_blocks`` output, and therefore
  the ``SequenceMatcher.get_opcodes`` output used to create knit deltas.
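
The step from matching blocks to opcodes can be shown with the standard
library's ``difflib``. The helper below is a sketch written for this document
(``opcodes_from_matching_blocks`` is a hypothetical name); it follows the same
walk over the blocks that ``difflib.SequenceMatcher.get_opcodes`` performs,
and says nothing about how bzrlib stores its deltas.

```python
from difflib import SequenceMatcher


def opcodes_from_matching_blocks(blocks):
    """Rebuild get_opcodes()-style tuples from get_matching_blocks() output."""
    opcodes = []
    i = j = 0
    for a_start, b_start, size in blocks:
        # Describe the gap between the previous block and this one.
        if i < a_start and j < b_start:
            opcodes.append(("replace", i, a_start, j, b_start))
        elif i < a_start:
            opcodes.append(("delete", i, a_start, j, b_start))
        elif j < b_start:
            opcodes.append(("insert", i, a_start, j, b_start))
        i, j = a_start + size, b_start + size
        # The final sentinel block has size 0 and produces no "equal" entry.
        if size:
            opcodes.append(("equal", a_start, i, b_start, j))
    return opcodes


old = ["a\n", "b\n", "c\n", "d\n"]
new = ["a\n", "x\n", "c\n", "d\n", "e\n"]
matcher = SequenceMatcher(None, old, new)
rebuilt = opcodes_from_matching_blocks(matcher.get_matching_blocks())
```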

Installing data across serialization mismatches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In practice, there cannot be revision serialization mismatches, because the
serialization of revisions has been consistent in serializations 5-7.

If there is a mismatch in inventory serialization formats, the receiver can:

1. extract the inventory objects for the parents
2. serialize them using the bundle serializer
3. apply the mpdiff
4. calculate the fulltext sha1
5. compare the calculated sha1 to the expected sha1
6. deserialize using the bundle serializer
7. serialize using the repository serializer
8. add to the repository

This is much slower, of course. But since the fulltext is verified
at step 5, it should be just as safe as any other conversion.
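
The eight steps can be sketched as a single function. Everything here is a
stand-in chosen for illustration: ``ToySerializer`` uses JSON where the real
serializers would be xml5, xml6 or xml7, ``apply_mpdiff`` is a callable
supplied by the caller rather than a real multiparent-diff implementation,
and ``add_to_repository`` is any callable that stores the final text.

```python
import hashlib
import json


class ToySerializer:
    """Hypothetical serializer; the indent setting mimics a format change."""

    def __init__(self, indent):
        self.indent = indent

    def write(self, inventory):
        text = json.dumps(inventory, sort_keys=True, indent=self.indent)
        return text.encode("utf-8")

    def read(self, text):
        return json.loads(text.decode("utf-8"))


def install_inventory(parent_inventories, apply_mpdiff, expected_sha1,
                      bundle_serializer, repository_serializer,
                      add_to_repository):
    # Steps 1-2: re-serialize the parents in the bundle's format.
    parent_texts = [bundle_serializer.write(inv) for inv in parent_inventories]
    # Step 3: rebuild the fulltext from the diff and the parent texts.
    fulltext = apply_mpdiff(parent_texts)
    # Steps 4-5: verify the fulltext before trusting it.
    if hashlib.sha1(fulltext).hexdigest() != expected_sha1:
        raise ValueError("inventory fulltext does not match the expected sha1")
    # Step 6: parse with the bundle's serializer.
    inventory = bundle_serializer.read(fulltext)
    # Steps 7-8: write in the repository's format and store it.
    add_to_repository(repository_serializer.write(inventory))
    return inventory
```

The sha1 check sits before any deserialization, so a corrupted or mismatched
diff is rejected before anything reaches the repository.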

Model differences
~~~~~~~~~~~~~~~~~

Note that there may be model differences requiring additional changes. These
differences are described by the "supports_rich_root" value in the info record.

A subset of xml6 and xml7 records are compatible with xml5 (i.e. those that
were converted from xml5 originally).

When installing from a bundle whose serializer supports tree references to a
repository that does not support tree references, clients should halt if they
encounter a record containing a tree reference.

When installing from a supports_rich_root bundle to a repository that does not
support rich roots, clients should halt if they encounter an inventory record
whose root directory revision-id does not match the inventory revision id.

When installing from a bundle that does not support rich roots to a repository
that does, additional knits should be added for the root directory, with a
revision for each inventory revision.

Validating preview patches
~~~~~~~~~~~~~~~~~~~~~~~~~~
When applying a merge directive that includes a preview, clients should
verify that the preview matches the changes requested by the merge directive.

In order to do this, the client should generate a diff from the
``base_revision_id`` to the ``revision_id``. This diff should be compared
against the preview patch, making allowances for the fact that whitespace
munging may have occurred.

One form of whitespace munging that has been observed is line-ending
conversion. Certain mail clients such as Evolution do not respect the
line-endings of text attachments. Since line-ending conversion is unlikely to
alter the meaning of a patch, it seems safe to ignore line endings when
comparing the preview patch.

Another form of whitespace munging that has been observed is
trailing-whitespace stripping. Again, it seems unlikely that stripping
trailing whitespace could alter the meaning of a patch. Such a distinction
is also invisible to readers, so ignoring it does not create a new threat. So
it seems reasonable to ignore trailing whitespace when comparing the patches.

Other mungings are possible, but it is recommended not to implement support
for them until they have been observed. Each of these changes makes the
comparison more approximate, and the more approximate it becomes, the easier it
is to provide a preview patch that does not match the requested changes.
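
A comparison that tolerates exactly the two mungings described above, and
nothing else, might look like the following sketch. ``normalize_patch`` and
``preview_matches`` are hypothetical names, and the patches are assumed to be
available as byte strings.

```python
def normalize_patch(patch):
    """Split a patch into lines, ignoring line endings and trailing whitespace."""
    # splitlines() treats \n, \r\n and \r alike; rstrip() drops trailing blanks.
    return [line.rstrip() for line in patch.splitlines()]


def preview_matches(preview, regenerated):
    """Compare the received preview with a freshly generated diff."""
    return normalize_patch(preview) == normalize_patch(regenerated)
```

Leading whitespace and line content are still compared exactly, so the check
stays as strict as the observed mungings allow.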
