~bzr-pqm/bzr/bzr.dev

Viewing changes to doc/developers/bundle-format4.txt

Committer: Aaron Bentley
Date: 2007-06-22 18:46:43 UTC
mto: (2520.5.2 bzr.mpbundle)
mto: This revision was merged to the branch mainline in revision 2631.
Revision ID: abentley@panoramicfeedback.com-20070622184643-hh3f73087w00cg1l

Describe bundles and merge directives

files modified:
doc/developers/bundle-format4.txt

doc/developers/bundle-format4.txt

===============

Bundle format 4

===============

============================================

Merge Directive format 2 and Bundle format 4

============================================

:Date: 2007-06-21

Motivation

----------

Format 4 is designed to be a compact format for storing revision metadata that

can be generated quickly and installed into a repository efficiently. It is

not intended to be human-readable; that responsibility has been given to merge

directives.

Format Name

-----------

This is the fourth format to see public use. Previous versions were 0.7, 0.8,

and 0.9. Only 0.7's version number was aligned with a Bazaar release.

Merge Directive format 2 represents a request to perform a certain merge. It

provides access to all the data necessary to perform that merge, by including

a branch URL or a bundle payload. It typically will include a preview of

what applying the patch would do.

Bundle Format 4 is designed to be a compact format for storing revision

metadata that can be generated quickly and installed into a repository

efficiently. It is not intended to be human-readable.

Note

----

These two formats, taken together, can be viewed as the successor of Bundle

format 0.9, so their specifications are combined. It is expected that in the

future, bundle and merge-directive formats will vary independently.

Bundle Format Name

------------------

This is the fourth bundle format to see public use. Previous versions were

0.7, 0.8, and 0.9. Only 0.7's version number was aligned with a Bazaar

release.

Dependencies

- Container format 1

- Multiparent diffs

- Bencode

- Patch-RIO

Description

-----------

This format was designed to trade human-readability for speed and compactness.

It does not contain a human-readable "prelude" patch.

Relationship to merge directives

--------------------------------

A merge directive specifies a merge command to apply and a preview of what that

command would do. Merge directives may contain a format-4 bundle. The

bundle's job is to provide the data needed to perform that merge command.

It is recommended that the bundle be provided in a bzip-compressed,

mime64-encoded format, to ensure compactness and resistance to email-transport

damage.

A preview/overview patch may be provided by the merge directive.

Merge Directives fulfil the role previous bundle formats had of requesting a

merge to be performed, but are a more flexible way of doing so. With the

introduction of these two formats, there is a clear split between "directive",

which is a request to merge (and therefore signable), and "bundle", which is

just data.

Merge Directive format 2 may provide a patch preview of the change being

requested. If a preview is supplied, the receiving client will verify that

the actual change matches the preview.

Merge Directive format 2 also includes a testament hash, to ensure that if a

branch is used, the branch cannot be subverted to cause the wrong changes to be

applied.

Bundle format 4 is designed to trade human-readability for speed and

compactness. It does not contain a human-readable "prelude" patch.

Merge Directive 2 Contents

--------------------------

This format consists of three sections, in the following order.

Patch-RIO command section

~~~~~~~~~~~~~~~~~~~~~~~~~

This section is identical to the corresponding section in Format 1 merge

directives, except as noted below. It is mandatory. It is terminated by a

line reading ``#`` that is not preceded by a line ending with ``\``.

This format adds a new piece of information, the ``base_revision_id``. This is

a suggested base revision for merging. It may be supplied by the user. If

not, it is calculated using the standard merge base algorithm, with the

``revision_id`` and target branch's ``last_revision`` as its inputs.

When merging, clients should use the ``base_revision_id`` when it is not

already present in the ancestry of the ``last_revision`` of the target branch.

If it is already present, clients should calculate a merge base in the normal

way.
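
The base-selection rule above can be sketched as a small function. This is only an illustration: ``choose_merge_base``, ``target_ancestry`` and ``find_merge_base`` are hypothetical names, not part of bzrlib, and the ancestry is assumed to be available as a plain set of revision ids.

```python
def choose_merge_base(base_revision_id, target_ancestry, find_merge_base):
    """Pick the base revision for a merge, following the rule in the text.

    base_revision_id: the suggested base from the merge directive.
    target_ancestry:  set of revision ids in the ancestry of the target
                      branch's last_revision (hypothetical input).
    find_merge_base:  zero-argument callable that runs the normal merge
                      base calculation (hypothetical input).
    """
    if base_revision_id not in target_ancestry:
        # The suggested base is unknown to the target: use it as given.
        return base_revision_id
    # The suggested base is already merged: fall back to the normal algorithm.
    return find_merge_base()
```

For example, ``choose_merge_base("rev-b", {"rev-a"}, compute)`` returns ``"rev-b"`` without calling ``compute``.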

Patch preview section

~~~~~~~~~~~~~~~~~~~~~

This section is optional. It begins with the line ``# Begin patch``. It is

terminated by the end-of-file or by the beginning of a bundle section.

Its contents are a unified diff, as per the ``bzr diff`` command. The FROM

revision is the ``base_revision_id`` specified in the Patch-RIO section.

Bundle section

~~~~~~~~~~~~~~

This section is optional, but if it is not supplied, a source_branch must be

supplied. It begins with the line ``# Begin bundle``, and is terminated by the

end-of-file.

The contents are a base-64 encoded bundle. This may be any bundle format, but

formats 4+ are strongly recommended. The base revision is the newest revision

in the source branch which is an ancestor of all revisions not present in

target which are ancestors of revision_id.

This base revision may or may not be the same as the ``base_revision_id``. In

particular, the ``base_revision_id`` may specify a cherry-pick, but all the

ancestors of the ``base_revision_id`` should be installed in the target

repository before performing such a merge.
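
The three-section layout described above can be sketched as a line splitter. This is a minimal sketch, not the bzrlib parser: ``split_directive`` is a hypothetical helper, it assumes the directive is already available as a list of text lines, and it does not decode the Patch-RIO or base-64 content.

```python
def split_directive(lines):
    """Split merge directive lines into (command, patch, bundle) sections.

    The patch and bundle sections are optional and are returned as None
    when absent. The marker lines themselves are not included.
    """
    lines = [line.rstrip("\n") for line in lines]

    # The command section ends at a line reading "#" that is not preceded
    # by a line ending with a backslash.
    end = None
    for i, text in enumerate(lines):
        if text == "#" and not (i > 0 and lines[i - 1].endswith("\\")):
            end = i
            break
    if end is None:
        raise ValueError("command section is not terminated")

    command = lines[:end]
    patch = bundle = current = None
    for text in lines[end + 1:]:
        if text == "# Begin bundle" and bundle is None:
            bundle = current = []
        elif text == "# Begin patch" and patch is None and bundle is None:
            patch = current = []
        elif current is not None:
            current.append(text)
    return command, patch, bundle
```

Because the bundle section runs to end-of-file, anything after ``# Begin bundle`` is treated as bundle data, including a line that happens to read ``# Begin patch``.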

Bundle 4 Contents

-----------------

Bazaar revision bundles begin with a format marker that reads

``# Bazaar revision bundle v4`` in plaintext. The remainder of the file is a

``Bazaar pack format 1`` container. The container is compressed using bzip2.

Putting the format marker in plaintext ensures that old clients will give good

diagnostics, but renders the file unreadable by standard bzip2 utilities.

``bzr bundle-info -v`` can be used to dump the unencoded output.
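
The plaintext-marker-plus-bzip2 layout can be illustrated with the standard library. This is a sketch under stated assumptions: it assumes the marker occupies exactly the first line and that the bzip2 stream starts immediately after it, which real bundles may not satisfy if further plaintext lines precede the compressed data. ``read_container_bytes`` is a hypothetical helper and it does not parse the container itself.

```python
import bz2
import io

FORMAT_MARKER = b"# Bazaar revision bundle v4"


def read_container_bytes(stream):
    """Check the plaintext marker, then return the decompressed remainder."""
    first_line = stream.readline()
    if first_line.rstrip(b"\r\n") != FORMAT_MARKER:
        # Old or foreign data: fail with a clear diagnostic.
        raise ValueError("not a format 4 bundle: %r" % first_line)
    # Assumption: everything after the first line is one bzip2 stream.
    return bz2.decompress(stream.read())


# Round trip on synthetic data (not a real bundle).
payload = b"container bytes would go here"
fake_bundle = FORMAT_MARKER + b"\n" + bz2.compress(payload)
restored = read_container_bytes(io.BytesIO(fake_bundle))
```

This also shows why standard bzip2 utilities cannot read the file directly: the first bytes are the marker line, not a bzip2 header.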

Serialization

~~~~~~~~~~~~~

Format 4 records revision and inventory records in their repository

serialization format. This minimizes translation and compression costs

in the common case, where the sender and receiver use the same serialization

format for their repository. Steps have been taken to ensure a faithful

conversion when serialization formats are mismatched.

Bundle Records

~~~~~~~~~~~~~~

The bundle format creates a single bundle-level record out of two container

records. The first container record contains metainfo as a Bencoded dict. The

second container record contains the body.

The bundle record name is associated with the metainfo record. The body record

is anonymous.

Record metainfo

~~~~~~~~~~~~~~~

:record_kind: The storage strategy of the record. May be ``fulltext`` (the

record body contains the full text of the value), ``mpdiff`` (the record

body contains a multi-parent diff of the value), or ``header`` (no record

body).

:parents: Used in fulltext and mpdiff records. The revisions that should be

noted as parents of this revision in the repository. For mpdiffs, this is

also the list of build-parents.

:sha1: Used in mpdiff records. The sha-1 hash of the full-text value.
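
To make "metainfo as a Bencoded dict" concrete, here is a minimal Bencode encoder applied to a metainfo dict that uses the field names above. It is a sketch: ``bencode`` is written here for illustration rather than taken from bzrlib, the revision id is made up, and the exact value types Bazaar stores for each field are an assumption.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts using Bencode rules."""
    if isinstance(value, bool):
        raise TypeError("booleans are not a Bencode type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Bencode requires dictionary keys to appear in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


# Hypothetical metainfo for an mpdiff record with a single parent.
metainfo = {b"record_kind": b"mpdiff", b"parents": [b"rev-1"]}
encoded = bencode(metainfo)
# encoded == b"d7:parentsl5:rev-1e11:record_kind6:mpdiffe"
```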

Bundle record naming

~~~~~~~~~~~~~~~~~~~~~

All bundle records have a single name, which is associated with the metainfo

container record. Records are named according to the body's content-kind,

revision-id, and file-id.

Content-kind may be one of:

:inventory: the tree inventory

:revision: the revision metadata for a revision

:signature: the revision signature for a revision

:testament: a testament for a revision

Names are constructed like so: ``content-kind/revision-id/file-id``.

A record has a file-id if-and-only-if it is a file record.
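
The naming scheme can be sketched as a pair of helpers. This is an illustration only: ``encode_name`` and ``decode_name`` are hypothetical here, the ids are made up, and the sketch assumes ids never contain ``/`` (any escaping a real implementation applies is not covered).

```python
def encode_name(content_kind, revision_id=None, file_id=None):
    """Build a record name of the form content-kind/revision-id/file-id."""
    if file_id is not None and revision_id is None:
        raise ValueError("a file-id requires a revision-id")
    parts = [content_kind]
    if revision_id is not None:
        parts.append(revision_id)
    if file_id is not None:
        # Only file records carry a file-id.
        parts.append(file_id)
    return "/".join(parts)


def decode_name(name):
    """Split a record name into (content_kind, revision_id, file_id)."""
    parts = name.split("/")
    if not 1 <= len(parts) <= 3:
        raise ValueError("malformed record name: %r" % name)
    parts += [None] * (3 - len(parts))
    return tuple(parts)
```

For example, ``encode_name("file", "rev-1", "file-1")`` gives ``file/rev-1/file-1``, and ``decode_name("revision/rev-1")`` gives ``("revision", "rev-1", None)``.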

Layout

~~~~~~

The first record is an info/header record.

The subsequent records are mpdiff file records. They are ordered first by file

The next records are revision and signature fulltexts. They are interleaved

and topologically sorted.

Info record

~~~~~~~~~~~

The info record has type ``header``. It has no revision_id or file_id.

Its metadata contains:

:serializer: A string describing the serialization format used for inventory

and revision data. May be ``xml5``, ``xml6`` or ``xml7``.

:supports_rich_root: 1 if the source repository supports rich roots,

0 otherwise.

:supports_tree_references: 1 if the source repository supports subtree

references, 0 otherwise.

Implementation notes

~~~~~~~~~~~~~~~~~~~~

- knit deltas contain almost enough information to extract the original

SequenceMatcher.get_matching_blocks() call used to produce them. Combining

that information with the relevant fulltexts allows us to avoid performing

sequence matching on any fulltexts for which we have deltas.

- MultiParent deltas contain ``get_matching_blocks`` output almost verbatim,

but if there is more than one parent, the information about the leftmost

parent may be incomplete. However, for single-parent multiparent diffs, we

can extract the ``SequenceMatcher.get_matching_blocks`` output, and therefore

the ``SequenceMatcher.get_opcodes`` output used to create knit deltas.
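
The claim that opcodes follow from matching blocks can be demonstrated with the standard library's ``difflib.SequenceMatcher``. This is a sketch for illustration: ``opcodes_from_blocks`` is a hypothetical helper, and the matcher class Bazaar itself uses is not specified here, only that it exposes the same two methods.

```python
import difflib


def opcodes_from_blocks(blocks):
    """Derive get_opcodes()-style tuples from get_matching_blocks() output."""
    i = j = 0
    opcodes = []
    for a_start, b_start, size in blocks:
        # Anything between the previous block and this one is a change.
        if i < a_start and j < b_start:
            opcodes.append(("replace", i, a_start, j, b_start))
        elif i < a_start:
            opcodes.append(("delete", i, a_start, j, b_start))
        elif j < b_start:
            opcodes.append(("insert", i, a_start, j, b_start))
        i, j = a_start + size, b_start + size
        if size:
            # The block itself is an unchanged run.
            opcodes.append(("equal", a_start, i, b_start, j))
    return opcodes


old = ["a\n", "b\n", "c\n", "d\n"]
new = ["a\n", "x\n", "c\n", "d\n", "e\n"]
matcher = difflib.SequenceMatcher(None, old, new)
derived = opcodes_from_blocks(matcher.get_matching_blocks())
# derived is identical to matcher.get_opcodes()
```

No sequence matching is repeated in ``opcodes_from_blocks``: it only walks the block list, which is the saving the notes above describe.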

Installing data across serialization mismatches

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is much slower, of course. But since the fulltext is verified

at step 5, it should be just as safe as any other conversion.

Model differences

~~~~~~~~~~~~~~~~~

Note that there may be model differences requiring additional changes. These

differences are described by the "supports_rich_root" and

"supports_tree_references" values in the info record.

A subset of xml6 and xml7 records are compatible with xml5 (i.e. those that

were converted from xml5 originally).

When installing from a supports_tree_references bundle to a repository that

does not support tree references, clients should halt if they encounter a

record containing a tree reference.

When installing from a supports_rich_root bundle to a repository that does not

support rich roots, clients should halt if they encounter an inventory record

whose root directory revision-id does not match the inventory revision id.

When installing from a bundle that does not support rich roots to a repository

that does, additional knits should be added for the root directory, with a

revision for each inventory revision.
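
The three installation rules above can be sketched as a per-inventory check. Everything named here is hypothetical (``IncompatibleRecord``, ``check_inventory_record`` and its parameters); the sketch assumes the info record has already been decoded into a dict and that the caller has extracted the relevant facts from each inventory.

```python
class IncompatibleRecord(Exception):
    """Raised when a record cannot be represented in the target repository."""


def check_inventory_record(info, target_rich_root, target_tree_refs,
                           inventory_revision_id, root_revision_id,
                           has_tree_reference):
    """Apply the halting rules to one inventory record.

    Returns True when root-directory knits must be added for this
    inventory revision, False otherwise.
    """
    if (info["supports_tree_references"] and not target_tree_refs
            and has_tree_reference):
        raise IncompatibleRecord(
            "tree reference in " + inventory_revision_id)
    if (info["supports_rich_root"] and not target_rich_root
            and root_revision_id != inventory_revision_id):
        raise IncompatibleRecord(
            "root revision differs in " + inventory_revision_id)
    # Bundle without rich roots going into a rich-root repository.
    return bool(target_rich_root and not info["supports_rich_root"])
```

Raising an exception is one way to "halt"; a real client might instead abort the whole installation before writing any data.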
