==========================
KnitPack repository format
==========================

.. contents::

Using KnitPack repositories
===========================

Motivation
----------

KnitPack is a new repository format for Bazaar.  It is expected to be
faster both locally and over the network, is usually more compact, and
works with more FTP servers.

Our benchmarking results to date have been very promising. We fully expect
to make a pack-based format the default in the near future.  We would
therefore like as many people as possible using KnitPack repositories,
benchmarking the results and telling us where improvements are still needed.

Preparation
-----------

A small percentage of existing repositories may have some inconsistent
data within them. It is a good idea to check the integrity of your
repositories before migrating them to knitpack format. To do this, run::

  bzr check

If that reports a problem, run this command::

  bzr reconcile

Note that this can take many hours for repositories with deep history,
so be sure to set aside some time for this if it is required.

Creating a new knitpack branch
------------------------------

If you're starting a project from scratch, it's easy to make it a
``knitpack`` one. Here's how::

  cd my-stuff
  bzr init --pack-0.92
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--pack-0.92`` option to the ``init`` command.

**Note:** In bzr 0.92, this format was called ``knitpack-experimental``.

Creating a new knitpack repository
----------------------------------

If you're starting a project from scratch and wish to use a shared repository
for branches, you can make it a ``knitpack`` repository like this::

  cd my-repo
  bzr init-repo --pack-0.92 .
  cd my-stuff
  bzr init
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--pack-0.92`` option to the ``init-repo`` command.

Upgrading an existing branch or repository to knitpack format
-------------------------------------------------------------

If you have an existing branch and wish to migrate it to
``knitpack`` format, use the ``upgrade`` command like this::

  bzr upgrade --pack-0.92 path-to-my-branch

If you are using a shared repository, run::

  bzr upgrade --pack-0.92 ROOT_OF_REPOSITORY

to upgrade the history database. Note that this will not
alter the branch format of each branch, so
you will also need to upgrade each branch individually
if you are upgrading from an old version of bzr (e.g. earlier than 0.17).
More recent versions of bzr already use our latest branch
format, which adds support for tags.

Starting a new knitpack branch from one in an older format
----------------------------------------------------------

This can be done in one of several ways:

1. Create a new branch and pull into it
2. Create a standalone branch and upgrade its format
3. Create a knitpack shared repository and branch into it

Here are the commands for using the ``pull`` approach::

  bzr init --pack-0.92 my-new-branch
  cd my-new-branch
  bzr pull my-source-branch

Here are the commands for using the ``upgrade`` approach::

  bzr branch my-source-branch my-new-branch
  cd my-new-branch
  bzr upgrade --pack-0.92 .

Here are the commands for the shared repository approach::

  cd my-repo
  bzr init-repo --pack-0.92 .
  bzr branch my-source-branch my-new-branch
  cd my-new-branch

As a reminder, any of the above approaches can fail if the source branch
has inconsistent data within it and hasn't been reconciled yet. Please
be sure to check that before reporting problems.

Testing packs for bzr-svn users
-------------------------------

If you are using ``bzr-svn`` or are testing the prototype subtree support,
you can still use and assist in testing KnitPacks. The commands to use
are identical to the ones given above except that the name of the format
to use is ``knitpack-subtree-experimental``.

WARNING: Note that the subtree formats, ``dirstate-subtree`` and
``knitpack-subtree-experimental``, are **not** production strength yet and
may cause unexpected problems. They are required for the bzr-svn
plug-in but should otherwise only be used by people happy to live on the
bleeding edge. If you are using bzr-svn, you're on the bleeding edge anyway.
:-)

Reporting problems
------------------

If you need any help or encounter any problems, please contact the developers
via the usual ways, i.e. chat to us on IRC or send a message to our mailing
list. See http://wiki.bazaar.canonical.com/BzrSupport for contact details.


Technical notes
===============

Bazaar 0.92 adds a new format (experimental at first) implemented in
``bzrlib/repofmt/pack_repo.py``.

This format provides a knit-like interface which is quite compatible
with knit format repositories: you can get a VersionedFile for a
particular file-id, or for revisions, or for the inventory, even though
these do not correspond to single files on disk.

On disk, the repository directory contains these files and
subdirectories:

==================== =============================================
packs/               completed readonly packs
indices/             indices for completed packs
upload/              temporary files for packs currently being
                     written
obsolete_packs/      packs that have been repacked and are no
                     longer normally needed
pack-names           index of all live packs
lock/                lockdir
==================== =============================================

Note that for consistency we always write "indices" not "indexes".

This is implemented on top of pack files, which are written once from
start to end, then left alone.  A pack consists of a body file plus
four index files, which share the pack's basename and have an
extension indicating the purpose of the index:

========= ========== ======================== ==========================
Extension Purpose    Key                      References
========= ========== ======================== ==========================
``.tix``  File texts ``file_id, revision_id`` per-file parents,
                                              compression base
``.six``  Signatures ``revision_id,``         -
``.rix``  Revisions  ``revision_id,``         revision parents
``.iix``  Inventory  ``revision_id,``         revision parents,
                                              compression base
========= ========== ======================== ==========================

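As a rough illustration of the naming scheme in the table above, this
Python sketch lists the four index files that accompany one pack.  It
assumes the index files sit directly inside ``indices/``; the helper name
``index_paths`` and the example basename are invented for this example
and are not part of bzrlib::

    import posixpath

    # Extension -> purpose, as listed in the table above.
    INDEX_EXTENSIONS = {
        ".tix": "file texts",
        ".six": "signatures",
        ".rix": "revisions",
        ".iix": "inventory",
    }

    def index_paths(pack_basename):
        """Return the relative paths of the four indices for one pack."""
        return [posixpath.join("indices", pack_basename + extension)
                for extension in sorted(INDEX_EXTENSIONS)]

    print(index_paths("example-pack"))

Because all four names share the pack's basename, a reader that knows a
pack's name can derive every index path without listing the directory.
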

Indices are accessed through the ``bzrlib.index.GraphIndex`` class.
Indices are stored as sorted files on disk.  Each line is one record,
and contains:

 * key fields
 * a value string: for all these indices, this is an ASCII decimal pair
   of "offset length" giving the position of the referenced data within
   the pack body file
 * a list of zero or more reference lists

The reference lists let a graph be stored within the index.  Each
reference list entry points to another entry in the same index.  Each
reference is represented as the byte offset of its target within the
index file.

When a compression base is given, it indicates that the body of the text
or inventory is a forward delta from the referenced revision.  The
compression base list must have length 0 or 1.

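To make the record layout more concrete, here is a minimal in-memory
model of one ``.tix`` record in Python.  It is only a sketch under several
assumptions: ``IndexRecord``, ``parse_value`` and ``check_text_record`` are
hypothetical names, the keys and numbers are made up, references are shown
as keys rather than the byte offsets used on disk, and the order of the
two reference lists (parents first, compression base second) is taken
from the table above rather than from the real serialisation::

    from collections import namedtuple

    # One index record: key fields, a value string and reference lists.
    IndexRecord = namedtuple("IndexRecord", "key value reference_lists")

    def parse_value(value):
        """Split an "offset length" value string into two integers."""
        offset, length = value.split(" ")
        return int(offset), int(length)

    def check_text_record(record):
        """Validate a record and return its (offset, length) in the pack."""
        parents, compression_base = record.reference_lists
        if len(compression_base) > 1:
            raise ValueError("compression base list must have length 0 or 1")
        return parse_value(record.value)

    record = IndexRecord(
        key=("file-id-1", "rev-2"),
        value="1024 300",
        reference_lists=([("file-id-1", "rev-1")], [("file-id-1", "rev-1")]),
    )
    print(check_text_record(record))  # (1024, 300)

Here the text for ``rev-2`` would be stored as a delta against ``rev-1``,
occupying 300 bytes starting at offset 1024 of the pack body.
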

Like packs, indices are written only once and never modified afterwards.  A
GraphIndex builder is a mutable in-memory graph that can be sorted,
cross-referenced and written out when the write group completes.

There can also be index entries with a value of 'a' for absent.  These
records exist just to be pointed to in a graph.  This is used, for
example, to give the revision-parent pointer when the parent revision is
in a previous pack.

The data content for each record is a knit data chunk.  The knits are
always unannotated - the annotations must be generated when needed.
(We'd like to cache/memoize the annotations.)  The data chunks can be
moved between packs without needing to recompress them.

It is not possible to regenerate an index from the body file, because the
index contains information that is not in the body.
(In particular, the per-file graph is only stored in the index.)
We would like to change this in a future format.

The lock is a regular LockDir lock.  It is held only for a much
reduced scope: while updating the ``pack-names`` file.  The bulk of the
insertion can be done without the repository locked.  This is an
implementation detail; the repository user should still call
``repository.lock_write`` at the regular time but be aware this does not
correspond to a physical mutex.

Read locks control caching but do not affect writers.

The newly-added repository write group concept is very important to
KnitPack repositories.  When ``start_write_group`` is called, a new
temporary pack is created and all modifications to the repository will
go into it until either ``commit_write_group`` or ``abort_write_group``
is called, at which time it is either finished and moved into place or
discarded respectively.  Write groups cannot be nested: only one can be
underway at a time on a Repository instance, and it must occur within a
write lock.

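The calling pattern implied by these rules can be sketched with a toy
stand-in.  Only the method names ``lock_write``, ``start_write_group``,
``commit_write_group`` and ``abort_write_group`` come from the text above;
``ToyRepository``, its ``unlock`` and ``add_record`` methods and the
``insert_records`` helper are invented for illustration and do not
reproduce bzrlib's implementation::

    class ToyRepository(object):
        """Stand-in that mimics only the write-group rules described above."""

        def __init__(self):
            self.write_locked = False
            self.pending = None   # contents of the temporary pack, if any
            self.packs = []       # finished packs

        def lock_write(self):
            self.write_locked = True

        def unlock(self):
            self.write_locked = False

        def start_write_group(self):
            if not self.write_locked:
                raise RuntimeError("write groups need a write lock")
            if self.pending is not None:
                raise RuntimeError("write groups cannot be nested")
            self.pending = []

        def add_record(self, record):
            self.pending.append(record)

        def commit_write_group(self):
            self.packs.append(self.pending)  # finished and moved into place
            self.pending = None

        def abort_write_group(self):
            self.pending = None              # temporary pack is discarded

    def insert_records(repo, records):
        """Insert records inside one write group, aborting on any error."""
        repo.lock_write()
        try:
            repo.start_write_group()
            try:
                for record in records:
                    repo.add_record(record)
            except Exception:
                repo.abort_write_group()
                raise
            else:
                repo.commit_write_group()
        finally:
            repo.unlock()

    repo = ToyRepository()
    insert_records(repo, ["rev-1", "rev-2"])
    print(repo.packs)

Either everything added during the write group ends up in one new pack,
or nothing does; readers never see a half-written pack.
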

Normally the data for each revision will be entirely within a single
pack but this is not required.

When a pack is finished, it gets a final name based on the MD5 hash of
all the data written into the pack body file.

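A minimal sketch of content-based naming, using Python's standard
``hashlib`` module.  The text above only says the name is "based on" the
MD5; using the plain hexadecimal digest as the name is an assumption, and
``final_pack_name`` is a hypothetical helper::

    import hashlib

    def final_pack_name(chunks):
        """Return the hex MD5 of everything written to a pack body."""
        digest = hashlib.md5()
        for chunk in chunks:
            # Feed the data in the order it was written to the body file.
            digest.update(chunk)
        return digest.hexdigest()

    print(final_pack_name([b"first record", b"second record"]))

Hashing incrementally means the name can be computed while the pack is
being written, without re-reading the finished file.
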

The ``pack-names`` file gives the list of all finished non-obsolete
packs.  (This should always be the same as the list of files in the
``packs/`` directory, but the file is needed for read-only HTTP clients
that can't easily list directories, and it includes other information.)
The constraint on the ``pack-names`` list is that every file mentioned
must exist in the ``packs/`` directory.

In rare cases, when a writer is interrupted, about-to-be-removed packs
may still be present in the directory but removed from the list.

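The two cases above can be told apart with a simple set comparison.  The
sketch below assumes that each pack body is stored as ``<name>.pack``
inside ``packs/``; that suffix, like the function name
``compare_pack_lists``, is an assumption made for this example::

    def compare_pack_lists(listed_names, packs_dir_files, suffix=".pack"):
        """Compare names from pack-names with a listing of packs/."""
        present = set(name[:-len(suffix)] for name in packs_dir_files
                      if name.endswith(suffix))
        listed = set(listed_names)
        missing = sorted(listed - present)    # violates the constraint
        unlisted = sorted(present - listed)   # leftover, e.g. interrupted writer
        return missing, unlisted

    print(compare_pack_lists(["aaa", "bbb"],
                             ["aaa.pack", "bbb.pack", "ccc.pack"]))
    # ([], ['ccc'])

A non-empty first list would indicate a broken repository, while entries
in the second list are merely packs that are no longer referenced.
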

As well as the list of names, the ``pack-names`` file also contains the
size, in bytes, of each of the four indices.  This is used to bootstrap
bisection search within the indices.

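To show why knowing the size up front helps, here is a toy bisection over
a sorted, newline-terminated byte buffer.  It is not the real GraphIndex
serialisation or lookup code: the ``key value`` line layout and the
function name ``find_record`` are assumptions for this sketch.  The
``size`` argument stands in for the index size recorded in ``pack-names``
and provides the initial upper bound of the search::

    def find_record(data, size, key):
        """Bisect sorted "key value" lines in ``data`` for ``key``."""
        low, high = 0, size   # any matching line starts within [low, high)
        while low < high:
            mid = (low + high) // 2
            # Find the boundaries of the line that contains byte ``mid``.
            start = data.rfind(b"\n", 0, mid) + 1
            end = data.index(b"\n", start)
            line_key, _, value = data[start:end].partition(b" ")
            if line_key == key:
                return value
            if line_key < key:
                low = end + 1     # continue in the lines after this one
            else:
                high = start      # continue in the lines before this one
        return None

    index_data = b"rev-a 0 10\nrev-b 10 25\nrev-c 35 7\n"
    print(find_record(index_data, len(index_data), b"rev-b"))

With a remote index, each probe would correspond to reading a small byte
range, so the client can start searching without first asking the server
how large the file is.
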

In normal use, one pack will be created for each commit to a repository.
This would build up to an inefficient number of files over time, so a
``repack`` operation is available to recombine them, by producing larger
files containing data on multiple revisions.  This can be done manually
by running ``bzr pack``, and it also may happen automatically when a
write group is committed.

The repacking strategy used at the moment tries to balance not doing too
much work during commit with not having too many small files left in the
repository.  The algorithm is roughly this: the total number of
revisions in the repository is expressed as a decimal number, e.g.
"532".  Then we'll repack until we have five packs containing a hundred
revisions each, three packs containing ten revisions each, and two packs
with single revisions.  This means that each revision will normally
initially be created in a single-revision pack, then moved to a
ten-revision pack, then to a hundred-revision pack, and so on.

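The digit-based rule can be written down in a few lines of Python.  This
is a sketch of the target distribution only, not of the code that
actually moves data between packs, and ``target_pack_sizes`` is a
hypothetical name::

    def target_pack_sizes(revision_count):
        """Return the pack sizes (in revisions) that repacking aims for."""
        sizes = []
        digits = str(revision_count)
        for position, digit in enumerate(digits):
            # The leftmost digit counts the largest packs.
            pack_size = 10 ** (len(digits) - position - 1)
            sizes.extend([pack_size] * int(digit))
        return sizes

    print(target_pack_sizes(532))
    # [100, 100, 100, 100, 100, 10, 10, 10, 1, 1]

The number of packs is the sum of the decimal digits, so it grows only
slowly with the number of revisions in the repository.
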

As with other repositories, in normal use data is only inserted.
However, in some circumstances we may want to garbage-collect or prune
existing data, or reconcile indices.

..
   vim: tw=72 ft=rst expandtab