==========================
KnitPack repository format
==========================

.. contents::

Using KnitPack repositories
===========================

Motivation
----------

KnitPack is a new repository format for Bazaar, which is expected to be
faster both locally and over the network, is usually more compact, and
will work with more FTP servers.

Our benchmarking results to date have been very promising. We fully expect
to make a pack-based format the default in the near future.  We would
therefore like as many people as possible using KnitPack repositories,
benchmarking the results and telling us where improvements are still needed.

Preparation
-----------

A small percentage of existing repositories may have some inconsistent
data within them. It is a good idea to check the integrity of your
repositories before migrating them to knitpack format. To do this, run::

  bzr check

If that reports a problem, run this command::

  bzr reconcile

Note that this can take many hours for repositories with deep history,
so be sure to set aside some time for this if it is required.

Creating a new knitpack branch
------------------------------

If you're starting a project from scratch, it's easy to make it a
``knitpack`` one. Here's how::

  cd my-stuff
  bzr init --knitpack-experimental
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--knitpack-experimental`` option to the ``init`` command.

Creating a new knitpack repository
----------------------------------

If you're starting a project from scratch and wish to use a shared repository
for branches, you can make it a ``knitpack`` repository like this::

  cd my-repo
  bzr init-repo --knitpack-experimental .
  cd my-stuff
  bzr init
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--knitpack-experimental`` option to the ``init-repo`` command.

Upgrading an existing branch or repository to knitpack format
-------------------------------------------------------------

If you have an existing branch and wish to migrate it to
the ``knitpack`` format, use the ``upgrade`` command like this::

  bzr upgrade --knitpack-experimental path-to-my-branch

If you are using a shared repository, run::

  bzr upgrade --knitpack-experimental ROOT_OF_REPOSITORY

to upgrade the history database. Note that this will not
alter the branch format of each branch, so
you will also need to upgrade each branch individually
if you are upgrading from an old (e.g. < 0.17) bzr.
Branches created by more recent versions of bzr will already use
our latest branch format, which adds support for tags.

Starting a new knitpack branch from one in an older format
----------------------------------------------------------

This can be done in one of several ways:

1. Create a new branch and pull into it
2. Create a standalone branch and upgrade its format
3. Create a knitpack shared repository and branch into it

Here are the commands for using the ``pull`` approach::

  bzr init --knitpack-experimental my-new-branch
  cd my-new-branch
  bzr pull my-source-branch

Here are the commands for using the ``upgrade`` approach::

  bzr branch my-source-branch my-new-branch
  cd my-new-branch
  bzr upgrade --knitpack-experimental .

Here are the commands for the shared repository approach::

  cd my-repo
  bzr init-repo --knitpack-experimental .
  bzr branch my-source-branch my-new-branch
  cd my-new-branch

As a reminder, any of the above approaches can fail if the source branch
has inconsistent data within it and hasn't been reconciled yet. Please
be sure to check that before reporting problems.

Testing packs for bzr-svn users
-------------------------------

If you are using ``bzr-svn`` or are testing the prototype subtree support,
you can still use and assist in testing KnitPacks. The commands to use
are identical to the ones given above except that the name of the format
to use is ``knitpack-subtree-experimental``.

WARNING: Note that the subtree formats, ``dirstate-subtree`` and
``knitpack-subtree-experimental``, are **not** production strength yet and
may cause unexpected problems. They are required for the bzr-svn
plug-in but should otherwise only be used by people happy to live on the
bleeding edge. If you are using bzr-svn, you're on the bleeding edge anyway.
:-)

Reporting problems
------------------

If you need any help or encounter any problems, please contact the developers
via the usual ways, i.e. chat to us on IRC or send a message to our mailing
list. See http://bazaar-vcs.org/BzrSupport for contact details.


Technical notes
===============

Bazaar 0.92 adds a new format (experimental at first) implemented in
``bzrlib.repofmt.pack_repo``.

This format provides a knit-like interface which is quite compatible
with knit format repositories: you can get a VersionedFile for a
particular file-id, or for revisions, or for the inventory, even though
these do not correspond to single files on disk.

The on-disk format is that the repository directory contains these
files and subdirectories:

==================== =============================================
packs/               completed readonly packs
indices/             indices for completed packs
upload/              temporary files for packs currently being
                     written
obsolete_packs/      packs that have been repacked and are no
                     longer normally needed
pack-names           index of all live packs
lock/                lockdir
==================== =============================================

Note that for consistency we always write "indices" not "indexes".

This is implemented on top of pack files, which are written once from
start to end, then left alone.  A pack consists of a body file, plus
several index files.  There are four index files for each pack, which
have the same basename and an extension indicating the purpose of the
index:

======== ========== ======================== ==========================
extn     Purpose    Key                      References
======== ========== ======================== ==========================
``.tix`` File texts ``file_id, revision_id`` per-file parents,
                                             compression base
``.six`` Signatures ``revision_id,``         -
``.rix`` Revisions  ``revision_id,``         revision parents
``.iix`` Inventory  ``revision_id,``         revision parents,
                                             compression base
======== ========== ======================== ==========================

Indices are accessed through the ``bzrlib.index.GraphIndex`` class.
Indices are stored as sorted files on disk.  Each line is one record,
and contains:

 * key fields
 * a value string - for all these indices, this is an ASCII decimal pair
   of "offset length" giving the position of the referenced data within
   the pack body file
 * a list of zero or more reference lists

The reference lists let a graph be stored within the index.  Each
reference list entry points to another entry in the same index.  The
references are represented as a byte offset for the target within the
index file.

When a compression base is given, it indicates that the body of the text
or inventory is a forward delta from the referenced revision.  The
compression base list must have length 0 or 1.

Like packs, indices are written only once and never modified afterwards.  A
GraphIndex builder is a mutable in-memory graph that can be sorted,
cross-referenced and written out when the write group completes.

There can also be index entries with a value of 'a' for absent.  These
records exist just to be pointed to in a graph.  This is used, for
example, to give the revision-parent pointer when the parent revision is
in a previous pack.
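
The record layout described above can be modeled in a few lines of
Python.  This is only an illustrative sketch, not bzrlib code: the
``IndexRecord`` type and the ``parse_value`` helper are hypothetical
names, the order of the two reference lists is assumed to follow the
table above, and references are shown as keys rather than as the byte
offsets used in the on-disk index.

```python
from typing import NamedTuple, Optional, Tuple

Key = Tuple[str, ...]


class IndexRecord(NamedTuple):
    """Hypothetical in-memory model of one index record (not a bzrlib class)."""
    key: Key                                  # e.g. (file_id, revision_id) for .tix
    value: str                                # "offset length", or "a" for absent
    reference_lists: Tuple[Tuple[Key, ...], ...]


def parse_value(value: str) -> Optional[Tuple[int, int]]:
    """Return (offset, length) in the pack body, or None for an absent entry."""
    if value == "a":
        return None
    offset, length = value.split(" ")
    return int(offset), int(length)


# A file text with no parents and no compression base.
parent = IndexRecord(("file-1", "rev-1"), "0 120", ((), ()))
# A file text whose parent is in the first reference list and whose
# compression base (at most one entry) is in the second (assumed order).
child = IndexRecord(
    ("file-1", "rev-2"),
    "120 48",
    ((parent.key,), (parent.key,)),
)
# A placeholder for a record held in a previous pack: no body data here.
absent = IndexRecord(("file-1", "rev-0"), "a", ())

assert len(child.reference_lists[1]) <= 1   # compression base list: length 0 or 1
```

Keeping the value as an opaque string and decoding it only on demand
mirrors the description above, where absent entries carry ``a`` instead
of an "offset length" pair.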

The data content for each record is a knit data chunk.  The knits are
always unannotated - the annotations must be generated when needed.
(We'd like to cache/memoize the annotations.)  The data chunks can be
moved between packs without needing to recompress them.

It is not possible to regenerate an index from the body file, because the
index contains information stored in the knit index that's not in the body.
(In particular, the per-file graph is only stored in the index.)
We would like to change this in a future format.

The lock is a regular LockDir lock.  It is held only for a much
reduced scope: while updating the ``pack-names`` file.  The bulk of the
insertion can be done without the repository locked.  This is an
implementation detail; the repository user should still call
``repository.lock_write`` at the regular time but be aware this does not
correspond to a physical mutex.

Read locks control caching but do not affect writers.

The newly-added repository write group concept is very important to
KnitPack repositories.  When ``start_write_group`` is called, a new
temporary pack is created and all modifications to the repository will
go into it until either ``commit_write_group`` or ``abort_write_group``
is called, at which time it is either finished and moved into place or
discarded respectively.  Write groups cannot be nested; only one can be
underway at a time on a Repository instance, and they must occur within a
write lock.
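
A caller would typically bracket its changes as follows.  This is a
hedged sketch of the calling pattern only: the method names are the ones
mentioned above, plus ``unlock`` (assumed here to be the counterpart of
``lock_write``); the ``insert_data`` callback and the
``RecordingRepository`` stand-in are hypothetical and exist only so the
example runs without bzrlib.

```python
def with_write_group(repository, insert_data):
    """Run insert_data(repository) inside a write lock and a write group."""
    repository.lock_write()
    try:
        repository.start_write_group()
        try:
            insert_data(repository)
        except BaseException:
            # Discard the temporary pack; nothing is moved into place.
            repository.abort_write_group()
            raise
        else:
            # Finish the temporary pack and move it into place.
            repository.commit_write_group()
    finally:
        repository.unlock()


class RecordingRepository:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any method call (lock_write, start_write_group, ...) is logged.
        return lambda *args: self.calls.append(name)


def failing_insert(repository):
    raise ValueError("simulated failure while inserting")


ok_repo = RecordingRepository()
with_write_group(ok_repo, lambda repo: None)

failed_repo = RecordingRepository()
try:
    with_write_group(failed_repo, failing_insert)
except ValueError:
    pass
```

On success the recorded sequence ends with ``commit_write_group``
followed by ``unlock``; on failure ``abort_write_group`` takes the place
of the commit, and the lock is released either way.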

Normally the data for each revision will be entirely within a single
pack but this is not required.

When a pack is finished, it gets a final name based on the MD5 hash of all
the data written into the pack body file.
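
As an illustration, such a name can be computed incrementally while the
body is being written.  This is a minimal sketch using Python's
``hashlib``; the ``pack_name_for`` helper is hypothetical, and using the
bare hexadecimal digest as the name is an assumption, since the text
above only says that the name is based on the md5.

```python
import hashlib


def pack_name_for(body_chunks):
    """Return a hex MD5 digest computed over every chunk written to the body."""
    digest = hashlib.md5()
    for chunk in body_chunks:
        digest.update(chunk)      # hash the bytes in the order they are written
    return digest.hexdigest()


name = pack_name_for([b"first record", b"second record"])
```

Because the digest depends only on the bytes written, two packs with
identical bodies would receive the same name under this scheme.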

The ``pack-names`` file gives the list of all finished non-obsolete
packs.  (This should always be the same as the list of files in the
``packs/`` directory, but the file is needed for readonly HTTP clients
that can't easily list directories, and it includes other information.)
The constraint on the ``pack-names`` list is that every file mentioned
must exist in the ``packs/`` directory.

In rare cases, when a writer is interrupted, about-to-be-removed packs
may still be present in the directory but removed from the list.
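
The constraint and the interrupted-writer case can be expressed as a
simple check.  This is a sketch only: ``check_pack_names`` is a
hypothetical helper, the list of names is passed in already parsed (the
layout of the ``pack-names`` file itself is not covered here), and the
``.pack`` suffix for body files is an assumption.

```python
import os
import tempfile


def check_pack_names(listed_names, packs_dir, suffix=".pack"):
    """Compare names listed in pack-names with the files in packs/.

    Returns (missing, unlisted).  Any name in `missing` violates the
    constraint; names in `unlisted` can be left behind by an interrupted
    writer and are tolerated.
    """
    on_disk = {
        entry[: -len(suffix)]
        for entry in os.listdir(packs_dir)
        if entry.endswith(suffix)
    }
    listed = set(listed_names)
    return sorted(listed - on_disk), sorted(on_disk - listed)


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as packs_dir:
    for name in ("aaa", "bbb", "stale"):
        open(os.path.join(packs_dir, name + ".pack"), "wb").close()
    missing, unlisted = check_pack_names(["aaa", "bbb"], packs_dir)
```

Here ``stale`` plays the role of an about-to-be-removed pack: it is
reported as unlisted, while nothing is reported as missing.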

As well as the list of names, the ``pack-names`` file also contains the
size, in bytes, of each of the four indices.  This is used to bootstrap
bisection search within the indices.

In normal use, one pack will be created for each commit to a repository.
This would build up to an inefficient number of files over time, so a
``repack`` operation is available to recombine them, by producing larger
files containing data on multiple revisions.  This can be done manually
by running ``bzr pack``, and it also may happen automatically when a
write group is committed.

The repacking strategy used at the moment tries to balance not doing too
much work during commit with not having too many small files left in the
repository.  The algorithm is roughly this: the total number of
revisions in the repository is expressed as a decimal number, e.g.
"532".  Then we'll repack until we have five packs containing a hundred
revisions each, three packs containing ten revisions each, and two packs
with single revisions.  This means that each revision will normally
initially be created in a single-revision pack, then moved to a
ten-revision pack, then to a hundred-revision pack, and so on.
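
The digit-based rule can be written down directly.  The following is a
rough sketch of the distribution described above, not the actual bzrlib
implementation; ``target_pack_sizes`` is a hypothetical helper name.

```python
def target_pack_sizes(revision_count):
    """Return the desired pack sizes (in revisions), largest first.

    Each decimal digit d at position 10**k asks for d packs holding
    10**k revisions each.
    """
    sizes = []
    digits = str(revision_count)
    for position, digit in enumerate(digits):
        pack_size = 10 ** (len(digits) - position - 1)
        sizes.extend([pack_size] * int(digit))
    return sizes


sizes = target_pack_sizes(532)
# Five packs of 100 revisions, three of 10 and two of 1: ten packs in total.
```

Under this rule the number of packs equals the sum of the decimal digits
of the revision count, so it never exceeds nine times the number of
digits.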

As with other repositories, in normal use data is only inserted.
However, in some circumstances we may want to garbage-collect or prune
existing data, or reconcile indices.

  vim: tw=72 ft=rest expandtab