==========================
KnitPack repository format
==========================

.. contents::

Using KnitPack repositories
===========================

Motivation
----------

KnitPack is a new repository format for Bazaar, which is expected to be
faster both locally and over the network, is usually more compact, and
will work with more FTP servers.

Our benchmarking results to date have been very promising. We fully expect
to make a pack-based format the default in the near future. We would
therefore like as many people as possible using KnitPack repositories,
benchmarking the results and telling us where improvements are still needed.

Preparation
-----------

A small percentage of existing repositories may have some inconsistent
data within them. It is a good idea to check the integrity of your
repositories before migrating them to knitpack format. To do this, run::

  bzr check

If that reports a problem, run this command::

  bzr reconcile

Note that this can take many hours for repositories with deep history,
so be sure to set aside some time for this if it is required.

Creating a new knitpack branch
------------------------------

If you're starting a project from scratch, it's easy to make it a
``knitpack`` one. Here's how::

  cd my-stuff
  bzr init --pack-0.92
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--pack-0.92`` option to the ``init`` command.

**Note:** In bzr 0.92, this format was called ``knitpack-experimental``.

Creating a new knitpack repository
----------------------------------

If you're starting a project from scratch and wish to use a shared repository
for branches, you can make it a ``knitpack`` repository like this::

  cd my-repo
  bzr init-repo --pack-0.92 .
  cd my-stuff
  bzr init
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--pack-0.92`` option to the ``init-repo`` command.

Upgrading an existing branch or repository to knitpack format
-------------------------------------------------------------

If you have an existing branch and wish to migrate it to
a ``knitpack`` format, use the ``upgrade`` command like this::

  bzr upgrade --pack-0.92 path-to-my-branch

If you are using a shared repository, run::

  bzr upgrade --pack-0.92 ROOT_OF_REPOSITORY

to upgrade the history database. Note that this will not
alter the branch format of each branch, so
you will also need to upgrade each branch individually
if you are upgrading from an old (e.g. < 0.17) bzr.
More recent versions of bzr already use the
latest branch format, which adds support for tags.

Starting a new knitpack branch from one in an older format
----------------------------------------------------------

This can be done in one of several ways:

1. Create a new branch and pull into it
2. Create a standalone branch and upgrade its format
3. Create a knitpack shared repository and branch into it

Here are the commands for using the ``pull`` approach::

  bzr init --pack-0.92 my-new-branch
  cd my-new-branch
  bzr pull my-source-branch

Here are the commands for using the ``upgrade`` approach::

  bzr branch my-source-branch my-new-branch
  cd my-new-branch
  bzr upgrade --pack-0.92 .

Here are the commands for the shared repository approach::

  cd my-repo
  bzr init-repo --pack-0.92 .
  bzr branch my-source-branch my-new-branch
  cd my-new-branch

As a reminder, any of the above approaches can fail if the source branch
has inconsistent data within it and hasn't been reconciled yet. Please
be sure to check that before reporting problems.

Testing packs for bzr-svn users
-------------------------------

If you are using ``bzr-svn`` or are testing the prototype subtree support,
you can still use and assist in testing KnitPacks. The commands to use
are identical to the ones given above except that the name of the format
to use is ``knitpack-subtree-experimental``.

WARNING: Note that the subtree formats, ``dirstate-subtree`` and
``knitpack-subtree-experimental``, are **not** production strength yet and
may cause unexpected problems. They are required for the bzr-svn
plug-in but should otherwise only be used by people happy to live on the
bleeding edge. If you are using bzr-svn, you're on the bleeding edge anyway.
:-)

Reporting problems
------------------

If you need any help or encounter any problems, please contact the developers
via the usual ways, i.e. chat to us on IRC or send a message to our mailing
list. See http://wiki.bazaar.canonical.com/BzrSupport for contact details.

Technical notes
===============

Bazaar 0.92 adds a new format (experimental at first) implemented in
``bzrlib.repofmt.pack_repo.py``.

This format provides a knit-like interface which is quite compatible
with knit format repositories: you can get a VersionedFile for a
particular file-id, or for revisions, or for the inventory, even though
these do not correspond to single files on disk.

The on-disk format is that the repository directory contains these
files and subdirectories:

==================== =============================================
packs/               completed readonly packs
indices/             indices for completed packs
upload/              temporary files for packs currently being
                     written
obsolete_packs/      packs that have been repacked and are no
                     longer normally needed
pack-names           index of all live packs
lock/                lockdir
==================== =============================================

Note that for consistency we always write "indices" not "indexes".

This is implemented on top of pack files, which are written once from
start to end, then left alone. A pack consists of a body file, plus
several index files. There are four index files for each pack, which
have the same basename and an extension indicating the purpose of the
index:

======== ========== ======================== ==========================
extn     Purpose    Key                      References
======== ========== ======================== ==========================
``.tix`` File texts ``file_id, revision_id`` per-file parents,
                                             compression base
``.six`` Signatures ``revision_id,``         -
``.rix`` Revisions  ``revision_id,``         revision parents
``.iix`` Inventory  ``revision_id,``         revision parents,
                                             compression base
======== ========== ======================== ==========================

Indices are accessed through the ``bzrlib.index.GraphIndex`` class.
Indices are stored as sorted files on disk. Each line is one record,
and contains:

* key fields
* a value string - for all these indices, this is an ASCII decimal pair
  of "offset length" giving the position of the referenced data within
  the pack body file
* a list of zero or more reference lists

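The "offset length" value can be used to slice a record out of a pack
body. The following is a minimal sketch, not the ``bzrlib.index`` API:
``parse_index_value`` and ``read_record`` are hypothetical helper names,
and the pack body is assumed to be already loaded into memory as bytes::

  def parse_index_value(value):
      """Split a value string such as "6 7" into (offset, length)."""
      offset_text, length_text = value.split(" ")
      return int(offset_text), int(length_text)


  def read_record(pack_body, value):
      """Return the bytes that an index value points at in a pack body."""
      offset, length = parse_index_value(value)
      return pack_body[offset:offset + length]


  # Illustrative data only: the record starts at byte 6 and is 7 bytes long.
  body = b"headerPAYLOADtrailer"
  record = read_record(body, "6 7")

A real reader would seek within the pack file rather than hold the whole
body in memory; the slice is only meant to show how the two numbers are used.
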
The reference lists let a graph be stored within the index. Each
reference list entry points to another entry in the same index. The
references are represented as a byte offset for the target within the
index file.

When a compression base is given, it indicates that the body of the text
or inventory is a forward delta from the referenced revision. The
compression base list must have length 0 or 1.

Like packs, indices are written only once and then left unmodified. A
GraphIndex builder is a mutable in-memory graph that can be sorted,
cross-referenced and written out when the write group completes.

There can also be index entries with a value of 'a' for absent. These
records exist just to be pointed to in a graph. This is used, for
example, to give the revision-parent pointer when the parent revision is
in a previous pack.

The data content for each record is a knit data chunk. The knits are
always unannotated - the annotations must be generated when needed.
(We'd like to cache/memoize the annotations.) The data chunks can be
moved between packs without needing to recompress them.

It is not possible to regenerate an index from the body file, because the
index holds information that is not in the body.
(In particular, the per-file graph is only stored in the index.)
We would like to change this in a future format.

The lock is a regular LockDir lock. The lock is only held for a much
reduced scope, while updating the pack-names file. The bulk of the
insertion can be done without the repository locked. This is an
implementation detail; the repository user should still call
``repository.lock_write`` at the regular time but be aware this does not
correspond to a physical mutex.

Read locks control caching but do not affect writers.

The newly-added repository write group concept is very important to
KnitPack repositories. When ``start_write_group`` is called, a new
temporary pack is created and all modifications to the repository will
go into it until either ``commit_write_group`` or ``abort_write_group``
is called, at which time it is either finished and moved into place or
discarded respectively. Write groups cannot be nested: only one can be
underway at a time on a Repository instance, and they must occur within a
write lock.

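The write group rules can be modelled with a toy class. This is only a
sketch of the lifecycle described above and is not the bzrlib
``Repository``: the method names ``lock_write``, ``start_write_group``,
``commit_write_group`` and ``abort_write_group`` come from the text, while
``ToyRepository``, ``add_record`` and ``finished_packs`` are hypothetical
names introduced for illustration::

  class ToyRepository(object):
      """Toy model of the write group rules; not the bzrlib Repository."""

      def __init__(self):
          self.write_locked = False
          self.finished_packs = []   # each finished pack is a tuple of records
          self._open_group = None    # records of the write group in progress

      def lock_write(self):
          self.write_locked = True

      def start_write_group(self):
          if not self.write_locked:
              raise RuntimeError("a write group needs a write lock")
          if self._open_group is not None:
              raise RuntimeError("write groups cannot be nested")
          self._open_group = []      # stands in for the new temporary pack

      def add_record(self, data):
          if self._open_group is None:
              raise RuntimeError("no write group in progress")
          self._open_group.append(data)

      def commit_write_group(self):
          if self._open_group is None:
              raise RuntimeError("no write group in progress")
          # Finish the temporary pack and move it into place.
          self.finished_packs.append(tuple(self._open_group))
          self._open_group = None

      def abort_write_group(self):
          # Discard the temporary pack.
          self._open_group = None


  repo = ToyRepository()
  repo.lock_write()
  repo.start_write_group()
  repo.add_record(b"revision text")
  repo.commit_write_group()      # this pack is kept
  repo.start_write_group()
  repo.add_record(b"unwanted")
  repo.abort_write_group()       # this pack is thrown away
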
Normally the data for each revision will be entirely within a single
pack but this is not required.

When a pack is finished, it gets a final name based on the md5 of all
the data written into the pack body file.

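As an illustration of content-based naming, the digest can be accumulated
while the body is written. This is a sketch under assumptions: the text
only says the name is "based on" the md5, so using the bare hex digest as
the name is a guess, and ``final_pack_name`` is a hypothetical helper::

  import hashlib


  def final_pack_name(chunks):
      """Return the md5 hex digest of all chunks, in the order written."""
      digest = hashlib.md5()
      for chunk in chunks:
          digest.update(chunk)
      return digest.hexdigest()


  # The digest depends only on the concatenated bytes, not on how the
  # data was split into writes.
  name = final_pack_name([b"ab", b"c"])
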
The ``pack-names`` file gives the list of all finished non-obsolete
packs. (This should always be the same as the list of files in the
``packs/`` directory, but the file is needed for readonly http clients
that can't easily list directories, and it includes other information.)
The constraint on the ``pack-names`` list is that every file mentioned
must exist in the ``packs/`` directory.

In rare cases, when a writer is interrupted, about-to-be-removed packs
may still be present in the directory but removed from the list.

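The constraint can be expressed as a small consistency check. This is a
hedged sketch, not bzrlib code: ``check_pack_names`` is a hypothetical
helper, and both arguments are assumed to be plain pack names that have
already been read from the ``pack-names`` file and the ``packs/``
directory listing::

  def check_pack_names(listed_names, names_in_packs_dir):
      """Compare the pack-names list with the packs/ directory contents.

      Returns (missing, unlisted). A non-empty ``missing`` list breaks the
      constraint; ``unlisted`` packs are tolerated, for example after an
      interrupted writer.
      """
      present = set(names_in_packs_dir)
      listed = set(listed_names)
      missing = sorted(listed - present)
      unlisted = sorted(present - listed)
      return missing, unlisted


  # Illustrative names only.
  missing, unlisted = check_pack_names(["aaa", "bbb"], ["aaa", "bbb", "ccc"])
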
As well as the list of names, the pack-names file also contains the
size, in bytes, of each of the four indices. This is used to bootstrap
bisection search within the indices.

In normal use, one pack will be created for each commit to a repository.
This would build up to an inefficient number of files over time, so a
``repack`` operation is available to recombine them, by producing larger
files containing data on multiple revisions. This can be done manually
by running ``bzr pack``, and it also may happen automatically when a
write group is committed.

The repacking strategy used at the moment tries to balance not doing too
much work during commit with not having too many small files left in the
repository. The algorithm is roughly this: the total number of
revisions in the repository is expressed as a decimal number, e.g.
"532". Then we'll repack until we have five packs containing a hundred
revisions each, three packs containing ten revisions each, and two packs
with single revisions. This means that each revision will normally
initially be created in a single-revision pack, then moved to a
ten-revision pack, then to a hundred-revision pack, and so on.

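The digit-by-digit rule can be sketched as follows. This computes only the
target distribution of pack sizes for a given revision count; it does not
decide which existing packs to combine, and ``ideal_pack_sizes`` is a
hypothetical name rather than a bzrlib function::

  def ideal_pack_sizes(revision_count):
      """Return the target pack sizes, largest first, for a revision count."""
      sizes = []
      digits = str(revision_count)
      for position, digit in enumerate(digits):
          # The leftmost digit counts the largest packs.
          pack_size = 10 ** (len(digits) - position - 1)
          sizes.extend([pack_size] * int(digit))
      return sizes


  # 532 revisions: five packs of 100, three of 10 and two of 1.
  sizes = ideal_pack_sizes(532)

The sizes always add up to the revision count, and the number of packs is
the sum of the decimal digits, so it stays small even for large histories.
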
As with other repositories, in normal use data is only inserted.
However, in some circumstances we may want to garbage-collect or prune
existing data, or reconcile indices.

..
   vim: tw=72 ft=rst expandtab