==========================
KnitPack repository format
==========================

.. contents::

Using KnitPack repositories
===========================

Motivation
----------

KnitPack is a new repository format for Bazaar, which is expected to be
faster both locally and over the network, is usually more compact, and
will work with more FTP servers.

Our benchmarking results to date have been very promising. We fully expect
to make a pack-based format the default in the near future. We would
therefore like as many people as possible using KnitPack repositories,
benchmarking the results and telling us where improvements are still needed.

Preparation
-----------

A small percentage of existing repositories may have some inconsistent
data within them. It is a good idea to check the integrity of your
repositories before migrating them to knitpack format. To do this, run::

  bzr check

If that reports a problem, run this command::

  bzr reconcile

Note that this can take many hours for repositories with deep history,
so be sure to set aside some time for this if it is required.

Creating a new knitpack branch
------------------------------

If you're starting a project from scratch, it's easy to make it a
``knitpack`` one. Here's how::

  cd my-stuff
  bzr init --knitpack-experimental
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--knitpack-experimental`` option to the ``init`` command.

Creating a new knitpack repository
----------------------------------

If you're starting a project from scratch and wish to use a shared repository
for branches, you can make it a ``knitpack`` repository like this::

  cd my-repo
  bzr init-repo --knitpack-experimental .
  cd my-stuff
  bzr init
  bzr add
  bzr commit -m "initial import"

In other words, use the normal sequence of commands but add the
``--knitpack-experimental`` option to the ``init-repo`` command.

Upgrading an existing branch or repository to knitpack format
-------------------------------------------------------------

If you have an existing branch and wish to migrate it to
a ``knitpack`` format, use the ``upgrade`` command like this::

  bzr upgrade --knitpack-experimental path-to-my-branch

If you are using a shared repository, run::

  bzr upgrade --knitpack-experimental ROOT_OF_REPOSITORY

to upgrade the history database. Note that this will not
alter the branch format of each branch, so
you will also need to upgrade each branch individually
if you are upgrading from an old (e.g. < 0.17) bzr.
More recent versions of bzr will already use
our latest branch format, which adds support for tags.

Starting a new knitpack branch from one in an older format
----------------------------------------------------------

This can be done in one of several ways:

1. Create a new branch and pull into it
2. Create a standalone branch and upgrade its format
3. Create a knitpack shared repository and branch into it

Here are the commands for using the ``pull`` approach::

  bzr init --knitpack-experimental my-new-branch
  cd my-new-branch
  bzr pull my-source-branch

Here are the commands for using the ``upgrade`` approach::

  bzr branch my-source-branch my-new-branch
  cd my-new-branch
  bzr upgrade --knitpack-experimental .

Here are the commands for the shared repository approach::

  cd my-repo
  bzr init-repo --knitpack-experimental .
  bzr branch my-source-branch my-new-branch
  cd my-new-branch

As a reminder, any of the above approaches can fail if the source branch
has inconsistent data within it and hasn't been reconciled yet. Please
be sure to check that before reporting problems.

Testing packs for bzr-svn users
-------------------------------

If you are using ``bzr-svn`` or are testing the prototype subtree support,
you can still use and assist in testing KnitPacks. The commands to use
are identical to the ones given above except that the name of the format
to use is ``knitpack-subtree-experimental``.

WARNING: Note that the subtree formats, ``dirstate-subtree`` and
``knitpack-subtree-experimental``, are **not** production strength yet and
may cause unexpected problems. They are required for the bzr-svn
plug-in but should otherwise only be used by people happy to live on the
bleeding edge. If you are using bzr-svn, you're on the bleeding edge anyway.
:-)

Reporting problems
------------------

If you need any help or encounter any problems, please contact the developers
via the usual ways, i.e. chat to us on IRC or send a message to our mailing
list. See http://bazaar-vcs.org/BzrSupport for contact details.


Technical notes
===============

Bazaar 0.92 adds a new format (experimental at first) implemented in
``bzrlib.repofmt.pack_repo``.

This format provides a knit-like interface which is quite compatible
with knit format repositories: you can get a VersionedFile for a
particular file-id, or for revisions, or for the inventory, even though
these do not correspond to single files on disk.

The on-disk format is that the repository directory contains these
files and subdirectories:

==================== =============================================
packs/               completed readonly packs
indices/             indices for completed packs
upload/              temporary files for packs currently being
                     written
obsolete_packs/      packs that have been repacked and are no
                     longer normally needed
pack-names           index of all live packs
lock/                lockdir
==================== =============================================

Note that for consistency we always write "indices" not "indexes".

This is implemented on top of pack files, which are written once from
start to end, then left alone. A pack consists of a body file, plus
several index files. There are four index files for each pack, which
have the same basename and an extension indicating the purpose of the
index:

======== ========== ======================== ==========================
extn     Purpose    Key                      References
======== ========== ======================== ==========================
``.tix`` File texts ``file_id, revision_id`` per-file parents,
                                             compression base
``.six`` Signatures ``revision_id,``         -
``.rix`` Revisions  ``revision_id,``         revision parents
``.iix`` Inventory  ``revision_id,``         revision parents,
                                             compression base
======== ========== ======================== ==========================

Indices are accessed through the ``bzrlib.index.GraphIndex`` class.
Indices are stored as sorted files on disk. Each line is one record,
and contains:

* key fields
* a value string - for all these indices, this is an ASCII decimal pair
  of "offset length" giving the position of the referenced data within
  the pack body file
* a list of zero or more reference lists

The reference lists let a graph be stored within the index. Each
reference list entry points to another entry in the same index. The
references are represented as a byte offset for the target within the
index file.

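As an illustration of the value string described above, here is a minimal
sketch that splits an "offset length" pair and uses it to slice one record
out of a pack body. The helper names ``parse_index_value`` and
``read_record`` and the in-memory body are hypothetical; this is not the
``bzrlib.index.GraphIndex`` API, and it ignores keys and reference lists.

```python
import io


def parse_index_value(value):
    """Split an ASCII 'offset length' value string into two integers."""
    offset_text, length_text = value.split(" ")
    return int(offset_text), int(length_text)


def read_record(body_file, value):
    """Return the bytes that an index value points to in a pack body file."""
    offset, length = parse_index_value(value)
    body_file.seek(offset)
    data = body_file.read(length)
    if len(data) != length:
        raise ValueError("pack body is shorter than the index claims")
    return data


# Hypothetical pack body holding two records back to back.
body = io.BytesIO(b"first-record" + b"second-record")
print(read_record(body, "12 13"))
```

Because the value carries both offset and length, a reader can fetch one
record with a single seek and read, without scanning the body file.
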
When a compression base is given, it indicates that the body of the text
or inventory is a forward delta from the referenced revision. The
compression base list must have length 0 or 1.

Like packs, indices are written only once and then left unmodified. A
GraphIndex builder is a mutable in-memory graph that can be sorted,
cross-referenced and written out when the write group completes.

There can also be index entries with a value of 'a' for absent. These
records exist just to be pointed to in a graph. This is used, for
example, to give the revision-parent pointer when the parent revision is
in a previous pack.

The data content for each record is a knit data chunk. The knits are
always unannotated - the annotations must be generated when needed.
(We'd like to cache/memoize the annotations.) The data hunks can be
moved between packs without needing to recompress them.

It is not possible to regenerate an index from the body file, because it
contains information stored in the knit index that's not in the body.
(In particular, the per-file graph is only stored in the index.)
We would like to change this in a future format.

The lock is a regular LockDir lock. The lock is only held for a much
reduced scope, while updating the pack-names file. The bulk of the
insertion can be done without the repository locked. This is an
implementation detail; the repository user should still call
``repository.lock_write`` at the regular time but be aware this does not
correspond to a physical mutex.

Read locks control caching but do not affect writers.

The newly-added repository write group concept is very important to
KnitPack repositories. When ``start_write_group`` is called, a new
temporary pack is created and all modifications to the repository will
go into it until either ``commit_write_group`` or ``abort_write_group``
is called, at which time it is either finished and moved into place or
discarded respectively. Write groups cannot be nested: only one can be
underway at a time on a Repository instance, and they must occur within a
write lock.

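The rules above can be modelled with a small toy class. This is only a
sketch of the semantics: ``ToyRepository``, ``add_record`` and the
in-memory lists are hypothetical stand-ins, not bzrlib code, and the
refusal to unlock with a group still open is this toy's own choice.

```python
class ToyRepository:
    """Hypothetical stand-in that mimics the write group rules only."""

    def __init__(self):
        self.write_locked = False
        self.pending = None  # contents of the temporary pack, if any
        self.packs = []      # finished packs moved into place

    def lock_write(self):
        self.write_locked = True

    def unlock(self):
        if self.pending is not None:
            raise RuntimeError("a write group is still underway")
        self.write_locked = False

    def start_write_group(self):
        if not self.write_locked:
            raise RuntimeError("write groups must occur within a write lock")
        if self.pending is not None:
            raise RuntimeError("write groups cannot be nested")
        self.pending = []

    def add_record(self, record):
        if self.pending is None:
            raise RuntimeError("no write group is underway")
        self.pending.append(record)

    def commit_write_group(self):
        if self.pending is None:
            raise RuntimeError("no write group is underway")
        self.packs.append(tuple(self.pending))  # finish the pack
        self.pending = None

    def abort_write_group(self):
        if self.pending is None:
            raise RuntimeError("no write group is underway")
        self.pending = None  # discard the temporary pack


repo = ToyRepository()
repo.lock_write()
try:
    repo.start_write_group()
    try:
        repo.add_record("rev-1 text")
    except Exception:
        repo.abort_write_group()
        raise
    else:
        repo.commit_write_group()
finally:
    repo.unlock()
print(repo.packs)
```

The nested ``try`` blocks reflect the usual calling pattern: abort the
group if inserting data fails, commit it otherwise, and release the write
lock in either case.
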
Normally the data for each revision will be entirely within a single
pack but this is not required.

When a pack is finished, it gets a final name based on the md5 of all
the data written into the pack body file.

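One way to picture the naming step is to feed every chunk into an md5
digest as it is written, so that the name is known as soon as the last
byte is out. The sketch below assumes the hex digest itself serves as the
name; the function ``write_pack_body`` and the sample chunks are
hypothetical and not taken from bzrlib.

```python
import hashlib
import io


def write_pack_body(chunks, out_file):
    """Write chunks to out_file; return the md5 hex digest of all bytes written."""
    digest = hashlib.md5()
    for chunk in chunks:
        out_file.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()


# Hypothetical temporary pack, held in memory for the example.
upload = io.BytesIO()
final_name = write_pack_body([b"record one", b"record two"], upload)
print(final_name)
```

Since the name depends only on the bytes in the body, it cannot be chosen
until the pack is complete, which fits the write-once design.
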
The ``pack-names`` file gives the list of all finished non-obsolete
packs. (This should always be the same as the list of files in the
``packs/`` directory, but the file is needed for readonly http clients
that can't easily list directories, and it includes other information.)
The constraint on the ``pack-names`` list is that every file mentioned
must exist in the ``packs/`` directory.

In rare cases, when a writer is interrupted, about-to-be-removed packs
may still be present in the directory but removed from the list.

As well as the list of names, the pack-names file also contains the
size, in bytes, of each of the four indices. This is used to bootstrap
bisection search within the indices.

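The constraint on the ``pack-names`` list can be checked with a few lines
of code. This sketch takes an already-parsed list of names, because the
on-disk layout of ``pack-names`` is not described here, and it assumes
each pack body is stored as ``<name>.pack``; both the helper
``missing_packs`` and that suffix are assumptions made for illustration.

```python
import os
import tempfile


def missing_packs(listed_names, packs_dir):
    """Return listed pack names that have no body file in packs_dir."""
    present = set(os.listdir(packs_dir))
    # Assumption: a pack called NAME is stored as NAME.pack.
    return sorted(n for n in listed_names if n + ".pack" not in present)


with tempfile.TemporaryDirectory() as packs_dir:
    for name in ("aaa", "bbb", "stale"):
        open(os.path.join(packs_dir, name + ".pack"), "wb").close()
    # "stale" is on disk but not listed, which the format tolerates.
    ok = missing_packs(["aaa", "bbb"], packs_dir)
    broken = missing_packs(["aaa", "ccc"], packs_dir)

print(ok, broken)
```

The check is deliberately one-directional: a listed pack that is missing
is an error, while an unlisted file left behind by an interrupted writer
is not.
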
In normal use, one pack will be created for each commit to a repository.
This would build up to an inefficient number of files over time, so a
``repack`` operation is available to recombine them, by producing larger
files containing data on multiple revisions. This can be done manually
by running ``bzr pack``, and it also may happen automatically when a
write group is committed.

The repacking strategy used at the moment tries to balance not doing too
much work during commit with not having too many small files left in the
repository. The algorithm is roughly this: the total number of
revisions in the repository is expressed as a decimal number, e.g.
"532". Then we'll repack until we have five packs containing a hundred
revisions each, three packs containing ten revisions each, and two packs
with single revisions. This means that each revision will normally
initially be created in a single-revision pack, then moved to a
ten-revision pack, then to a 100-pack, and so on.

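The decimal-digit rule can be sketched in a few lines. This only computes
the target distribution of pack sizes for a given revision count; it does
not decide which existing packs to combine, and the function name
``target_pack_sizes`` is made up for the example.

```python
def target_pack_sizes(total_revisions):
    """Return the pack sizes the decimal-digit rule aims for, largest first."""
    if total_revisions < 0:
        raise ValueError("revision count cannot be negative")
    digits = str(total_revisions)
    sizes = []
    for position, digit in enumerate(digits):
        # The leftmost digit counts the largest packs, the rightmost
        # digit counts single-revision packs.
        pack_size = 10 ** (len(digits) - position - 1)
        sizes.extend([pack_size] * int(digit))
    return sizes


sizes = target_pack_sizes(532)
print(len(sizes), sum(sizes))
```

For 532 revisions this gives ten packs in total, and the sizes always sum
back to the revision count. Each digit is at most nine, so the number of
packs grows only with the number of decimal digits.
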
As with other repositories, in normal use data is only inserted.
However, in some circumstances we may want to garbage-collect or prune
existing data, or reconcile indices.

vim: tw=72 ft=rest expandtab |