.. 6, by mbp at sourcefrog, "import all docs from arch"

*****************
Bazaar-NG formats
*****************

.. contents::

Since branches are working directories there is just a single
directory format.

There is one metadata directory called ``.bzr`` at the top of each
tree.  Control files inside ``.bzr`` are never touched by patches and
should not normally be edited by the user.

These files are designed so that repository-level operations are ACID
without depending on atomic operations spanning multiple files.  Two
cases need particular care: aborting a transaction in the middle, and
contention from multiple processes.  We also need to be careful to
flush files to disk at appropriate points; even this may not be
totally safe if the filesystem does not guarantee ordering between
multiple file changes, so we need to be sure we can roll back.

The design must also be such that the directory can simply be copied
and that hardlinked directories will work.  (So we must always replace
files, never just append.)

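The "replace, never append" rule can be sketched as follows.  This is
only an illustration, assuming a POSIX-like filesystem; the helper name
``replace_file`` is made up here and is not part of bzr.

```python
import os
import tempfile


def replace_file(path, data):
    """Replace ``path`` with ``data`` (bytes) without touching the old file in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so that the final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # The rename swaps the directory entry.  A hardlink to the old
        # file keeps pointing at the old, unmodified contents.
        os.replace(tmp_path, path)
    except BaseException:
        # Roll back: drop the temporary file and leave the old file intact.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the old file is never modified, a hardlinked copy of the
directory keeps seeing its own version, and an interrupted write leaves
only a stray temporary file behind.
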
A cache is kept under here of easily-accessible information about
previous revisions.  This should be under a single directory so that
it can be easily identified, excluded from backups, removed, etc.
This might contain pristine trees from previous revisions, manifests
and inventories, etc.  It might also contain working directories when
building a commit, etc.  It could be called ``cache`` or ``tmp``.

I wonder if we should use .zip files for revisions and cacherevs
rather than tar files so that random access is easier/more efficient.
Python's standard library includes a ``zipfile`` module.

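A small sketch of the random access that ``zipfile`` offers.  The
member names are invented for the example and the archive is built in
memory; nothing here reflects an actual bzr storage layout.

```python
import io
import zipfile

# Build a small archive in memory; the member names are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("inventory.xml", "<inventory/>")
    archive.writestr("README", "hello\n")
    archive.writestr("src/main.c", "int main(void) { return 0; }\n")

# A zip file has a central directory, so a single member can be located
# and read without decompressing the members stored before it.
with zipfile.ZipFile(buf) as archive:
    names = archive.namelist()
    readme = archive.read("README")
```

A compressed tar file, by contrast, generally has to be read
sequentially up to the wanted member.
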
Signing XML files
*****************

bzr relies on storing hashes or GPG signatures of various XML files.
There can be multiple equivalent representations of the same XML tree,
but these will have different byte-by-byte hashes.

Once signed files are written out, they must be stored byte-for-byte
and never re-encoded or renormalized, because that would break their
hash or signature.


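To illustrate the problem, here are two serializations of the same
(made-up) element.  SHA-1 is used only as an example hash; the element
and attribute names are assumptions, not a bzr format.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two byte strings that describe the same XML element.
doc_a = b'<revision committer="mbp" timestamp="0"/>'
doc_b = b"<revision timestamp='0'   committer='mbp'></revision>"

# Parsed, they are equivalent: same tag, same attributes, no text.
a, b = ET.fromstring(doc_a), ET.fromstring(doc_b)
same_tree = (a.tag, a.attrib, a.text or "") == (b.tag, b.attrib, b.text or "")

# Hashed as bytes, they are different documents.
hash_a = hashlib.sha1(doc_a).hexdigest()
hash_b = hashlib.sha1(doc_b).hexdigest()
```

Here ``same_tree`` is true while ``hash_a`` and ``hash_b`` differ, which
is why a signed file has to be kept exactly as it was written.
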
Branch metadata
***************

All inside ``.bzr``

``README``
    Tells people not to touch anything here.

``branch-format``
    Identifies the parent as a Bazaar-NG branch; contains the overall
    branch metadata format as a string.

``pristine-directory``
    Identifies that this is a pristine directory and may not be
    committed to.

``patches/``
    Directory containing all patches applied to this branch, one per
    file.  Patches are stored as compressed deltas.  We also store the
    hash of the delta, hash of the before and after manifests, and
    optionally a GPG signature.

``cache/``
    Contains various cached data that can be destroyed and will be
    recreated.  (It should not be modified.)

``cache/pristine/``
    Contains cached full trees for selected previous revisions, used
    when generating diffs, etc.

``cache/inventory/``
    Contains cached inventories of previous revisions.

``cache/snapshot/``
    Contains tarballs of cached revisions of the tree, named by their
    revision id.  These can also be removed, but

``patch-history``
    File containing the UUIDs of all patches taken in this branch,
    in the order they were taken.
    Each commit adds exactly one line to this file; lines are
    never removed or reordered.

``merged-patches``
    List of foreign patches that have been merged into this branch.
    Must have no entries in common with ``patch-history``.  Commits that
    include merges add to this file; lines are never removed or
    reordered.

``pending-merge-patches``
    List of foreign patches that have been merged and are waiting to be
    committed.

``branch-name``
    User-qualified name of the branch, for the purpose of describing the
    origin of patches, e.g. ``mbp@sourcefrog.net/distcc--main``.

``friends``
    List of branches from which we have pulled; file containing a list
    of pairs of branch-name and location.

``parent``
    Default pull/push target.

``pending-inventory``
    Mapping from UUIDs to file names in the current working directory.

``branch-lock``
    Lock held while modifying the branch, to protect against clashing
    updates.


Locking
*******

Is locking a good strategy?  Perhaps some kind of read-copy-update or
seq-lock based mechanism would work better?

If we do use a locking algorithm, is it OK to rely on filesystem
locking or do we need our own mechanism?  I think most hosts should
have reasonable ``flock()`` or equivalent, even on NFS.  One risk is
that on NFS it is easy to have broken locking and not know it, so it
might be better to have something that will fail safe.

Filesystem locks go away if the machine crashes or the process is
terminated; this can be a feature in that we do not need to deal with
stale locks, but also a drawback in that the lock itself does not
indicate cleanup may be needed.

robertc points out that tla converged on renaming a directory as a
locking mechanism: this is one thing which is known to be atomic on
almost all filesystems.  Apparently renaming files, creating
directories, making symlinks, etc. are not good enough.


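A rough sketch of a lock built on directory renames.  This is one
possible reading of the idea, not tla's actual scheme; the function
names and the ``holder`` marker file are invented for the example.

```python
import os
import shutil
import tempfile


def acquire(lock_dir):
    """Try to take the lock; return True on success, False if it is held."""
    parent = os.path.dirname(os.path.abspath(lock_dir))
    pending = tempfile.mkdtemp(dir=parent, prefix=".pending-lock-")
    # A held lock is a non-empty directory, and rename() refuses to
    # replace a non-empty directory, so put a marker inside first.
    with open(os.path.join(pending, "holder"), "w") as marker:
        marker.write("pid %d\n" % os.getpid())
    try:
        os.rename(pending, lock_dir)
    except OSError:
        shutil.rmtree(pending)
        return False
    return True


def release(lock_dir):
    """Give up the lock by renaming it away, then delete it."""
    parent = os.path.dirname(os.path.abspath(lock_dir))
    released = tempfile.mkdtemp(dir=parent, prefix=".released-lock-")
    os.rename(lock_dir, os.path.join(released, "lock"))
    shutil.rmtree(released)
```

Unlike ``flock()``, a lock like this survives a crash, so a leftover
lock directory is itself the hint that cleanup may be needed; the cost
is that stale locks have to be detected and broken somehow.
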
Delta
*****

XML document plus a bag of patches, expressing the difference between
two revisions.  May be a partial delta.

* list of entries

  * entry

    * parent directory (if any)
    * before-name or null if new
    * after-name or null if deleted
    * uuid
    * type (dir, file, symlink, ...)
    * patch type (patch, full-text, xdelta, ...)
    * patch filename (?)


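The entry fields listed above could be modelled roughly as below.  The
class, the attribute names and the ``change()`` helper are hypothetical;
the document itself only names the fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaEntry:
    uuid: str
    kind: str                         # "dir", "file", "symlink", ...
    parent_directory: Optional[str]   # None if there is no parent directory
    before_name: Optional[str]        # None if the entry is new
    after_name: Optional[str]         # None if the entry was deleted
    patch_type: str                   # "patch", "full-text", "xdelta", ...
    patch_filename: Optional[str]     # marked "(?)" in the list above

    def change(self):
        """Classify the entry from its before and after names."""
        if self.before_name is None:
            return "added"
        if self.after_name is None:
            return "deleted"
        if self.before_name != self.after_name:
            return "renamed"
        return "modified"
```

For instance, an entry with no ``before_name`` and an ``after_name`` of
``README`` would be classified as added.
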
Inventory
*********

XML document; series of entries.  (Quite similar to the svn
``entries`` file; perhaps should even have that name.)
It is stored identified by its hash.

An inventory is stored for each recorded revision, and a
``pending-inventory`` is kept for the working directory.


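"Stored identified by its hash" could look something like the sketch
below.  SHA-1, the flat directory layout and both function names are
assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def store_by_hash(store_dir, data):
    """Write ``data`` (bytes) under its SHA-1 hex digest; return the digest."""
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):  # identical text is stored only once
        with open(path, "wb") as f:
            f.write(data)
    return digest


def load_by_hash(store_dir, digest):
    """Read a stored text back and check that it still matches its name."""
    with open(os.path.join(store_dir, digest), "rb") as f:
        data = f.read()
    if hashlib.sha1(data).hexdigest() != digest:
        raise ValueError("stored text does not match its hash")
    return data


# Example use with a throwaway directory and a made-up inventory text.
store = tempfile.mkdtemp()
key = store_by_hash(store, b"<inventory/>\n")
```

A revision can then refer to its inventory simply by recording ``key``.
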
Revision
********

XML document.  Stored identified by its hash.

committer
    RFC-2822-style name of the committer.  Should match the key used to
    sign the revision.

comment
    Multi-line free-form text; whitespace and line breaks preserved.

timestamp
    As floating-point seconds since epoch.

precursor
    ID of the previous revision on this branch.  May be absent (null) if
    this is the start of a new branch.

branch name
    Name of the branch to which this was originally committed.

    (I'm not totally satisfied that this is the right way to do it; the
    results will be a bit weird when a series of revisions pass through
    variously named branches.)

inventory_hash
    Acts as a pointer to the inventory for this revision.

merged-branches
    Revision ids of complete branches merged into this revision.  If a
    revision is listed, that revision and transitively its predecessor
    and all other merged-branches are merged.  This is empty except
    where merges have occurred.

merged-patches
    Revision ids of cherry-picked patches.  Patches whose branches are
    merged need not be listed here.  Listing a revision ID implies that
    only the change of that particular revision from its predecessor has
    been merged in.  This is empty except where cherry-picks have
    occurred.

The transitive closure avoids Arch's problem of needing to list a
large number of previous revisions.  As ddaa writes:

    Continuation revisions (created by tla tag or baz branch) are associated
    to a patchlog whose New-patches header lists the revisions associated to
    all the patchlogs present in the tree.  That was introduced as an
    optimisation so the set of patchlogs in any revision could be determined
    solely by examining the patchlogs of ancestor revisions in the same
    branch.  This behaves well as long as the total count of patchlog is
    reasonably small or new branches are not very frequent.

    A continuation revision on $tree currently creates a patchlog of
    about 500K.  This patchlog is present in all descendent of the revision,
    and all revisions that merges it.

It may be useful at some times to keep a cache of all the branches, or
all the revisions, present in the history of a branch, so that we do
not need to walk the whole history of the branch to build this list.

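The history walk that such a cache would save can be sketched as a
transitive closure over ``precursor`` and ``merged-branches``.  The
dictionary layout and the function name are assumptions for the
example; real revisions would be XML documents.

```python
def merged_revisions(revisions, tip):
    """Return the set of revision ids fully merged into ``tip``.

    ``revisions`` maps a revision id to a dict with the keys
    ``precursor`` (an id or None) and ``merged_branches`` (a list of ids).
    """
    seen = set()
    todo = [tip]
    while todo:
        rev_id = todo.pop()
        if rev_id is None or rev_id in seen:
            continue
        seen.add(rev_id)
        rev = revisions[rev_id]
        # Follow the branch's own history and every fully merged branch.
        todo.append(rev["precursor"])
        todo.extend(rev["merged_branches"])
    return seen


# A made-up history: branch "b" forks from a1 and is merged at a3.
history = {
    "a1": {"precursor": None, "merged_branches": []},
    "a2": {"precursor": "a1", "merged_branches": []},
    "b1": {"precursor": "a1", "merged_branches": []},
    "b2": {"precursor": "b1", "merged_branches": []},
    "a3": {"precursor": "a2", "merged_branches": ["b2"]},
}
closure = merged_revisions(history, "a3")
```

Only ``b2`` has to be recorded in ``a3``; ``b1`` is found by following
``b2``'s precursor.  Cherry-picked ``merged-patches`` are deliberately
not followed, since they bring in a single change only.
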
.. 54, by mbp at sourcefrog, "suggestions from robert about the inventory format"

----

Proposed changes
****************

* Don't store parent-id in all revisions, but rather have ``<DIRECTORY>``
  nodes that contain entries for children?

* Assign an id to the root of the tree, perhaps listed at the top of
  the inventory?