Since branches are working directories there is just a single

There is one metadata directory called ``.bzr`` at the top of each
tree. Control files inside ``.bzr`` are never touched by patches and
should not normally be edited by the user.

These files are designed so that repository-level operations are ACID
without depending on atomic operations spanning multiple files. There
are two particular cases: aborting a transaction in the middle, and
contention from multiple processes. We also need to be careful to
flush files to disk at appropriate points; even this may not be
totally safe if the filesystem does not guarantee ordering between
multiple file changes, so we need to be sure to roll back.

The design must also be such that the directory can simply be copied
and that hardlinked directories will work. (So we must always replace
files, never just append.)

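The replace-never-append rule can be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper name ``replace_file_atomically`` and the ``.tmp-`` prefix are made up here, and it relies on ``os.replace`` being atomic on the local filesystem.

```python
import os
import tempfile


def replace_file_atomically(path, data):
    """Write `data` (bytes) to `path` by replacing the file, never appending."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        # Roll back: drop the partial temporary file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the new contents go into a fresh file, a hardlinked copy of the directory keeps pointing at the old contents rather than being modified behind its back.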
A cache of easily-accessible information about previous revisions is
kept under here. This should be under a single directory so that
it can be easily identified, excluded from backups, removed, etc.
This might contain pristine trees from previous revisions, manifests
and inventories, etc. It might also contain working directories when
building a commit, etc. Call this maybe ``cache`` or ``tmp``.

I wonder if we should use .zip files for revisions and cacherevs
rather than tar files so that random access is easier/more efficient.
There is a Python library ``zipfile``.

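As a rough illustration of the random-access point, the standard ``zipfile`` module can read one member without unpacking the rest. The member names below are invented for the example and do not reflect any decided layout.

```python
import io
import zipfile

# Build a small archive in memory; the member names are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("inventory.xml", "<inventory/>")
    archive.writestr("src/hello.c", "int main(void) { return 0; }\n")

# Random access: the central directory lets us open one member directly,
# without decompressing the entries stored before it.
with zipfile.ZipFile(buf) as archive:
    names = archive.namelist()
    hello = archive.read("src/hello.c")
```

A compressed tar file, by contrast, generally has to be read sequentially up to the wanted member.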
bzr relies on storing hashes or GPG signatures of various XML files.
There can be multiple equivalent representations of the same XML tree,
but these will have different byte-by-byte hashes.

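A small sketch of the problem, assuming SHA-1 as the hash (the text does not name one) and an invented ``entry`` element:

```python
import hashlib
import xml.etree.ElementTree as ET

# Two serializations of the same XML tree (attribute order and quoting differ).
a = b'<entry id="f1" name="hello.c"/>'
b = b"<entry name='hello.c' id='f1' />"

# Parsed, the two documents carry the same tag and attributes...
same_tree = (ET.fromstring(a).tag == ET.fromstring(b).tag
             and ET.fromstring(a).attrib == ET.fromstring(b).attrib)
# ...but their byte-by-byte hashes differ.
same_hash = hashlib.sha1(a).hexdigest() == hashlib.sha1(b).hexdigest()
```

Here ``same_tree`` is true while ``same_hash`` is false, which is why the bytes that were hashed or signed have to be kept exactly as written.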
Once signed files are written out, they must be stored byte-for-byte
and never re-encoded or renormalized, because that would break their

Tells people not to touch anything here.

Identifies the parent as a Bazaar-NG branch; contains the overall
branch metadata format as a string.

``pristine-directory``
    Identifies that this is a pristine directory and may not be

Directory containing all patches applied to this branch, one per
file. Patches are stored as compressed deltas. We also store the
hash of the delta, hash of the before and after manifests, and
optionally a GPG signature.

Contains various cached data that can be destroyed and will be
recreated. (It should not be modified.)

Contains cached full trees for selected previous revisions, used
when generating diffs, etc.

Contains cached inventories of previous revisions.

Contains tarballs of cached revisions of the tree, named by their
revision id. These can also be removed, but

File containing the UUIDs of all patches taken in this branch,
in the order they were taken.
Each commit adds exactly one line to this file; lines are
never removed or reordered.

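The append-only rule lends itself to a simple consistency check. The sketch below is an assumption-laden illustration: the function name is hypothetical, the history is taken as a list of id strings (one per line of the file), and it additionally assumes ids are never repeated.

```python
def is_valid_commit_update(old_history, new_history):
    """Return True if `new_history` is `old_history` plus exactly one new line."""
    if len(new_history) != len(old_history) + 1:
        return False
    # Existing lines must be unchanged and in the same order.
    if new_history[:len(old_history)] != old_history:
        return False
    # The added id must not already be present (assumed uniqueness).
    return new_history[-1] not in old_history
```

For example, going from ``["u1", "u2"]`` to ``["u1", "u2", "u3"]`` passes, while dropping or reordering an earlier line does not.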
List of foreign patches that have been merged into this branch.
Must have no entries in common with ``patch-history``. Commits that
include merges add to this file; lines are never removed or

``pending-merge-patches``
    List of foreign patches that have been merged and are waiting to be

User-qualified name of the branch, for the purpose of describing the
origin of patches, e.g. ``mbp@sourcefrog.net/distcc--main``.

List of branches from which we have pulled; file containing a list
of pairs of branch-name and location.

Default pull/push target.

``pending-inventory``
    Mapping from UUIDs to file name in the current working directory.

Lock held while modifying the branch, to protect against clashing

Is locking a good strategy? Perhaps some kind of read-copy-update or
seq-lock based mechanism would work better?

If we do use a locking algorithm, is it OK to rely on filesystem
locking or do we need our own mechanism? I think most hosts should
have reasonable ``flock()`` or equivalent, even on NFS. One risk is
that on NFS it is easy to have broken locking and not know it, so it
might be better to have something that will fail safe.

Filesystem locks go away if the machine crashes or the process is
terminated; this can be a feature in that we do not need to deal with
stale locks but also a drawback in that the lock itself does not
indicate cleanup may be needed.

robertc points out that tla converged on renaming a directory as a
mechanism: this is one thing which is known to be atomic on almost all
filesystems. Apparently renaming files, creating directories, making
symlinks, etc. are not good enough.

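A minimal sketch of a lock built on directory renaming might look like this. It is not what tla or bzr actually does; the names ``lock``, ``holder``, ``try_lock`` and ``unlock`` are hypothetical, and it assumes the platform refuses to rename a directory onto an existing non-empty directory (POSIX does).

```python
import os
import shutil
import tempfile


def try_lock(branch_dir, holder):
    """Try to take the lock; return True on success, False if already held."""
    lock_dir = os.path.join(branch_dir, "lock")  # hypothetical name
    # Prepare a non-empty directory privately, then rename it into place.
    pending = tempfile.mkdtemp(dir=branch_dir, prefix="lock-pending-")
    with open(os.path.join(pending, "holder"), "w") as f:
        f.write(holder + "\n")
    try:
        os.rename(pending, lock_dir)
    except OSError:
        # The target exists and is not empty: someone else holds the lock.
        shutil.rmtree(pending)
        return False
    return True


def unlock(branch_dir):
    """Release the lock by renaming it away, then deleting the renamed copy."""
    lock_dir = os.path.join(branch_dir, "lock")
    releasing = tempfile.mkdtemp(dir=branch_dir, prefix="lock-released-")
    os.rename(lock_dir, os.path.join(releasing, "lock"))
    shutil.rmtree(releasing)
```

Unlike a filesystem lock, this directory survives a crash, so a stale lock has to be dealt with explicitly; in exchange, its presence does indicate that cleanup may be needed.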
XML document plus a bag of patches, expressing the difference between
two revisions. May be a partial delta.

* parent directory (if any)
* before-name or null if new
* after-name or null if deleted
* type (dir, file, symlink, ...)
* patch type (patch, full-text, xdelta, ...)

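The fields listed above could be modelled in memory along these lines. This is only a sketch: the class and attribute names are invented, it covers only the fields visible in the list, and the ``change`` classification is an assumption about how the null names would be interpreted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaEntry:
    parent_dir: Optional[str]   # parent directory, if any
    before_name: Optional[str]  # None if the entry is new
    after_name: Optional[str]   # None if the entry was deleted
    kind: str                   # "dir", "file", "symlink", ...
    patch_type: str             # "patch", "full-text", "xdelta", ...

    def change(self):
        """Classify the entry from its before/after names."""
        if self.before_name is None:
            return "added"
        if self.after_name is None:
            return "deleted"
        if self.before_name != self.after_name:
            return "renamed"
        return "modified"
```

For instance, ``DeltaEntry(None, None, "README", "file", "full-text").change()`` gives ``"added"``.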
XML document; series of entries. (Quite similar to the svn
``entries`` file; perhaps should even have that name.)
Stored identified by its hash.

An inventory is stored for recorded revisions, also a
``pending-inventory`` for a working directory.

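Storing a document "identified by its hash" could be sketched as below. The function names are hypothetical, SHA-1 is assumed because the text does not name a hash, and the flat directory layout is made up for the example.

```python
import hashlib
import os


def store_by_hash(store_dir, data):
    """Store `data` (bytes) under its hash and return the hex digest."""
    digest = hashlib.sha1(data).hexdigest()  # SHA-1 is an assumption
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):
        # Write to a temporary name first, then rename into place.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
    return digest


def load_by_hash(store_dir, digest):
    """Read an object back and check that it still matches its name."""
    with open(os.path.join(store_dir, digest), "rb") as f:
        data = f.read()
    if hashlib.sha1(data).hexdigest() != digest:
        raise ValueError("stored object does not match its hash: " + digest)
    return data
```

Storing the same bytes twice yields the same name, so identical inventories are naturally shared.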
XML document. Stored identified by its hash.

RFC-2822-style name of the committer. Should match the key used to

multi-line free-form text; whitespace and line breaks preserved

As floating-point seconds since epoch.

ID of the previous revision on this branch. May be absent (null) if
this is the start of a new branch.

Name of the branch to which this was originally committed.

(I'm not totally satisfied that this is the right way to do it; the
results will be a bit weird when a series of revisions pass through
variously named branches.)

Acts as a pointer to the inventory for this revision.

Revision ids of complete branches merged into this revision. If a
revision is listed, that revision and transitively its predecessor
and all other merged-branches are merged. This is empty except
where merges have occurred.

Revision ids of cherry-picked patches. Patches whose branches are
merged need not be listed here. Listing a revision ID implies that
only the change of that particular revision from its predecessor has
been merged in. This is empty except where cherry-picks have

The transitive closure avoids Arch's problem of needing to list a
large number of previous revisions. As ddaa writes:

    Continuation revisions (created by tla tag or baz branch) are associated
    to a patchlog whose New-patches header lists the revisions associated to
    all the patchlogs present in the tree. That was introduced as an
    optimisation so the set of patchlogs in any revision could be determined
    solely by examining the patchlogs of ancestor revisions in the same
    branch. This behaves well as long as the total count of patchlog is
    reasonably small or new branches are not very frequent.

    A continuation revision on $tree currently creates a patchlog of
    about 500K. This patchlog is present in all descendent of the revision,
    and all revisions that merges it.

It may be useful at some times to keep a cache of all the branches, or
all the revisions, present in the history of a branch, so that we do
not need to walk the whole history of the branch to build this list.

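The walk that such a cache would save can be sketched as follows. The in-memory representation is assumed for illustration: ``revisions`` is a plain dict, and the ``predecessor`` key is a made-up name for the previous-revision field.

```python
def ancestry(revision_id, revisions):
    """Return the set of revision ids reachable from `revision_id`.

    `revisions` maps a revision id to a dict with a `predecessor` id (or
    None) and a list of `merged-branches` ids. Cherry-picks are not followed,
    because they bring in a single change rather than a whole history.
    """
    seen = set()
    todo = [revision_id]
    while todo:
        rev_id = todo.pop()
        if rev_id is None or rev_id in seen:
            continue
        seen.add(rev_id)
        rev = revisions[rev_id]
        todo.append(rev.get("predecessor"))
        todo.extend(rev.get("merged-branches", []))
    return seen
```

The result for a given revision never changes once the revision is recorded, so it could be cached per revision and discarded at any time.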
* Don't store parent-id in all revisions, but rather have <DIRECTORY>
  nodes that contain entries for children?

* Assign an id to the root of the tree, perhaps listed in the top of