3683.1.1
by Martin Pool
Improved review process docs and separate out architectural overview |

=============================
Bazaar Architectural Overview
=============================

This document describes the key classes and concepts within Bazaar. It is
intended to be useful to people working on the Bazaar codebase, or to
people writing plugins. People writing plugins may also like to read the
guide to `Integrating with Bazaar <integration.html>`_ for some specific recipes.

There's some overlap between this and the `Core Concepts`_ section of the
user guide, but this document is targeted at people interested in the
internals. In particular the user guide doesn't go any deeper than
"revision", because regular users don't care about lower-level details
like inventories, but this guide does.

If you have any questions, or if something seems to be incorrect, unclear
or missing, please talk to us in ``irc://irc.freenode.net/#bzr``, write to
the Bazaar mailing list, or simply file a bug report.


IDs and keys
############

IDs
===

All IDs are globally unique identifiers. Inside bzrlib they are almost
always represented as UTF-8 encoded bytestrings (i.e. ``str`` objects).

The two main IDs are:

:Revision IDs: The unique identifier of a single revision, such as
  ``pqm@pqm.ubuntu.com-20110201161347-ao76mv267gc1b5v2``
:File IDs: The unique identifier of a single file. It is allocated when
  a user does ``bzr add`` and is unchanged by renames.

By convention, in the bzrlib API, parameters of methods that are expected
to be IDs (as opposed to keys, revision numbers, or some other handle)
will end in ``id``, e.g. ``revid`` or ``file_id``.

Keys
====

A key is a composite of one or more ID elements. E.g. a (file-id,
revision-id) pair is the key to the "texts" store, but a single element
key of (revision-id) is the key to the "revisions" store.
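
As a rough illustration, keys can be thought of as plain tuples of IDs.
The sketch below models the two stores as ordinary dictionaries. The
variable names and the file ID ``readme-20110201-abc123`` are made up for
this example and are not part of the bzrlib API; real stores are not
plain dicts.

```python
# Illustrative model only: real bzrlib stores are not plain dicts.
revision_id = b"pqm@pqm.ubuntu.com-20110201161347-ao76mv267gc1b5v2"
file_id = b"readme-20110201-abc123"  # hypothetical file ID

# The "texts" store is keyed by (file-id, revision-id) pairs.
texts = {(file_id, revision_id): b"Hello, Bazaar\n"}

# The "revisions" store is keyed by single-element (revision-id,) tuples.
revisions = {(revision_id,): "revision object placeholder"}

text_key = (file_id, revision_id)
revision_key = (revision_id,)

print(len(text_key), len(revision_key))
print(texts[text_key])
```

Using a tuple even for the single-element case keeps every key the same
type, whichever store it addresses.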

Core classes
############

Transport
=========

The ``Transport`` layer handles access to local or remote directories.
Each Transport object acts as a logical connection to a particular
directory, and it allows various operations on files within it. You can
*clone* a transport to get a new Transport connected to a subdirectory or
parent directory.
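
The idea of cloning can be sketched with a toy class. ``ToyTransport``
below is a hypothetical stand-in that only tracks a base URL; it is not
the real ``bzrlib.transport.Transport`` and performs no I/O. The URLs are
placeholders.

```python
from urllib.parse import urljoin


class ToyTransport:
    """Hypothetical stand-in for a transport: just a base URL."""

    def __init__(self, base):
        # Keep a trailing slash so the base always names a directory.
        self.base = base if base.endswith("/") else base + "/"

    def clone(self, offset):
        """Return a new ToyTransport for a directory relative to this one."""
        return ToyTransport(urljoin(self.base, offset))


root = ToyTransport("http://example.com/repo")
sub = root.clone("branches/trunk")   # a subdirectory
parent = root.clone("..")            # the parent directory

print(sub.base)
print(parent.base)
```

Note that cloning returns a new object and leaves the original transport
pointing at its own directory.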

Transports are not used for access to the working tree. At present
working trees are always local and they are accessed through the regular
Python file I/O mechanisms.

Filenames vs URLs
-----------------

Transports work in terms of URLs. Take note that URLs are by definition
only ASCII - the decision of how to encode a Unicode string into a URL
must be taken at a higher level, typically in the Store. (Note that
Stores also escape filenames which cannot be safely stored on all
filesystems, but this is a different level.)

The main reason for this is that it's not possible to safely roundtrip a
URL into Unicode and then back into the same URL. The URL standard
gives a way to represent non-ASCII bytes in ASCII (as %-escapes), but
doesn't say how those bytes represent non-ASCII characters. (They're not
guaranteed to be UTF-8 -- that is common but doesn't happen everywhere.)

For example, if the user enters the URL ``http://example/%e0``, there's no
way to tell whether that escaped byte represents "latin small letter a with
grave" in iso-8859-1, or "latin small letter r with acute" in iso-8859-2,
or malformed UTF-8. So we can't convert the URL to Unicode reliably.
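
The ambiguity is easy to reproduce with the Python standard library. This
is a small sketch using ``urllib.parse`` from Python 3 rather than
anything in bzrlib; it only shows that the same escaped byte decodes
differently depending on the encoding you guess.

```python
from urllib.parse import unquote_to_bytes

# The URL only tells us which byte was escaped, not which character.
raw = unquote_to_bytes("%e0")

as_latin1 = raw.decode("iso-8859-1")   # latin small letter a with grave
as_latin2 = raw.decode("iso-8859-2")   # latin small letter r with acute

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # A lone 0xE0 byte is a truncated multi-byte sequence in UTF-8.
    utf8_ok = False

print(repr(raw), as_latin1, as_latin2, utf8_ok)
```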

Equally problematic is if we're given a URL-like string containing
(unescaped) non-ASCII characters (such as the accented a). We can't be
sure how to convert that to a valid (i.e. ASCII-only) URL, because we
don't know what encoding the server expects for those characters.
(Although it is not totally reliable, we might still accept these and
assume that they should be put into UTF-8.)

A similar edge case is that the URL ``http://foo/sweet%2Fsour`` contains
one directory component whose name is "sweet/sour". The escaped slash is
not a directory separator, but if we try to convert the URL to a regular
Unicode path, this information will be lost.
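
The escaped-slash case can be demonstrated in a few lines. Again this is
a sketch with Python 3's ``urllib.parse``, not bzrlib code: unescaping
each component separately keeps "sweet/sour" as one name, while
unescaping the whole path first splits it in two.

```python
from urllib.parse import unquote, urlsplit

path = urlsplit("http://foo/sweet%2Fsour").path

# Split on the real separators first, then unescape each component.
components = [unquote(part) for part in path.split("/") if part]

# Unescaping first turns the escaped slash into a separator.
lossy = [part for part in unquote(path).split("/") if part]

print(components)
print(lossy)
```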

This implies that Transports must natively deal with URLs. For simplicity
they *only* deal with URLs; conversion of other strings to URLs is done
elsewhere. Information that Transports return, such as from ``list_dir``,
is also in the form of URL components.

More information
----------------

See also:

* `Developer guide to bzrlib transports <transports.html>`_
* API docs for ``bzrlib.transport.Transport``

Tree
====

A representation of a directory of files (and other directories and
symlinks etc). The most important kinds of Tree are:

:WorkingTree: the files on disk editable by the user
:RevisionTree: a tree as recorded at some point in the past

Trees can map file paths to file-ids and vice versa (although trees such
as WorkingTree may have unversioned files not described in that mapping).
Trees have an inventory and parents (an ordered list of zero or more
revision IDs).


WorkingTree
===========

A WorkingTree is a special type of Tree that's associated with a working
directory on disk, where the user can directly modify the files.

Responsibilities:

* Maintaining a WorkingTree on disk at a file path.
* Maintaining the basis inventory (the inventory of the last commit done).
* Maintaining the working inventory.
* Maintaining the pending merges list.
* Maintaining the stat cache.
* Maintaining the last revision the working tree was updated to.
* Knowing where its Branch is located.

Dependencies:

* a Branch
* a MutableInventory
* local access to the working tree


Branch
======

A Branch is a key user concept - it's a single line of history that one or
more people have been committing to.

A Branch is responsible for:

* Holding user preferences that are set in a Branch.
* Holding the 'tip': the last revision to be committed to this Branch.
  (And the revno of that revision.)
* Knowing how to open the Repository that holds its history.
* Allowing write locks to be taken out to prevent concurrent alterations to the branch.

Depends on:

* URL access to its base directory.
* A Transport to access its files.
* A Repository to hold its history.


Repository
==========

Repositories store committed history: file texts, revisions, inventories,
and graph relationships between them. A repository holds a bag of
revision data that can be pointed to by various branches. It is
responsible for:

* Maintaining storage of various history data at a URL:

  * Revisions (must have a matching inventory)
  * Digital signatures
  * Inventories for each Revision (must have all the file texts available)
  * File texts

* Synchronizing concurrent access to the repository by different
  processes. (Most repository implementations use a physical mutex only
  for a short period, and effectively support multiple readers and
  writers.)

Stacked Repositories
--------------------

A repository can be configured to refer to a list of "fallback"
repositories. If a particular revision is not present in the original
repository, it refers the query to the fallbacks.
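
The lookup order can be sketched with a toy model. ``ToyRepository`` and
its ``get_revision`` method below are hypothetical names chosen for this
illustration and do not reflect how bzrlib implements stacking; the
revision IDs are shortened placeholders.

```python
class ToyRepository:
    """Hypothetical model of a repository with fallbacks (not bzrlib API)."""

    def __init__(self, revisions, fallbacks=()):
        self.revisions = dict(revisions)    # revision-id -> revision data
        self.fallbacks = list(fallbacks)    # consulted in order

    def get_revision(self, revision_id):
        if revision_id in self.revisions:
            return self.revisions[revision_id]
        # Not present locally: refer the query to each fallback in turn.
        for fallback in self.fallbacks:
            try:
                return fallback.get_revision(revision_id)
            except KeyError:
                continue
        raise KeyError(revision_id)


trunk = ToyRepository({"rev-1": "initial import", "rev-2": "fix typo"})
stacked = ToyRepository({"rev-3": "new feature"}, fallbacks=[trunk])

print(stacked.get_revision("rev-3"))   # held by the stacked repository
print(stacked.get_revision("rev-1"))   # answered by the fallback
```

In this model the stacked repository stores only what the fallback lacks,
which is the point of stacking.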

Compression deltas don't span physical repository boundaries. So the
first commit to a new, empty repository with fallback repositories will
store a full text of the inventory, and of every new file text.

At runtime, repository stacking is actually configured by the branch, not
the repository. So doing ``a_bzrdir.open_repository()``
gets you just the single physical repository, while
``a_bzrdir.open_branch().repository`` gets one with stacking configured.
Therefore, to permanently change the fallback repository stored on disk,
you must use ``Branch.set_stacked_on_url``.

Changing away from an existing stacked-on URL will copy across any
necessary history so that the repository remains usable.

A repository opened from an HPSS server is never stacked on the server
side, because this could cause complexity or security problems with the
server acting as a proxy for the client. Instead, the branch on the
server exposes the stacked-on URL and the client can open that.


Storage model
#############

This section describes the model for how bzr stores its data. The
representation of that data on disk varies considerably depending on the
format of the repository (and to a lesser extent the format of the branch
and working tree), but ultimately the set of objects being represented is
the same.

Branch
======

A branch directly contains:

* the ID of the current revision of that branch (a.k.a. the “tip”)
* some settings for that branch (the values in “branch.conf”)
* the set of tags for that branch (not supported in all formats)

A branch implicitly references:

* A repository. The repository might be colocated in the same directory
  as the branch, or it might be somewhere else entirely.


Repository
==========

A repository contains:

* a revision store
* an inventory store
* a text store
* a signature store

A store is a key-value mapping. This says nothing about the layout on
disk, just that conceptually there are distinct stores, each with a
separate namespace for the keys. Internally the repository may serialize
stores in the same file, and/or e.g. apply compression algorithms that
combine records from separate stores in one block, etc.

You can consider the repository as a single key space, with keys that look
like *(store-name, ...)*. For example, *('revisions',
revision-id)* or *('texts', file-id, revision-id)*.

Revision store
--------------

Stores revision objects. The keys are GUIDs. The value is a revision
object (the exact representation on disk depends on the repository
format).

As described in `Core Concepts`_ a revision describes a snapshot of the
tree of files and some metadata about them. A revision object holds:

* metadata:

  * parent revisions (an ordered sequence of zero or more revision IDs)
  * commit message
  * author(s)
  * timestamp
  * (and all other revision properties)

* an inventory ID (that inventory describes the tree contents). It is often
  the same as the revision ID, but doesn't have to be (e.g. if no files
  were changed between two revisions then both revisions will refer to
  the same inventory).


Inventory store
---------------

Stores inventory objects. The keys are GUIDs. (Footnote: there will
usually be a revision with the same key in the revision store, but there
are rare cases where this is not true.)

An inventory object contains:

* a set of inventory entries

An inventory entry has the following attributes:

* a file-id (a GUID, or the special value TREE_ROOT for the root entry of
  inventories created by older versions of bzr)
* a revision-id, a GUID (generally corresponding to the ID of a
  revision). The combination of (file-id, revision-id) is a key into the
  texts store.
* a kind: one of file, directory, symlink, tree-reference (tree-reference
  is only available in unsupported developer formats)
* parent-id: the file-id of the directory that contains this entry (this
  value is unset for the root of the tree).
* name: the name of the file/directory/etc in that parent directory
* executable: a flag indicating if the executable bit is set for that
  file.

An inventory entry will have other attributes, depending on the kind:

* file:

  * SHA1
  * size

* directory

  * children

* symlink

  * symlink_target

* tree-reference

  * reference_revision
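
To make the attribute list more concrete, here is a small sketch of how
entries relate to each other. The ``Entry`` record, the helper functions
and all IDs are hypothetical and use plain strings for brevity; bzrlib's
own inventory classes are richer than this.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the attribute list above.
Entry = namedtuple("Entry", "file_id revision kind parent_id name")

entries = {
    e.file_id: e
    for e in [
        Entry("root-id", "rev-1", "directory", None, ""),
        Entry("doc-id", "rev-1", "directory", "root-id", "doc"),
        Entry("overview-id", "rev-2", "file", "doc-id", "overview.txt"),
    ]
}


def path_of(file_id):
    """Rebuild a path by following parent-id links up to the root."""
    parts = []
    entry = entries[file_id]
    while entry.parent_id is not None:
        parts.append(entry.name)
        entry = entries[entry.parent_id]
    return "/".join(reversed(parts))


def text_key(file_id):
    """(file-id, revision-id) is the key into the texts store."""
    entry = entries[file_id]
    return (entry.file_id, entry.revision)


print(path_of("overview-id"))
print(text_key("overview-id"))
```

Because each entry records only its name and parent-id, renaming a
directory changes one entry while the paths of everything beneath it
follow automatically.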

For some more details see `Inventories <inventory.html>`_.


Texts store
-----------

Stores the contents of individual versions of files. The keys are pairs
of (file-id, revision-id), and the values are the full content (or
"text") of a version of a file.

For consistency/simplicity text records exist for all inventory entries,
but in general only entries of kind "file" have interesting records.


Signature store
---------------

Stores cryptographic signatures of revision contents. The keys match
those of the revision store.

.. _Core Concepts: http://doc.bazaar.canonical.com/latest/en/user-guide/core_concepts.html

..
   vim: ft=rst tw=74 ai