=============================
Bazaar Architectural Overview
=============================

This document describes the key classes and concepts within Bazaar.  It is
intended to be useful to people working on the Bazaar codebase, or to
people writing plugins.  People writing plugins may also like to read the
guide to `Integrating with Bazaar <integration.html>`_ for some specific recipes.

There's some overlap between this and the `Core Concepts`_ section of the
user guide, but this document is targeted at people interested in the
internals.  In particular the user guide doesn't go any deeper than
"revision", because regular users don't care about lower-level details
like inventories, but this guide does.

If you have any questions, or if something seems to be incorrect, unclear
or missing, please talk to us in ``irc://irc.freenode.net/#bzr``, write to
the Bazaar mailing list, or simply file a bug report.


IDs and keys
############

IDs
===

All IDs are globally unique identifiers.  Inside bzrlib they are almost
always represented as UTF-8 encoded bytestrings (i.e. ``str`` objects).

The two main IDs are:

:Revision IDs: The unique identifier of a single revision, such as
  ``pqm@pqm.ubuntu.com-20110201161347-ao76mv267gc1b5v2``
:File IDs: The unique identifier of a single file.  It is allocated when
  a user does ``bzr add`` and is unchanged by renames.

By convention, in the bzrlib API, parameters of methods that are expected
to be IDs (as opposed to keys, revision numbers, or some other handle)
will end in ``id``, e.g.  ``revid`` or ``file_id``.

Keys
====

A key is a composite of one or more ID elements.  E.g. a (file-id,
revision-id) pair is the key to the "texts" store, but a single element
key of (revision-id) is the key to the "revisions" store.

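
As a rough illustration of the difference between IDs and keys, the sketch
below models IDs as UTF-8 bytestrings and keys as tuples of IDs.  It is
plain Python rather than bzrlib code: the file ID and the two dictionaries
are made up for the example, and only the revision ID is taken from the
text above.

```python
# Illustrative sketch only: plain Python values standing in for bzrlib IDs and keys.
# On Python 3 a UTF-8 bytestring is a bytes object (it is str on Python 2).
revision_id = "pqm@pqm.ubuntu.com-20110201161347-ao76mv267gc1b5v2".encode("utf-8")
file_id = "readme-20110201-hypothetical".encode("utf-8")  # made-up file ID

# A key is a tuple of one or more IDs.
text_key = (file_id, revision_id)   # key into the "texts" store
revision_key = (revision_id,)       # single-element key into the "revisions" store

# Toy stores: dictionaries keyed by those tuples (hypothetical contents).
texts = {text_key: b"hello world\n"}
revisions = {revision_key: "revision metadata would go here"}

print(len(text_key), len(revision_key))
```

Note that even a single-element key is still a tuple, so code that handles
keys can treat every store the same way.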


Core classes
############

Transport
=========

The ``Transport`` layer handles access to local or remote directories.
Each Transport object acts as a logical connection to a particular
directory, and it allows various operations on files within it.  You can
*clone* a transport to get a new Transport connected to a subdirectory or
parent directory.

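
The idea of cloning can be sketched with a toy class.  ``ToyTransport``
below is hypothetical: it only models the URL arithmetic, performs no I/O,
and is not the bzrlib ``Transport`` API.

```python
from urllib.parse import urljoin


class ToyTransport:
    """Hypothetical stand-in for a transport: it only remembers a base URL."""

    def __init__(self, base):
        # Keep a trailing slash so that relative joins stay inside the directory.
        self.base = base if base.endswith("/") else base + "/"

    def clone(self, offset):
        """Return a new ToyTransport for a directory relative to this one."""
        return ToyTransport(urljoin(self.base, offset))


root = ToyTransport("http://example.com/repo")
sub = root.clone("branches")   # a subdirectory
parent = root.clone("..")      # the parent directory

print(sub.base)
print(parent.base)
```

Cloning returns a new object and leaves the original untouched, so ``root``
still refers to ``http://example.com/repo/`` afterwards.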

Transports are not used for access to the working tree.  At present
working trees are always local and they are accessed through the regular
Python file I/O mechanisms.

Filenames vs URLs
-----------------

Transports work in terms of URLs.  Take note that URLs are by definition
only ASCII - the decision of how to encode a Unicode string into a URL
must be taken at a higher level, typically in the Store.  (Note that
Stores also escape filenames which cannot be safely stored on all
filesystems, but this is a different level.)

The main reason for this is that it's not possible to safely roundtrip a
URL into Unicode and then back into the same URL.  The URL standard
gives a way to represent non-ASCII bytes in ASCII (as %-escapes), but
doesn't say how those bytes represent non-ASCII characters.  (They're not
guaranteed to be UTF-8 -- that is common but doesn't happen everywhere.)

For example, if the user enters the URL ``http://example/%e0``, there's no
way to tell whether that character represents "latin small letter a with
grave" in iso-8859-1, or "latin small letter r with acute" in iso-8859-2,
or malformed UTF-8.  So we can't convert the URL to Unicode reliably.

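
The ambiguity can be reproduced with the Python 3 standard library.  This
is only a demonstration of the problem, not code from bzrlib; it resolves
the escape to a raw byte and then tries the three interpretations mentioned
above.

```python
import unicodedata
from urllib.parse import unquote_to_bytes, urlsplit

url = "http://example/%e0"
raw = unquote_to_bytes(urlsplit(url).path)   # the path with escapes resolved to bytes

# The same byte decodes to different characters under different encodings.
as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False   # a lone 0xE0 byte is not valid UTF-8

print(unicodedata.name(as_latin1[-1]))
print(unicodedata.name(as_latin2[-1]))
print(valid_utf8)
```

Nothing in the URL itself says which of these readings the author intended.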

Equally problematic is if we're given a URL-like string containing
(unescaped) non-ASCII characters (such as the accented a).  We can't be
sure how to convert that to a valid (i.e. ASCII-only) URL, because we
don't know what encoding the server expects for those characters.
(Although it is not totally reliable, we might still accept these and
assume that they should be put into UTF-8.)

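
The reverse direction has the same problem, as this small standard-library
sketch shows (again only an illustration, not bzrlib code).  Escaping the
accented a requires choosing an encoding first, and different choices yield
different URLs.

```python
from urllib.parse import quote

name = "\u00e0"  # LATIN SMALL LETTER A WITH GRAVE

# quote() must pick an encoding before it can percent-escape the character.
as_utf8 = quote(name, encoding="utf-8")
as_latin1 = quote(name, encoding="iso-8859-1")

print(as_utf8, as_latin1)
```

A server expecting iso-8859-1 names would not find a file requested with
the UTF-8 form, and vice versa.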

A similar edge case is that the URL ``http://foo/sweet%2Fsour`` contains
one directory component whose name is "sweet/sour".  The escaped slash is
not a directory separator, but if we try to convert the URL to a regular
Unicode path, this information will be lost.

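
A short standard-library sketch of this edge case follows (illustrative
only).  Splitting the path into components before unescaping preserves the
single component, while unescaping first turns the escaped slash into a
separator.

```python
from urllib.parse import unquote, urlsplit

path = urlsplit("http://foo/sweet%2Fsour").path

# Split on "/" first, then unescape each component: one component survives.
components = [unquote(part) for part in path.split("/") if part]

# Unescape first, then split: the escaped slash is mistaken for a separator.
lossy = [part for part in unquote(path).split("/") if part]

print(components, lossy)
```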

This implies that Transports must natively deal with URLs.  For simplicity
they *only* deal with URLs; conversion of other strings to URLs is done
elsewhere.  Information that Transports return, such as from ``list_dir``,
is also in the form of URL components.

More information
----------------

See also:

* `Developer guide to bzrlib transports <transports.html>`_
* API docs for ``bzrlib.transport.Transport``

Tree
====

A representation of a directory of files (and other directories and
symlinks etc).  The most important kinds of Tree are:

:WorkingTree: the files on disk editable by the user
:RevisionTree: a tree as recorded at some point in the past

Trees can map file paths to file-ids and vice versa (although trees such
as WorkingTree may have unversioned files not described in that mapping).
Trees have an inventory and parents (an ordered list of zero or more
revision IDs).

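
The two-way mapping between paths and file-ids can be pictured with a toy
class.  ``ToyTree`` and its method names are chosen for this example only
and make no claim about how bzrlib implements trees; the file-id is made
up.

```python
class ToyTree:
    """Hypothetical two-way mapping between paths and file-ids."""

    def __init__(self):
        self._path_to_id = {}
        self._id_to_path = {}

    def add(self, path, file_id):
        self._path_to_id[path] = file_id
        self._id_to_path[file_id] = path

    def rename(self, old_path, new_path):
        # The file-id stays the same; only the path it maps to changes.
        file_id = self._path_to_id.pop(old_path)
        self.add(new_path, file_id)

    def path2id(self, path):
        # Unversioned files are simply absent from the mapping.
        return self._path_to_id.get(path)

    def id2path(self, file_id):
        return self._id_to_path[file_id]


tree = ToyTree()
tree.add("README", b"readme-id-hypothetical")
tree.rename("README", "README.txt")

print(tree.id2path(b"readme-id-hypothetical"))
```

Looking up a path that was never added, such as an unversioned scratch
file, returns ``None`` in this sketch.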


WorkingTree
===========

A WorkingTree is a special type of Tree that's associated with a working
directory on disk, where the user can directly modify the files.

Responsibilities:

* Maintaining a WorkingTree on disk at a file path.
* Maintaining the basis inventory (the inventory of the last commit done).
* Maintaining the working inventory.
* Maintaining the pending merges list.
* Maintaining the stat cache.
* Maintaining the last revision the working tree was updated to.
* Knowing where its Branch is located.

Dependencies:

* a Branch
* a MutableInventory
* local access to the working tree


Branch
======

A Branch is a key user concept - it's a single line of history that one or
more people have been committing to.

A Branch is responsible for:

* Holding user preferences that are set in a Branch.
* Holding the 'tip': the last revision to be committed to this Branch.
  (And the revno of that revision.)
* Knowing how to open the Repository that holds its history.
* Allowing write locks to be taken out to prevent concurrent alterations to the branch.

Depends on:

* URL access to its base directory.
* A Transport to access its files.
* A Repository to hold its history.


Repository
==========

Repositories store committed history: file texts, revisions, inventories,
and graph relationships between them.  A repository holds a bag of
revision data that can be pointed to by various branches:

* Maintains storage of various history data at a URL:

  * Revisions (Must have a matching inventory)
  * Digital Signatures
  * Inventories for each Revision. (Must have all the file texts available).
  * File texts

* Synchronizes concurrent access to the repository by different
  processes.  (Most repository implementations use a physical mutex only
  for a short period, and effectively support multiple readers and
  writers.)

Stacked Repositories
--------------------

A repository can be configured to refer to a list of "fallback"
repositories.  If a particular revision is not present in the original
repository, it refers the query to the fallbacks.

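
The fallback lookup described above can be sketched with a toy class.
``ToyRepository`` and ``get_revision`` as written here are hypothetical and
hold revisions in a plain dictionary; real repositories are far more
involved.

```python
class ToyRepository:
    """Hypothetical repository: a dict of revisions plus fallback repositories."""

    def __init__(self, revisions=None, fallbacks=()):
        self._revisions = dict(revisions or {})
        self._fallbacks = list(fallbacks)

    def get_revision(self, revision_id):
        # Look in this repository first...
        if revision_id in self._revisions:
            return self._revisions[revision_id]
        # ...then refer the query to each fallback in order.
        for fallback in self._fallbacks:
            try:
                return fallback.get_revision(revision_id)
            except KeyError:
                continue
        raise KeyError(revision_id)


trunk = ToyRepository({b"rev-1": "first commit", b"rev-2": "second commit"})
stacked = ToyRepository({b"rev-3": "local commit"}, fallbacks=[trunk])

print(stacked.get_revision(b"rev-1"))
```

The lookup only goes one way: ``stacked`` can see revisions held by
``trunk``, but ``trunk`` knows nothing about ``rev-3``.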

Compression deltas don't span physical repository boundaries.  So the
first commit to a new, empty repository with fallback repositories will
store a full text of the inventory, and of every new file text.

At runtime, repository stacking is actually configured by the branch, not
the repository.  So doing ``a_bzrdir.open_repository()``
gets you just the single physical repository, while
``a_bzrdir.open_branch().repository`` gets one configured with stacking.
Therefore, to permanently change the fallback repository stored on disk,
you must use ``Branch.set_stacked_on_url``.

Changing away from an existing stacked-on URL will copy across any
necessary history so that the repository remains usable.

A repository opened from an HPSS server is never stacked on the server
side, because this could cause complexity or security problems with the
server acting as a proxy for the client.  Instead, the branch on the
server exposes the stacked-on URL and the client can open that.


Storage model
#############

This section describes the model for how bzr stores its data.  The
representation of that data on disk varies considerably depending on the
format of the repository (and to a lesser extent the format of the branch
and working tree), but ultimately the set of objects being represented is
the same.

Branch
======

A branch directly contains:

* the ID of the current revision in that branch (a.k.a. the “tip”)
* some settings for that branch (the values in “branch.conf”)
* the set of tags for that branch (not supported in all formats)

A branch implicitly references:

* A repository.  The repository might be colocated in the same directory
  as the branch, or it might be somewhere else entirely.


Repository
==========

A repository contains:

* a revision store
* an inventory store
* a text store
* a signature store

A store is a key-value mapping.  This says nothing about the layout on
disk, just that conceptually there are distinct stores, each with a
separate namespace for the keys.  Internally the repository may serialize
stores in the same file, and/or e.g. apply compression algorithms that
combine records from separate stores in one block, etc.

You can consider the repository as a single key space, with keys that look
like *(store-name, ...)*.  For example, *('revisions',
revision-id)* or *('texts', file-id, revision-id)*.

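
The single key space can be pictured as one dictionary whose keys begin
with the store name.  This is a conceptual sketch with made-up IDs and
values, not the on-disk layout or a bzrlib API; the ``texts`` key uses the
(file-id, revision-id) order given for the texts store elsewhere in this
document.

```python
# Toy repository: one dictionary whose keys start with the store name.
repository = {
    ("revisions", b"rev-1"): "revision object for rev-1",
    ("inventories", b"rev-1"): "inventory object for rev-1",
    ("texts", b"file-a", b"rev-1"): b"contents of file-a at rev-1\n",
    ("signatures", b"rev-1"): b"signature bytes",
}


def store_keys(repo, store_name):
    """Return the keys of one store, with the store-name prefix removed."""
    return sorted(key[1:] for key in repo if key[0] == store_name)


print(store_keys(repository, "texts"))
```

Stripping the store-name prefix recovers the per-store keys, which shows
that each store keeps its own namespace even though everything lives in one
mapping.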

Revision store
--------------

Stores revision objects.  The keys are GUIDs.  The value is a revision
object (the exact representation on disk depends on the repository
format).

As described in `Core Concepts`_ a revision describes a snapshot of the
tree of files and some metadata about them.  A revision contains:

* metadata:

  * parent revisions (an ordered sequence of zero or more revision IDs)
  * commit message
  * author(s)
  * timestamp
  * (and all other revision properties)

* an inventory ID (that inventory describes the tree contents).  This is
  often the same as the revision ID, but doesn't have to be (e.g. if no
  files were changed between two revisions then both revisions will refer
  to the same inventory).


Inventory store
---------------

Stores inventory objects.  The keys are GUIDs.  (Footnote: there will
usually be a revision with the same key in the revision store, but there
are rare cases where this is not true.)

An inventory object contains:

* a set of inventory entries

An inventory entry has the following attributes:

* a file-id (a GUID, or the special value TREE_ROOT for the root entry of
  inventories created by older versions of bzr)
* a revision-id, a GUID (generally corresponding to the ID of a
  revision).  The combination of (file-id, revision-id) is a key into the
  texts store.
* a kind: one of file, directory, symlink, tree-reference (tree-reference
  is only available in unsupported developer formats)
* parent-id: the file-id of the directory that contains this entry (this
  value is unset for the root of the tree).
* name: the name of the file/directory/etc in that parent directory
* executable: a flag indicating if the executable bit is set for that
  file.

An inventory entry will have other attributes, depending on the kind:

* file:

  * SHA1
  * size

* directory

  * children

* symlink

  * symlink_target

* tree-reference

  * reference_revision

For some more details see `Inventories <inventory.html>`_.

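
To make the common attributes concrete, here is a minimal sketch of an
inventory as a mapping from file-id to entry.  ``ToyEntry`` and ``path_of``
are hypothetical names, the IDs are made up, and kind-specific attributes
are left out; the point is only that a path can be rebuilt by following
parent-id links.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyEntry:
    """Hypothetical inventory entry holding the common attributes listed above."""
    file_id: bytes
    revision_id: bytes
    kind: str                      # "file", "directory", "symlink", ...
    parent_id: Optional[bytes]     # None for the root of the tree
    name: str
    executable: bool = False


def path_of(entries: Dict[bytes, ToyEntry], file_id: bytes) -> str:
    """Build a path by following parent-id links up to the root."""
    parts = []
    entry = entries[file_id]
    while entry.parent_id is not None:
        parts.append(entry.name)
        entry = entries[entry.parent_id]
    return "/".join(reversed(parts))


entries = {
    e.file_id: e
    for e in [
        ToyEntry(b"root-id", b"rev-1", "directory", None, ""),
        ToyEntry(b"doc-id", b"rev-1", "directory", b"root-id", "doc"),
        ToyEntry(b"overview-id", b"rev-2", "file", b"doc-id", "overview.txt"),
    ]
}

print(path_of(entries, b"overview-id"))
```

In this sketch the pair ``(entry.file_id, entry.revision_id)`` is what
would be used to look up the entry's content in the texts store.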


Texts store
-----------

Stores the contents of individual versions of files.  The keys are pairs
of (file-id, revision-id), and the values are the full content (or
"text") of a version of a file.

For consistency/simplicity text records exist for all inventory entries,
but in general only entries of kind "file" have interesting records.


Signature store
---------------

Stores cryptographic signatures of revision contents.  The keys match
those of the revision store.

.. _Core Concepts: http://doc.bazaar.canonical.com/latest/en/user-guide/core_concepts.html

..
   vim: ft=rst tw=74 ai