5536.2.8
by Andrew Bennetts
Start of a developer doc describing how fetch works. |
1 |
============= |
2 |
Fetching data |
|
3 |
============= |
|
4 |
||
5 |
Overview of a fetch |
|
6 |
=================== |
|
7 |
||
8 |
Inside bzr, a typical fetch happens like this: |
|
9 |
||
10 |
* a user runs a command like ``bzr branch`` or ``bzr pull`` |
|
11 |
||
12 |
* ``Repository.fetch`` is called (by a higher-level method such as |
|
13 |
``ControlDir.sprout``, ``Branch.fetch``, etc). |
|
14 |
||
15 |
* An ``InterRepository`` object is created. The exact implementation of |
|
16 |
``InterRepository`` chosen depends on the format/capabilities of the |
|
17 |
source and target repos. |
|
18 |
||
19 |
* The source and target repositories are compared to determine which data |
|
20 |
needs to be transferred. |
|
21 |
||
22 |
* The repository data is copied. Often this is done by creating a |
|
23 |
``StreamSource`` and ``StreamSink`` from the source and target |
|
24 |
repositories and feeding the stream from the source into the sink, but |
|
25 |
some ``InterRepository`` implementations do differently. |
|
26 |
||
27 |
||
28 |
How objects to be transferred are determined |
|
29 |
============================================ |
|
30 |
||
31 |
See ``InterRepository._walk_to_common_revisions``. The basic idea is to |
|
32 |
do a breadth-first search in the source repository's revision graph |
|
33 |
(starting from the head or heads the caller asked for), and look in the |
|
34 |
target repository to see if those revisions are already present. |
|
35 |
Eventually this will find the common ancestors in both graphs, and thus |
|
36 |
the set of revisions to be copied has been identified. |
|
37 |
||
38 |
All inventories for the copied revisions need to be present (and all |
|
39 |
parent inventories at the stacking boundary too, to support stacking). |
|
40 |
||
41 |
All texts versions introduced by those inventories need to be transferred |
|
42 |
(but see also stacking constraints). |
|
43 |
||
44 |
Fetch specs |
|
45 |
=========== |
|
46 |
||
47 |
The most ``fetch`` methods accept a ``fetch_spec`` parameter. This is how |
|
48 |
the caller controls what is fetched: e.g. all revisions for a given head |
|
49 |
(that aren't already present in the target), the full ancestry for one or |
|
50 |
more heads, or even the full contents of the source repository. |
|
51 |
||
52 |
The ``fetch_spec`` parameter is an object that implements the interface |
|
53 |
defined by ``AbstractSearchResult`` in ``bzrlib.graph``. It describes |
|
54 |
which keys should be fetched. Current implementations are |
|
55 |
``SearchResult``, ``PendingAncestryResult``, ``EmptySearchResult``, and |
|
56 |
``EverythingResult``. Some have options controlling if missing revisions |
|
57 |
cause errors or not, etc. |
|
58 |
||
59 |
There are also some “search” objects, which can be used to conveniently |
|
60 |
construct a search result for common cases: ``EverythingNotInOther`` and |
|
61 |
``NotInOtherForRevs``. They provide an ``execute`` method that performs |
|
62 |
the search and returns a search result. |
|
63 |
||
64 |
Also, ``Graph._make_breadth_first_searcher`` returns an object with a |
|
65 |
``get_result`` method that returns a search result. |
|
66 |
||
67 |
||
68 |
Streams |
|
69 |
======= |
|
70 |
||
5536.2.9
by Andrew Bennetts
Describe streams a little in fetch.txt. |
71 |
A **stream** is an iterable of (substream type, substream) pairs. |
72 |
The **substream type** is a ``str`` that will be one of ``texts``, |
|
73 |
``inventories``, ``inventory-deltas``, ``chk_bytes``, ``revisions`` or |
|
74 |
``signatures``. A **substream** is a record stream. The format of those |
|
75 |
records depends on the repository format being streamed, except for |
|
76 |
``inventory-deltas`` records which are format-independent. |
|
77 |
||
78 |
A stream source can be constructed with ``repo._get_source(to_format)``, |
|
79 |
and it provides a ``get_stream(search)`` method (among others). A stream |
|
80 |
sink can be constructed with ``repo._get_sink()``, and provides an |
|
81 |
``insert_stream(stream, src_format, resume_tokens)`` method (among |
|
82 |
others). |
|
5536.2.8
by Andrew Bennetts
Start of a developer doc describing how fetch works. |
83 |
|
84 |
||
5957.1.1
by Andrew Bennetts
Add a section about stacking constraints to doc/developers/fetch.txt. |
85 |
Stacking constraints |
86 |
==================== |
|
87 |
||
5957.1.2
by Andrew Bennetts
Clarify the description of the stacking invariant based on great feedback from Vincent and John. |
88 |
**In short the rule is:** "repositories must hold revisions' parent |
89 |
inventories and their new texts (or else all texts for those revisions)." |
|
90 |
||
91 |
This is sometimes called "the stacking invariant." |
|
92 |
||
93 |
Why that rule? |
|
94 |
-------------- |
|
95 |
||
96 |
A stacked repository needs to be capable of generating a complete stream |
|
97 |
for the revisions it does hold without access to its fallback |
|
98 |
repositories [#]_. "Complete" here means that the stream for a revision (or |
|
99 |
set of revisions) can be inserted into a repository that already contains |
|
100 |
the parent(s) of that revision, and that repository will have a fully |
|
101 |
usable copy of that revision: a working tree can be built for that |
|
102 |
revision, etc. |
|
5957.1.1
by Andrew Bennetts
Add a section about stacking constraints to doc/developers/fetch.txt. |
103 |
|
104 |
Assuming for a moment the stream has the necessary inventory, signature |
|
105 |
and CHK records to have a usable revision, what texts are required to have |
|
106 |
a usable revision? The simple way to satisfy the requirement is to have |
|
107 |
*every* text for every revision at the stacking boundary. Thus the |
|
108 |
revisions at the stacking boundary and all their descendants have their |
|
109 |
texts present and so can be fully reconstructed. But this is expensive: |
|
110 |
it implies each stacked repository much contain *O(tree)* data even for a |
|
111 |
single revision of a 1-line change, and also implies transferring |
|
112 |
*O(tree)* data to fetch that revision. |
|
113 |
||
5957.1.2
by Andrew Bennetts
Clarify the description of the stacking invariant based on great feedback from Vincent and John. |
114 |
Because the goal is a usable revision *when added to a repository with the |
115 |
parent revision(s)* most of those texts will be redundant. The minimal |
|
116 |
set that is needed is just those texts that are new in the revisions in |
|
117 |
our repository. However, we need enough inventory data to be able to |
|
5957.1.1
by Andrew Bennetts
Add a section about stacking constraints to doc/developers/fetch.txt. |
118 |
determine that set of texts. So to make this possible every revision must |
119 |
have its parent inventories present so that the inventory delta between |
|
120 |
revisions can be calculated, and of course the CHK pages associated with |
|
121 |
that delta. In fact the entire inventory does not need to be present, |
|
122 |
just enough of it to find the delta (assuming a repository format, like |
|
123 |
2a, that allows only part of an inventory to be stored). Thus the stacked |
|
5957.1.2
by Andrew Bennetts
Clarify the description of the stacking invariant based on great feedback from Vincent and John. |
124 |
repository can contain only *O(changes)* data [#]_ and still deliver |
125 |
complete streams of that data. |
|
5957.1.1
by Andrew Bennetts
Add a section about stacking constraints to doc/developers/fetch.txt. |
126 |
|
127 |
What about revisions at the stacking boundary with more than one parent? |
|
5957.1.3
by Andrew Bennetts
Fix thinko in wording regarding stacking invariants and revisions with multiple parents. |
128 |
All of the parent inventories must be present, as a client may ask for a |
5957.1.1
by Andrew Bennetts
Add a section about stacking constraints to doc/developers/fetch.txt. |
129 |
stream up to any parent, not just the left-hand parent. If any parent is |
130 |
absent then all texts must be present instead. Otherwise there will be |
|
131 |
the strange situation where some fetches of a revision will succeed and |
|
132 |
others fail depending the precise details of the fetch. |
|
133 |
||
5957.1.2
by Andrew Bennetts
Clarify the description of the stacking invariant based on great feedback from Vincent and John. |
134 |
Implications for fetching |
135 |
------------------------- |
|
136 |
||
137 |
Fetches must retrieve the records necessary to satisfy that rule. The |
|
138 |
stream source will attempt to send the necessary records, and the stream |
|
139 |
sink will check for any missing records and make a second fetch for just |
|
140 |
those missing records before committing the write group. |
|
141 |
||
142 |
Our repository implementations check this constraint is satisfied before |
|
143 |
committing a write group, to prevent a bad stream from creating a corrupt |
|
144 |
repository. So a fetch from a bad source (e.g. a damaged repository, or a |
|
145 |
buggy foreign-format import) may trigger ``BzrCheckError`` during |
|
146 |
``commit_write_group``. |
|
147 |
||
148 |
To fetch from a stacked repository via a smart server, the smart client: |
|
149 |
||
150 |
* first fetches a stream of as many of the requested revisions as possible |
|
151 |
from the initial repository, |
|
152 |
* then while there are still missing revisions and untried fallback |
|
153 |
repositories fetches the outstanding revisions from the next fallback |
|
154 |
until either all revisions have been found (success) or the list of |
|
155 |
fallbacks has been exhausted (failure). |
|
156 |
||
157 |
||
158 |
.. [#] This is not just a theoretical concern. The smart server always |
|
159 |
opens repositories without opening fallbacks, as it cannot assume it |
|
160 |
can access the fallbacks that the client can. |
|
161 |
||
5957.1.1
by Andrew Bennetts
Add a section about stacking constraints to doc/developers/fetch.txt. |
162 |
.. [#] Actually *O(changes)* isn't quite right in practice. In the |
163 |
current implementation the fulltext of a changed file must be |
|
164 |
transferred, not just a delta, so a 1-line change to a 10MB file will |
|
165 |
still transfer 10MB of text data. This is because current formats |
|
166 |
require records' compression parents to be present in the same |
|
167 |
repository. |
|
168 |
||
169 |
||
5536.2.8
by Andrew Bennetts
Start of a developer doc describing how fetch works. |
170 |
.. |
171 |
vim: ft=rst tw=74 ai |