Viewing changes to doc/developers/fetch.txt

Committer: Canonical.com Patch Queue Manager
Date: 2011-06-07 09:32:30 UTC
mfrom: (5957.1.2 doc-stacking-constraints)
Revision ID: pqm@pqm.ubuntu.com-20110607093230-uf7k8yhtbhc55m9n

(spiv) Add a section about stacking constraints to doc/developers/fetch.txt.
(Andrew Bennetts)

files modified:
doc/developers/fetch.txt

others).

Stacking constraints
====================

**In short the rule is:** "repositories must hold revisions' parent
inventories and their new texts (or else all texts for those revisions)."
This is sometimes called "the stacking invariant."

Why that rule?
--------------

A stacked repository needs to be capable of generating a complete stream
for the revisions it does hold without access to its fallback
repositories [#]_.  "Complete" here means that the stream for a revision (or
set of revisions) can be inserted into a repository that already contains
the parent(s) of that revision, and that repository will have a fully
usable copy of that revision: a working tree can be built for that
revision, etc.

Assuming for a moment the stream has the necessary inventory, signature
and CHK records to have a usable revision, what texts are required to have
a usable revision?  The simple way to satisfy the requirement is to have
*every* text for every revision at the stacking boundary.  Thus the
revisions at the stacking boundary and all their descendants have their
texts present and so can be fully reconstructed.  But this is expensive:
it implies each stacked repository must contain *O(tree)* data even for a
single revision of a 1-line change, and also implies transferring
*O(tree)* data to fetch that revision.

Because the goal is a usable revision *when added to a repository with the
parent revision(s)*, most of those texts will be redundant.  The minimal
set that is needed is just those texts that are new in the revisions in
our repository.  However, we need enough inventory data to be able to
determine that set of texts.  So to make this possible every revision must
have its parent inventories present, along with the CHK pages they
reference, so that the inventory delta between revisions can be
calculated.  In fact the entire inventory does not need to be present,
just enough of it to find the delta (assuming a repository format, like
2a, that allows only part of an inventory to be stored).  Thus the stacked
repository can contain only *O(changes)* data [#]_ and still deliver
complete streams of that data.
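
As a rough illustration of how the parent inventories determine the
minimal set of texts, here is a minimal sketch.  It is not bzrlib code:
the function name ``new_text_keys`` is hypothetical, and each inventory
is modelled as a plain dict mapping a file id to the revision that last
changed that file, which is a deliberate simplification of a real
inventory.

```python
def new_text_keys(inventory, parent_inventories):
    """Return the (file_id, revision) text keys introduced by a revision.

    Hypothetical model: each inventory is a dict of file_id -> revision id
    of the text it refers to.  A text is "new" if no parent inventory
    refers to the same (file_id, revision) pair.
    """
    parent_keys = set()
    for parent in parent_inventories:
        parent_keys.update(parent.items())
    return {key for key in inventory.items() if key not in parent_keys}


# A one-file change: only the changed file's text is new.
parent = {"readme-id": "rev-1", "setup-id": "rev-1"}
child = {"readme-id": "rev-2", "setup-id": "rev-1"}
changed = new_text_keys(child, [parent])
```

Note that with no parent inventories available the function falls back
to reporting every text as new, which mirrors the "or else all texts"
clause of the rule.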

What about revisions at the stacking boundary with more than one parent?
All of their parent revisions must be present, as a client may ask for a
stream up to any parent, not just the left-hand parent.  If any parent is
absent then all texts must be present instead.  Otherwise there will be
the strange situation where some fetches of a revision will succeed and
others fail depending on the precise details of the fetch.
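
The rule for multi-parent revisions can be written down as a small
predicate.  This is only a sketch under stated assumptions: the
``Revision`` record and ``satisfies_stacking_invariant`` are hypothetical
names, and the sets of present inventories and texts stand in for
whatever a real repository would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """Hypothetical summary of one revision at the stacking boundary."""
    parents: tuple          # parent revision ids
    new_texts: frozenset    # text keys introduced by this revision
    all_texts: frozenset    # every text key in this revision's tree


def satisfies_stacking_invariant(rev, present_inventories, present_texts):
    """Check one revision against the rule described above."""
    if all(parent in present_inventories for parent in rev.parents):
        # Every parent is available: the new texts are enough.
        return rev.new_texts <= present_texts
    # Some parent is absent: all texts must be present instead.
    return rev.all_texts <= present_texts


merge = Revision(
    parents=("left", "right"),
    new_texts=frozenset({("a-id", "merge")}),
    all_texts=frozenset({("a-id", "merge"), ("b-id", "left")}),
)
both_parents = satisfies_stacking_invariant(
    merge, {"left", "right"}, {("a-id", "merge")})
one_parent = satisfies_stacking_invariant(
    merge, {"left"}, {("a-id", "merge")})
```

In this example ``both_parents`` is true and ``one_parent`` is false: with
the right-hand parent missing, holding only the new text is not enough.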

Implications for fetching
-------------------------

Fetches must retrieve the records necessary to satisfy that rule.  The
stream source will attempt to send the necessary records, and the stream
sink will check for any missing records and make a second fetch for just
those missing records before committing the write group.
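
The sink's behaviour can be sketched as follows.  This is an
illustration of the two-pass idea only, not the bzrlib stream sink:
``insert_stream``, ``MissingRecords`` and the ``fetch_records`` callback
are hypothetical, and records are modelled as a dict keyed by record
key.

```python
class MissingRecords(Exception):
    """Raised when required records are still absent after the retry."""


def insert_stream(first_stream, required_keys, fetch_records):
    """Stage records, refetch any gaps once, and refuse to finish otherwise.

    first_stream:  dict of key -> record sent by the source.
    required_keys: set of keys needed to satisfy the stacking rule.
    fetch_records: callable taking a set of keys, returning a dict of the
                   records the source could supply (hypothetical API).
    """
    staged = dict(first_stream)
    missing = required_keys - set(staged)
    if missing:
        # Second fetch: ask only for the records that were left out.
        staged.update(fetch_records(missing))
    still_missing = required_keys - set(staged)
    if still_missing:
        raise MissingRecords(sorted(still_missing))
    return staged  # at this point the write group could be committed


source = {"inv-parent": b"inventory", "text-1": b"text"}


def fetch_from_source(keys):
    return {key: source[key] for key in keys if key in source}


staged = insert_stream({"text-1": b"text"}, {"text-1", "inv-parent"},
                       fetch_from_source)
```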

Our repository implementations check this constraint is satisfied before
committing a write group, to prevent a bad stream from creating a corrupt
repository.  So a fetch from a bad source (e.g. a damaged repository, or a
buggy foreign-format import) may trigger ``BzrCheckError`` during
``commit_write_group``.

To fetch from a stacked repository via a smart server, the smart client:

* first fetches a stream of as many of the requested revisions as possible
  from the initial repository,
* then, while there are still missing revisions and untried fallback
  repositories, fetches the outstanding revisions from the next fallback
  until either all revisions have been found (success) or the list of
  fallbacks has been exhausted (failure).
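
The two steps above amount to a simple loop over the initial repository
followed by its fallbacks.  The sketch below assumes each repository can
be modelled as a dict of revision id to record; ``fetch_via_fallbacks``
is a hypothetical name, not a smart client method.

```python
def fetch_via_fallbacks(wanted, repositories):
    """Collect the wanted revisions, trying each repository in order.

    repositories: the initial (stacked) repository first, then its
    fallbacks, each modelled as a dict of revision_id -> record.
    """
    found = {}
    missing = set(wanted)
    for repo in repositories:
        if not missing:
            break  # success: nothing left to look for
        got = {rev: repo[rev] for rev in missing if rev in repo}
        found.update(got)
        missing.difference_update(got)
    if missing:
        # Failure: the list of fallbacks has been exhausted.
        raise LookupError("revisions not found: %s" % sorted(missing))
    return found


stacked = {"rev-3": "stream-3"}
fallback = {"rev-1": "stream-1", "rev-2": "stream-2"}
result = fetch_via_fallbacks(["rev-1", "rev-2", "rev-3"],
                             [stacked, fallback])
```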

.. [#] This is not just a theoretical concern.  The smart server always
   opens repositories without opening fallbacks, as it cannot assume it
   can access the fallbacks that the client can.

.. [#] Actually *O(changes)* isn't quite right in practice.  In the
   current implementation the fulltext of a changed file must be
   transferred, not just a delta, so a 1-line change to a 10MB file will
   still transfer 10MB of text data.  This is because current formats
   require records' compression parents to be present in the same
   repository.

vim: ft=rst tw=74 ai
