~bzr-pqm/bzr/bzr.dev


Viewing changes to doc/developers/fetch.txt

Committer: Andrew Bennetts
Date: 2011-06-07 07:53:47 UTC
mto: This revision was merged to the branch mainline in revision 5958.
Revision ID: andrew.bennetts@canonical.com-20110607075347-dh1o6qi8729c29df

Clarify the description of the stacking invariant based on great feedback from Vincent and John.

files modified:
doc/developers/fetch.txt


Stacking constraints
====================


**In short the rule is:** "repositories must hold revisions' parent
inventories and their new texts (or else all texts for those revisions)."

This is sometimes called "the stacking invariant."

Why that rule?
--------------

A stacked repository needs to be capable of generating a complete stream
for the revisions it does hold without access to its fallback
repositories [#]_. "Complete" here means that the stream for a revision (or
set of revisions) can be inserted into a repository that already contains
the parent(s) of that revision, and that repository will have a fully
usable copy of that revision: a working tree can be built for that
revision, etc.

Assuming for a moment the stream has the necessary inventory, signature
and CHK records to have a usable revision, what texts are required to have
[...] single revision of a 1-line change, and also implies transferring
*O(tree)* data to fetch that revision.


Because the goal is a usable revision *when added to a repository with the
parent revision(s)* most of those texts will be redundant. The minimal
set that is needed is just those texts that are new in the revisions in
our repository. However, we need enough inventory data to be able to
determine that set of texts. So to make this possible every revision must
have its parent inventories present so that the inventory delta between
revisions can be calculated, and of course the CHK pages associated with
that delta. In fact the entire inventory does not need to be present,
just enough of it to find the delta (assuming a repository format, like
2a, that allows only part of an inventory to be stored). Thus the stacked
repository can contain only *O(changes)* data [#]_ and still deliver
complete streams of that data.
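
The following Python sketch makes the *O(changes)* argument concrete. It uses a hypothetical toy model, not bzrlib's data structures. An inventory (``Inventory``) is a plain dict mapping a file id to the revision that last changed that file's text, and a text key (``TextKey``) is the pair ``(file_id, revision)``. Real inventories, CHK pages and text keys carry much more. The helper name ``new_text_keys`` is illustrative only. It shows that, given the parent inventories, the set of texts a revision adds falls out of comparing inventories:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model (not bzrlib's API): an inventory maps
# file_id -> revision that last modified that file's text, and a text
# key is the pair (file_id, revision).
Inventory = Dict[str, str]
TextKey = Tuple[str, str]


def new_text_keys(inv: Inventory, parent_invs: List[Inventory]) -> Set[TextKey]:
    """Return the text keys in ``inv`` that no parent inventory references.

    In this toy model only these texts need to accompany the revision,
    provided the parent inventories are available to compare against.
    With no parent inventories every text counts as new.
    """
    parent_keys: Set[TextKey] = set()
    for parent in parent_invs:
        parent_keys.update(parent.items())
    return {key for key in inv.items() if key not in parent_keys}


# A 1-line change to one file in a three-file tree.
parent = {"README": "rev-1", "setup.py": "rev-1", "core.py": "rev-1"}
child = {"README": "rev-1", "setup.py": "rev-1", "core.py": "rev-2"}
print(sorted(new_text_keys(child, [parent])))  # [('core.py', 'rev-2')]
print(len(new_text_keys(child, [])))           # 3: no parents, so every text is new
```

Without the parent inventories the comparison cannot be made. That is why the rule falls back to "all texts for those revisions".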


What about revisions at the stacking boundary with more than one parent?
All of their parent revisions must be present, as a client may ask for a
[...] the strange situation where some fetches of a revision will succeed
and others fail depending on the precise details of the fetch.

Implications for fetching
-------------------------

Fetches must retrieve the records necessary to satisfy that rule. The
stream source will attempt to send the necessary records, and the stream
sink will check for any missing records and make a second fetch for just
those missing records before committing the write group.

Our repository implementations check that this constraint is satisfied
before committing a write group, to prevent a bad stream from creating a
corrupt repository. So a fetch from a bad source (e.g. a damaged
repository, or a buggy foreign-format import) may trigger
``BzrCheckError`` during ``commit_write_group``.

To fetch from a stacked repository via a smart server, the smart client:

* first fetches a stream of as many of the requested revisions as possible
  from the initial repository,
* then, while there are still missing revisions and untried fallback
  repositories, fetches the outstanding revisions from the next fallback
  until either all revisions have been found (success) or the list of
  fallbacks has been exhausted (failure).
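
The client-side loop in the list above can be sketched in Python as follows. This is an illustration under assumptions, not the smart client's real API:

* each repository (the initial one, then each fallback in order) is modelled as a plain dict from revision id to stream data;
* the function name ``fetch_with_fallbacks`` is hypothetical;
* raising ``LookupError`` is an illustrative choice for the failure case.

```python
from typing import Dict, List, Set


def fetch_with_fallbacks(repos: List[Dict[str, bytes]],
                         wanted: Set[str]) -> Dict[str, bytes]:
    """Fetch ``wanted`` from repos[0], then from each fallback in turn.

    ``repos[0]`` is the initial (stacked) repository and ``repos[1:]``
    are its fallbacks, each tried at most once.
    """
    found: Dict[str, bytes] = {}
    missing = set(wanted)
    for repo in repos:
        if not missing:
            break  # success: every requested revision has been found
        # Ask this repository only for the still-outstanding revisions.
        stream = {rev: repo[rev] for rev in missing if rev in repo}
        found.update(stream)
        missing.difference_update(stream)
    if missing:
        # failure: fallbacks exhausted with revisions still missing
        raise LookupError(f"revisions not found: {sorted(missing)}")
    return found


stacked = {"r3": b"tip"}
fallback_a = {"r2": b"middle"}
fallback_b = {"r1": b"base", "r2": b"middle (duplicate copy)"}
result = fetch_with_fallbacks([stacked, fallback_a, fallback_b], {"r1", "r2", "r3"})
print(sorted(result))  # ['r1', 'r2', 'r3']
print(result["r2"])    # b'middle': fallback_b was only asked for r1
```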

.. [#] This is not just a theoretical concern. The smart server always
   opens repositories without opening fallbacks, as it cannot assume it
   can access the fallbacks that the client can.

.. [#] Actually *O(changes)* isn't quite right in practice. In the
   current implementation the fulltext of a changed file must be
   transferred, not just a delta, so a 1-line change to a 10MB file will
