~bzr-pqm/bzr/bzr.dev

We could do this in two manners: determine revisions to sync and signatures to sync in two passes, or change the 'value' of a revision implicitly when the signature is different. E.g. by using merkle hash trees with the signature data a separate component the signatures will naturally be identified to sync.

We want to only exchange data proportional to the number of new revisions and

signatures in the system though. One way to achieve this for revisions is to

walk the graph out from the desired tips until the surface area intersection is

found. For signatures a set difference seems to be needed as there is no DAG of signatures: the presence of one has no implications on the presence of another, so a full pass over the set of signatures would be required to confirm no new signatures are needed (let alone replaced signatures).

IFF we can determine 'new revisions' and 'new signatures' without full graph access then we can scale acceptable for push and pull.

Ghosts are revisions which are not present in a particular repository. Filling ghosts refers to removing ghosts in the target repository when the ghost is present in the source repository. Filling ghosts can be either an explicit or implicit action. The common case is no ghosts.

Set synchronisation approaches

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A set synchronisation approach is one which synchronises two sets without

regard for innate structure. This can be very efficient but requires adding a

new node to be processed with every commit. Caching of the results of the

various set based syncs I've seen is possible but because the data structures

look different depending on the tip revision being synced up to the cache needs

to be very complex. I recommend not using such an approach for the common case

pull because of the failure to scale. We can use such an approach for

synchronisation of new signatures and ghosts, which should be an explicit

option in both cases.

DAG synchronisation approaches

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A DAG based approach to synchronistion is one that uses the DAG structure to

determine the difference in present nodes. It can as a result operate from the

tip of the DAG backwards. A dag based approach should allow incremental access

to data and not require a full-graph scan for incremental operations.

File level scaling

^^^^^^^^^^^^^^^^^^

We should read roughly as much of the revision level graph as is needed from

each repository to determine the node difference. If requested we should

perform a detailed scan to pick up ghost revisions and revisions which have had

signatures added. This should not be the default as it requires full history

access in both cases.

Expected file IO and access pattern:

* Common case: repo with many branches of one project, to the same.

1. Source and Target branch tips read.

2. Find the tip of each branch in their repo (will require reading some of

the revision graph but is typically near the end of the graph).

3. Read and parse increasing amounts of the revision graph until one is

found to be a subset of the other, or a complete list of revisions to be

transmitted is created.

* Uncommon cases:

1. Repositories with many projects or branches which are very old may

100

require reading a lot of unrelated graph data.

101

102

1. Initial push/pull scenarios should not require reading an entire graph.

103

104

105

API scaling

106

^^^^^^^^^^^

107

108

1. Get branch tips.

109

2. Determine one sided graph difference. To avoid obtaining a full graph over

110

the wire this needs to be done without reference to the full graph, and

111

with some logarthmic scaling algorithm. There are several already available

112

for this.

113

114

With ghost and new-signature detection:

115

116

* File IO access pattern will read the entire graph on the 'target' side - if

117

no ghosts are present then stop, otherwise seek the new revisions on the

118

source side with the regular algorithm and also explicitly search for the

119

ghost points from the target; plus a set difference search is needed on

120

signatures.

121

122

* Semantic level can probably be tuned, but as its also complex I suggest

123

deferring analysis for optimal behaviour of this use case.

124

125

126

Data reading

127

~~~~~~~~~~~~

128

129

When transferring information about a revision the graph of data for the

130

revision is walked: revision -> inventory, revision -> matching signature,

131

inventory -> file ids:revision pairs.

132

133

File level scaling

134

^^^^^^^^^^^^^^^^^^

135

136

As we're reading already committed data, as long as nothing is mutating data on

137

disk reading should be race free. We will:

138

139

- read each revision object

140

- read the matching inventory delta

141

- attempt to read a signature object

142

- parse the inventory delta

143

- read the fileid:revisionid compressed chunk for each line in the inventory

144

delta

145

146

Theres no point validating that the data read is valid, as transmission through to the client writing the data might invalidate it; we need to validate before we write.

147

148

API scaling

149

^^^^^^^^^^^

150

151

Given that we have established the revisions needed, a single API call should

152

suffice to obtain all data; the API should present the data in such an order

153

that it can be validated as it arrives and thus not require large scale

154

buffering on disk. Specifically each item of data should be validatable (e.g.

155

for some file data we want the fileid:revisionid:validationhash + content).

156

157

158

Data Verification and writing

159

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

160

161

New data written to a repository should be completed intact when it is made

162

visible. This suggests that either all the data for a revision must be made

163

atomically visible (e.g. by renaming a single file) or the leaf nodes of the

164

reference graph must become visible first.

165

166

Data is referred to via the following graph:

167

revision -> revision

168

revision -> signature

169

revision -> inventory

170

inventory -> fileid:revisionid

171

fileid:revisionid -> fileid:revisionid

172

173

Data is verifiable via a different ordering:

174

signature -> revision -> inventory -> fileid:revisionid texts.

175

176

We dont gpg verify each revision today; this analysis only speaks to hash

177

verification of contents.

178

179

To validate a revision we need to validate the data it refers to. But to

180

validate the contents of a revision we need the new texts in the inventory for

181

the revision - to check a fileid:revisionid we need to know the expected sha1

182

of the full text and thus also need to read the delta chain to construct the

183

text as we accept it to determine if its valid. Providing separate validators

184

for the chosen representation would address this.

185

e.g: For an inventory entry FILEID:REVISIONID we store the validator of the

186

full text :SHA1:. If we also stored the validator of the chosen disk

187

representation (:DELTASHA1:) we could validate the transmitted representation

188

without expanding the delta in the common case. If that failed we could expand

189

the delta chain and try against the full text validator, and finally fail. As

190

different delta generators might generate different deltas, :DELTASHA1: should

191

not become part of the revision validator, only the inventory disk encoding. In

192

a related manner a transmission format that allowed cheap validation of content

193

without applying locally stored deltas would be advantageous because no local

194

reads would be incurred to validate new content. For instance, always sending a

195

full text for any file, possibly with a delta-chain when transmitting multiple

196

revisionids of the file, would allow this. (git pack-files have this property).

197

198

Overview summary

199

^^^^^^^^^^^^^^^^

200

201

A single-file local format would allow safe atomic addition of data while

202

allowing optimisal transmission order of data. Failing this the validation of

203

data should be tuned to not require reading local texts during data addition

204

even in the presence of delta chains. We should have transmission-validators

205

separate from content validators that allow validation of the delta-transmitted

206

form of objects.

207

208

File level scaling

209

^^^^^^^^^^^^^^^^^^

210

211

* Every new file text requires transmission and local serialisation.

212

* Every commit requires transmission and storage of a revision, signature and inventory.

213

214

Thus 4000 commits to a 50000 path tree of 10 files on averages requires (with

215

knits) between 26 writes (2*(3+10)) and 80006 (2*(4000*10 + 3)) writes. In all

216

cases there are 4000 * 13 distinct objects to record.

217

218

Grouping data by fileid, content and metadata, gives the figures above.

219

Data grouping:

220

221

* File per full identifier (fileid:revisionid:meta|content): 104000

222

* Delta-chain per object: object id count * constant overhead per object id

223

(26 -> 80006)

224

* Collation/pack file: 1

225

226

Performance for these depends heavily on implementation:

227

- Using full ids we could name by validator or by id, giving best performance

228

that depends on either receiving data in validator order or in id order.

229

- using delta-chain per object we get least seek overhead and syscall overhead

230

if we recieve in topological order within the object id, and object ids in

231

lexical order.

232

- Using a collation/pack file we can stream it into place and validate as we go,

233

giving near ideal performance.

234

235

API scaling

236

^^^^^^^^^^^

237

238

The api for writing new data recieved over the network will need to be geared

239

to the transmission and local storage method. What we need is for the

240

transmission method to reasonably closely match the desired write ordering

241

locally. This suggests that once we decide on the best local storage means we

242

should design the api.

243

244

245

take N commits from A to B, if B is local then merge changes into the tree.

246

copy ebough data to recreate snapshots

247

avoid ending up wth corrupt/bad data

248

249

Notes from London

250

-----------------

251

252

#. setup

253

254

look at graph of revisions for ~N comits to deretmine eligibility for

255

if preserve mainline is on, check LH only

256

257

identify objects to send that are not on the client repo

258

- revision - may be proportional to the graph

259

- inventory - proportional to work

260

- texts - proportional to work

261

- signatures - ???

262

263

#. data transmission

264

265

* send data proportional to the new information

266

* validate the data:

267

268

#. validate the sha1 of the full text of each transmitted text.

269

#. validate the sha1:name mapping in each newly referenced inventory item.

270

#. validate the sha1 of the XML of each inventory against the revision.

271

**this is proportional to tree size and must be fixed**

272

273

#. write the data to the local repo.

274

The API should output the file texts needed by the merge as by product of the transmission

275

276

#. tree application

277

278

Combine the output from the transmission step with additional 'new work data' for anything already in the local repository that is new in this tree.

279

should write new files and stat existing files proportional to the count of the new work and the size of the full texts.

Older »