~bzr-pqm/bzr/bzr.dev

Viewing changes to doc/developers/btree_index_prefetch.txt

Committer: John Arbash Meinel
Date: 2008-10-14 21:35:27 UTC
mto: This revision was merged to the branch mainline in revision 3805.
Revision ID: john@arbash-meinel.com-20081014213527-4j9uc93aq1qmn43b

Add in a shortcut when we haven't cached much yet.

Document the current algorithm more completely, including the proper
justification for the various steps.

files modified:
bzrlib/btree_index.py

bzrlib/tests/test_btree_index.py

doc/developers/btree_index_prefetch.txt

Show diffs side-by-side

added added

removed removed

doc/developers/btree_index_prefetch.txt

240

layer will often not have a full 100-way fan out. Probably not worthwhile

241

very often, though.

242

243

* Sometimes we will be making a very small request for a very small number of

244

keys, we don't really want to bloat tiny requests. Hopefully we can find a

245

decent heuristic to determine when we will be wanting extra nodes later,

246

versus when we expect to find all we want right now.

247

243

248

244

249

Algorithm

245

250

=========

246

251

247

252

This is the basic outline of the algorithm.

248

253

249

1. Expand requests by requesting neighboring pages. (So if we read page 10, we

250

can expand to also read page 9 and 11.)

251

252

2. Only expand within a layer. The problem is that with a 100:1 fan-out, but

253

only a 10:1 expansion policy, it is unlikely that we will happen to read the

254

next layer pages that we are interested in. Note that this doesn't hold true

255

when a layer has "recently split", so we may want to revisit this.

256

257

3. Part (2) hints that when reading the root node, we never request any other

258

nodes, as the root is always a layer-unto-itself. The only exception is when

259

all pages could fit in a single width request.

260

261

4. When expanding, add nodes that are "next" to the node in question, which

262

have not been read yet. This also has another interesting property. When

263

searching in a given direction, on the first request, we will pre-read both

264

directions. Future requests will only pre-read in one direction, as the

265

other direction is cached.

266

254

1. If we don't know the size of the index, don't expand as we don't know what

255

is available. (This only really applies to the pack-names file, which is

256

unlikely to ever become larger than 1 page anyway.)

257

258

2. If a request is already wide enough to be greater than the number of

259

recommended pages, don't bother trying to expand. This only really happens

260

with LocalTransport which recommends a single page.

261

262

3. Determine what pages have already been read (if any). If the pages left to

263

read can fit in a single request, just request them. This tends to happen on

264

medium sized indexes (ones with low hundreds of revisions), and near the end

265

when we've read most of the whole index already.

266

267

4. If we haven't read the root node yet, and we can't fit the whole index into

268

a single request, only read the root node. We don't know where the layer

269

boundaries are anyway.

270

271

5. If we haven't read "tree depth" pages yet, and are only requesting a single

272

new page don't expand. This is meant to handle the 'lookup 1 item in the

273

index' case. In a large pack file, you'll read only a single page at each

274

layer and then be done. When spidering out in a search, this will cause us

275

to take a little bit longer to start expanding, but once we've started we'll

276

be expanding at full velocity. This could be improved by having indexes

277

inform each other that they have already entered the 'search' phase, or by

278

having a hint from above to indicate the same.

279

280

However, remember the 'bi-modal' distribution. Most indexes will either be

281

very small, or very large. So either we'll read the whole thing quickly, or

282

we'll end up spending a lot of time in the index. Which makes a small number

283

of extra round trips to large indexes a small overhead. For 2-layer nodes,

284

this only 'wastes' one round trip.

285

286

6. Now we are ready to expand the requests. Expand by looking for more pages

287

next to the ones requested that fit within the current layer. If you run

288

into a cached page, or a layer boundary, search further only in the opposite

289

direction. This gives us proper locality of reference, and also helps

290

because when a search goes in a single direction, we will continue to

291

prefetch pages in that direction.

267

292

268

293

269

294

vim: ft=rst tw=79 ai

Older »