layer will often not have a full 100-way fan out. Probably not worthwhile
very often, though.

* Sometimes we will be making a very small request for a very small number of
  keys; we don't really want to bloat tiny requests. Hopefully we can find a
  decent heuristic to determine when we will want extra nodes later, versus
  when we expect to find all we want right now.

This is the basic outline of the algorithm:

1. Expand requests by requesting neighboring pages. (So if we read page 10, we
   can expand to also read pages 9 and 11.)

2. Only expand within a layer. The problem is that with a 100:1 fan-out but
   only a 10:1 expansion policy, it is unlikely that we will happen to read
   the pages in the next layer that we are interested in. Note that this
   doesn't hold true when a layer has "recently split", so we may want to
   revisit this.

3. Part (2) hints that when reading the root node, we never request any other
   nodes, as the root is always a layer unto itself. The only exception is
   when all pages could fit in a single-width request.

4. When expanding, add nodes that are "next to" the node in question and have
   not been read yet. This also has another interesting property: when
   searching in a given direction, the first request will pre-read both
   directions, but future requests will only pre-read in one direction, as
   the other direction is cached.

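The neighbor-expansion policy sketched in (1)-(4) might look something like
the following. This is a minimal sketch with hypothetical names and signature,
not the actual bzrlib code; the ``budget`` parameter stands in for whatever
page count the transport recommends::

```python
def expand_within_layer(offsets, cached, layer_start, layer_end, budget):
    """Grow a page request toward unread neighbors within one layer.

    Hypothetical helper: ``offsets`` are the pages we must read, ``cached``
    the pages already read, ``layer_start``/``layer_end`` the half-open page
    range of the current layer, and ``budget`` the total number of pages we
    are willing to request at once.
    """
    result = set(offsets)
    left = min(offsets) - 1
    right = max(offsets) + 1
    while len(result) < budget:
        grew = False
        # A cached page or the layer boundary stops growth in a direction,
        # so further expansion continues only in the other direction.
        if left >= layer_start and left not in cached:
            result.add(left)
            left -= 1
            grew = True
        if len(result) < budget and right < layer_end and right not in cached:
            result.add(right)
            right += 1
            grew = True
        if not grew:
            break
    return sorted(result)
```

With a budget of 5 pages, a first request for page 10 pre-reads both
directions (pages 8-12); if page 9 is already cached, growth continues to the
right only, which matches the one-directional prefetch behavior described in
point (4).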
In practice, the implementation applies the following checks in order:

1. If we don't know the size of the index, don't expand, as we don't know
   what is available. (This only really applies to the pack-names file, which
   is unlikely to ever become larger than one page anyway.)

2. If a request is already wide enough to be greater than the number of
   recommended pages, don't bother trying to expand. This only really happens
   with LocalTransport, which recommends a single page.

3. Determine what pages have already been read (if any). If the pages left to
   read can fit in a single request, just request them. This tends to happen
   on medium-sized indexes (ones with low hundreds of revisions), and near
   the end when we've read most of the index already.

4. If we haven't read the root node yet, and we can't fit the whole index
   into a single request, only read the root node; we don't know where the
   layer boundaries are anyway.

5. If we haven't read "tree depth" pages yet, and are only requesting a
   single new page, don't expand. This is meant to handle the 'lookup 1 item
   in the index' case: in a large pack file, you'll read only a single page
   at each layer and then be done. When spidering out in a search, this will
   cause us to take a little bit longer to start expanding, but once we've
   started we'll be expanding at full velocity. This could be improved by
   having indexes inform each other that they have already entered the
   'search' phase, or by having a hint from above to indicate the same.

   However, remember the 'bi-modal' distribution: most indexes will either
   be very small or very large. So either we'll read the whole thing
   quickly, or we'll end up spending a lot of time in the index, which makes
   a small number of extra round trips to large indexes a small overhead.
   For 2-layer nodes, this only 'wastes' one round trip.

6. Now we are ready to expand the requests. Expand by looking for more pages
   next to the ones requested that fit within the current layer. If you run
   into a cached page or a layer boundary, search further only in the
   opposite direction. This gives us proper locality of reference, and also
   helps because when a search goes in a single direction, we will continue
   to prefetch pages in that direction.

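Putting steps 1-6 together, the decision procedure can be sketched roughly as
follows. All names and the signature are hypothetical (the real logic lives in
bzrlib's B+Tree index code and differs in detail); ``layer_bounds`` is an
assumed mapping from a page to its layer's half-open (start, end) range, and
the step-6 expansion assumes all requested offsets sit in one layer::

```python
def expand_offsets(offsets, total_pages, cached, recommended_pages,
                   tree_depth, layer_bounds):
    """Return the pages to actually request in place of ``offsets``.

    Hypothetical sketch of steps 1-6 above.  ``total_pages`` may be None when
    the index size is unknown; ``cached`` is the set of pages already read.
    """
    # Step 1: unknown index size -- nothing safe to expand into.
    if total_pages is None:
        return sorted(offsets)
    # Step 2: the request is already as wide as the transport recommends.
    if len(offsets) >= recommended_pages:
        return sorted(offsets)
    # Step 3: if everything still unread fits in one request, read it all.
    remaining = set(range(total_pages)) - cached
    if len(remaining) <= recommended_pages:
        return sorted(remaining)
    # Step 4: otherwise read the root node (page 0) by itself first; we
    # don't know where the layer boundaries are until we have it.
    if 0 not in cached:
        return [0]
    # Step 5: a single new page before "tree depth" pages have been read
    # looks like a one-off key lookup, so don't expand yet.
    if len(cached) < tree_depth and len(offsets) == 1:
        return sorted(offsets)
    # Step 6: expand sideways within the current layer, stopping at cached
    # pages and layer boundaries.
    result = set(offsets)
    start, end = layer_bounds[min(offsets)]
    left = min(offsets) - 1
    right = max(offsets) + 1
    while len(result) < recommended_pages:
        grew = False
        if left >= start and left not in cached:
            result.add(left)
            left -= 1
            grew = True
        if len(result) < recommended_pages and right < end \
                and right not in cached:
            result.add(right)
            right += 1
            grew = True
        if not grew:
            break
    return sorted(result)
```

Note how the ordering encodes the exception from point (3) of the outline: the
small-index check (step 3) runs before the read-the-root-alone rule (step 4),
so a tiny index is fetched whole in one request.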
vim: ft=rst tw=79 ai