~bzr-pqm/bzr/bzr.dev

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
=======
Indices
=======

Status
======

:Date: 2007-07-14

This document describes the indexing facilities within bzrlib.


.. contents::


Motivation
==========

To provide a clean concept of index that can be reused by different
components within the codebase rather than being rewritten every time
by different components.


Terminology
===========

An **index** is a dictionary mapping opaque keys to opaque values.
Different index types may allow some of the value data to be interpreted
by the index. For example the ``GraphIndex`` index stores a graph between
keys as part of the index.


Overview
========

bzr is moving to a write-once model for repository storage in order to
achieve lock-free repositories eventually. In order to support this we are
making our new index classes **immutable**. That is, one creates a new
index in a single operation, and after that it is read only. To combine
two indices a ``Combined*`` index may be used, or an **index merge** may
be performed by reading the entire value of two (or more) indices and
writing them into a new index.

General Index API
=================

While different indices will likely require different interfaces, we
should keep these consistent to encourage easy adaption and reuse between
indices. For instance a simple key-value index should be layerable on top
of a more complex graph-aware index.

Services
--------

An initial implementation of indexing can probably get away with a small
number of primitives. Assuming we have write once index files:

Build index
~~~~~~~~~~~

This should be done by creating an ``IndexBuilder`` and then calling
``insert(key, value)`` many times. (Indices that support sorting,
topological sorting etc, will want specialised insert methods).

When the keys have all been added, a ``finish`` method should be called,
which will return a file stream to read the index data from.

Retrieve entries from the index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This should allow random access to the index using readv, so we probably
want to open the index on a ``Transport``, then use ``iter_entries(keys)``,
which can return an iterator that yields ``(key, value)`` pairs in
whatever order makes sense for the index.

Merging of indices
~~~~~~~~~~~~~~~~~~

Merging of N indices requires a concordance of the keys of the index. So
we should offer a ``iter_all_entries`` call that has the same return type
as the ``iter_entries`` call.

Index implementations
=====================

GraphIndex
----------

``GraphIndex`` supports graph based lookups. While currently unoptimised
for reading, the index is quite space efficient at storing the revision
graph index for bzr. The ``GraphIndexBuilder`` may be used to create one
of these indices by calling ``add_node`` until all nodes are added, then
``finish`` to obtain a file stream containing the index data. Multiple
indices may be queried using the ``CombinedGraphIndex`` class.




..
   vim: ft=rst tw=74 ai