10
This document describes the indexing facilities within bzrlib.
19
To provide a clean concept of index that can be reused by different
20
components within the codebase rather than being rewritten every time
21
by different components.
27
An **index** is a dictionary mapping opaque keys to opaque values.
28
Different index types may allow some of the value data to be interpreted
29
by the index. For example the ``GraphIndex`` index stores a graph between
30
keys as part of the index.
36
bzr is moving to a write-once model for repository storage in order to
37
achieve lock-free repositories eventually. In order to support this we are
38
making our new index classes **immutable**. That is, one creates a new
39
index in a single operation, and after that it is read only. To combine
40
two indices a ``Combined*`` index may be used, or an **index merge** may
41
be performed by reading the entire value of two (or more) indices and
42
writing them into a new index.
47
While different indices will likely require different interfaces, we
48
should keep these consistent to encourage easy adaption and reuse between
49
indices. For instance a simple key-value index should be layerable on top
50
of a more complex graph-aware index.
55
An initial implementation of indexing can probably get away with a small
56
number of primitives. Assuming we have write once index files:
61
This should be done by creating an ``IndexBuilder`` and then calling
62
``insert(key, value)`` many times. (Indices that support sorting,
63
topological sorting etc, will want specialised insert methods).
65
When the keys have all been added, a ``finish`` method should be called,
66
which will return a file stream to read the index data from.
68
Retrieve entries from the index
69
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
71
This should allow random access to the index using readv, so we probably
72
want to open the index on a ``Transport``, then use ``iter_entries(keys)``,
73
which can return an iterator that yields ``(key, value)`` pairs in
74
whatever order makes sense for the index.
79
Merging of N indices requires a concordance of the keys of the index. So
80
we should offer a ``iter_all_entries`` call that has the same return type
81
as the ``iter_entries`` call.
89
``GraphIndex`` supports graph based lookups. While currently unoptimised
90
for reading, the index is quite space efficient at storing the revision
91
graph index for bzr. The ``GraphIndexBuilder`` may be used to create one
92
of these indices by calling ``add_node`` until all nodes are added, then
93
``finish`` to obtain a file stream containing the index data. Multiple
94
indices may be queried using the ``CombinedGraphIndex`` class.