2592.1.34
by Robert Collins
Cleanup docs. |
1 |
======= |
2 |
Indices |
|
3 |
======= |
|
4 |
||
5 |
Status |
|
6 |
====== |
|
7 |
||
8 |
:Date: 2007-07-14 |
|
9 |
||
10 |
This document describes the indexing facilities within bzrlib. |
|
11 |
||
12 |
||
13 |
.. contents:: |
|
14 |
||
15 |
||
16 |
Motivation |
|
17 |
========== |
|
18 |
||
19 |
To provide a clean concept of index that can be reused by different |
|
20 |
components within the codebase rather than being rewritten every time |
|
21 |
by different components. |
|
22 |
||
23 |
||
24 |
Terminology |
|
25 |
=========== |
|
26 |
||
27 |
An **index** is a dictionary mapping opaque keys to opaque values. |
|
28 |
Different index types may allow some of the value data to be interpreted |
|
29 |
by the index. For example the ``GraphIndex`` index stores a graph between |
|
30 |
keys as part of the index. |
|
31 |
||
32 |
||
33 |
Overview |
|
34 |
======== |
|
35 |
||
36 |
bzr is moving to a write-once model for repository storage in order to |
|
2592.1.44
by Robert Collins
Remove some unneeded index iteration by checking if we have found all keys, and grammar improvements from Aaron's review. |
37 |
achieve lock-free repositories eventually. In order to support this, we are |
2592.1.34
by Robert Collins
Cleanup docs. |
38 |
making our new index classes **immutable**. That is, one creates a new |
39 |
index in a single operation, and after that it is read only. To combine |
|
40 |
two indices a ``Combined*`` index may be used, or an **index merge** may |
|
41 |
be performed by reading the entire value of two (or more) indices and |
|
42 |
writing them into a new index. |
|
43 |
||
44 |
General Index API |
|
45 |
================= |
|
46 |
||
2592.1.45
by Robert Collins
Tweak documentation as per Aaron's review. |
47 |
We may end up with multiple different Index types (e.g. GraphIndex, |
48 |
Index, WhackyIndex). Even though these may require different method |
|
49 |
signatures to operate would strive to keep the signatures and return |
|
50 |
values as similar as possible. e.g.:: |
|
51 |
||
2592.1.46
by Robert Collins
Make GraphIndex accept nodes as key, value, references, so that the method |
52 |
GraphIndexBuilder - add_node(key, value, references) |
2592.1.45
by Robert Collins
Tweak documentation as per Aaron's review. |
53 |
IndexBuilder - add_node(key, value) |
2592.1.46
by Robert Collins
Make GraphIndex accept nodes as key, value, references, so that the method |
54 |
WhackyIndexBuilder - add_node(key, value, whackiness) |
2592.1.45
by Robert Collins
Tweak documentation as per Aaron's review. |
55 |
|
56 |
as opposed to something quite different like:: |
|
57 |
||
58 |
node = IncrementalBuilder.get_node() |
|
59 |
node.key = 'foo' |
|
60 |
node.value = 'bar' |
|
2592.1.34
by Robert Collins
Cleanup docs. |
61 |
|
62 |
Services |
|
63 |
-------- |
|
64 |
||
65 |
An initial implementation of indexing can probably get away with a small |
|
66 |
number of primitives. Assuming we have write once index files: |
|
67 |
||
68 |
Build index |
|
69 |
~~~~~~~~~~~ |
|
70 |
||
71 |
This should be done by creating an ``IndexBuilder`` and then calling |
|
72 |
``insert(key, value)`` many times. (Indices that support sorting, |
|
73 |
topological sorting etc, will want specialised insert methods). |
|
74 |
||
75 |
When the keys have all been added, a ``finish`` method should be called, |
|
76 |
which will return a file stream to read the index data from. |
|
77 |
||
78 |
Retrieve entries from the index |
|
79 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
80 |
||
81 |
This should allow random access to the index using readv, so we probably |
|
82 |
want to open the index on a ``Transport``, then use ``iter_entries(keys)``, |
|
83 |
which can return an iterator that yields ``(key, value)`` pairs in |
|
84 |
whatever order makes sense for the index. |
|
85 |
||
86 |
Merging of indices |
|
87 |
~~~~~~~~~~~~~~~~~~ |
|
88 |
||
89 |
Merging of N indices requires a concordance of the keys of the index. So |
|
90 |
we should offer a ``iter_all_entries`` call that has the same return type |
|
91 |
as the ``iter_entries`` call. |
|
92 |
||
93 |
Index implementations |
|
94 |
===================== |
|
95 |
||
96 |
GraphIndex |
|
97 |
---------- |
|
98 |
||
99 |
``GraphIndex`` supports graph based lookups. While currently unoptimised |
|
100 |
for reading, the index is quite space efficient at storing the revision |
|
101 |
graph index for bzr. The ``GraphIndexBuilder`` may be used to create one |
|
102 |
of these indices by calling ``add_node`` until all nodes are added, then |
|
103 |
``finish`` to obtain a file stream containing the index data. Multiple |
|
104 |
indices may be queried using the ``CombinedGraphIndex`` class. |
|
105 |
||
106 |
||
107 |
||
108 |
||
109 |
.. |
|
110 |
vim: ft=rst tw=74 ai |
|
111 |