~bzr-pqm/bzr/bzr.dev : contents of doc/developers/indices.txt at revision 2592.1.34

~bzr-pqm/bzr/bzr.dev : (revision 2592.1.34)
=======
Indices
=======

Status
======

:Date: 2007-07-14

This document describes the indexing facilities within bzrlib.


.. contents::


Motivation
==========

To provide a clean concept of index that can be reused by different
components within the codebase rather than being rewritten every time
by different components.


Terminology
===========

An **index** is a dictionary mapping opaque keys to opaque values.
Different index types may allow some of the value data to be interpreted
by the index. For example the ``GraphIndex`` index stores a graph between
keys as part of the index.


Overview
========

bzr is moving to a write-once model for repository storage in order to
achieve lock-free repositories eventually. In order to support this we are
making our new index classes **immutable**. That is, one creates a new
index in a single operation, and after that it is read only. To combine
two indices a ``Combined*`` index may be used, or an **index merge** may
be performed by reading the entire value of two (or more) indices and
writing them into a new index.

General Index API
=================

While different indices will likely require different interfaces, we
should keep these consistent to encourage easy adaption and reuse between
indices. For instance a simple key-value index should be layerable on top
of a more complex graph-aware index.

Services
--------

An initial implementation of indexing can probably get away with a small
number of primitives. Assuming we have write once index files:

Build index
~~~~~~~~~~~

This should be done by creating an ``IndexBuilder`` and then calling
``insert(key, value)`` many times. (Indices that support sorting,
topological sorting etc, will want specialised insert methods).

When the keys have all been added, a ``finish`` method should be called,
which will return a file stream to read the index data from.

Retrieve entries from the index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This should allow random access to the index using readv, so we probably
want to open the index on a ``Transport``, then use ``iter_entries(keys)``,
which can return an iterator that yields ``(key, value)`` pairs in
whatever order makes sense for the index.

Merging of indices
~~~~~~~~~~~~~~~~~~

Merging of N indices requires a concordance of the keys of the index. So
we should offer a ``iter_all_entries`` call that has the same return type
as the ``iter_entries`` call.

Index implementations
=====================

GraphIndex
----------

``GraphIndex`` supports graph based lookups. While currently unoptimised
for reading, the index is quite space efficient at storing the revision
graph index for bzr. The ``GraphIndexBuilder`` may be used to create one
of these indices by calling ``add_node`` until all nodes are added, then
``finish`` to obtain a file stream containing the index data. Multiple
indices may be queried using the ``CombinedGraphIndex`` class.




..
   vim: ft=rst tw=74 ai