~bzr-pqm/bzr/bzr.dev

« back to all changes in this revision

Viewing changes to doc/developers/indices.txt

Committer: Robert Collins
Date: 2007-07-13 15:05:36 UTC
mto: (2592.5.3 pack-repository)
mto: This revision was merged to the branch mainline in revision 2624.
Revision ID: robertc@robertcollins.net-20070713150536-hqtkufys7aiqxl1t

Cleanup docs.

files added:
doc/developers/indices.txt

files modified:
NEWS

bzrlib/index.py

doc/developers/index.txt

doc/developers/repository.txt

Show diffs side-by-side

added added

removed removed

doc/developers/indices.txt

=======

Indices

=======

Status

======

:Date: 2007-07-14

This document describes the indexing facilities within bzrlib.

.. contents::

Motivation

==========

To provide a clean concept of index that can be reused by different

components within the codebase rather than being rewritten every time

by different components.

Terminology

===========

An **index** is a dictionary mapping opaque keys to opaque values.

Different index types may allow some of the value data to be interpreted

by the index. For example the ``GraphIndex`` index stores a graph between

keys as part of the index.

Overview

========

bzr is moving to a write-once model for repository storage in order to

achieve lock-free repositories eventually. In order to support this we are

making our new index classes **immutable**. That is, one creates a new

index in a single operation, and after that it is read only. To combine

two indices a ``Combined*`` index may be used, or an **index merge** may

be performed by reading the entire value of two (or more) indices and

writing them into a new index.

General Index API

=================

While different indices will likely require different interfaces, we

should keep these consistent to encourage easy adaption and reuse between

indices. For instance a simple key-value index should be layerable on top

of a more complex graph-aware index.

Services

--------

An initial implementation of indexing can probably get away with a small

number of primitives. Assuming we have write once index files:

Build index

~~~~~~~~~~~

This should be done by creating an ``IndexBuilder`` and then calling

``insert(key, value)`` many times. (Indices that support sorting,

topological sorting etc, will want specialised insert methods).

When the keys have all been added, a ``finish`` method should be called,

which will return a file stream to read the index data from.

Retrieve entries from the index

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This should allow random access to the index using readv, so we probably

want to open the index on a ``Transport``, then use ``iter_entries(keys)``,

which can return an iterator that yields ``(key, value)`` pairs in

whatever order makes sense for the index.

Merging of indices

~~~~~~~~~~~~~~~~~~

Merging of N indices requires a concordance of the keys of the index. So

we should offer a ``iter_all_entries`` call that has the same return type

as the ``iter_entries`` call.

Index implementations

=====================

GraphIndex

----------

``GraphIndex`` supports graph based lookups. While currently unoptimised

for reading, the index is quite space efficient at storing the revision

graph index for bzr. The ``GraphIndexBuilder`` may be used to create one

of these indices by calling ``add_node`` until all nodes are added, then

``finish`` to obtain a file stream containing the index data. Multiple

indices may be queried using the ``CombinedGraphIndex`` class.

100

vim: ft=rst tw=74 ai

101

Older »