24
24
==================== =============================================
26
Note that for consistency we always write "indices" not "indexes".
26
28
This is implemented on top of pack files, which are written once from
27
29
start to end, then left alone. A pack consists of a body file, plus
28
30
several index files. There are four index files for each pack, which
29
31
have the same basename and an extension indicating the purpose of the
32
==== ========== ======================== ==========================
33
extn Purpose Key References
34
==== ========== ======================== ==========================
35
.tix File texts ``file_id, revision_id`` compression base,
37
.six Signatures ``revision_id,`` -
38
.rix Revisions ``revision_id,`` revision parents
39
.iix Inventory ``revision_id,`` compression base,
41
==== ========== ======================== ==========================
34
======== ========== ======================== ==========================
35
extn Purpose Key References
36
======== ========== ======================== ==========================
37
``.tix`` File texts ``file_id, revision_id`` per-file parents,
40
``.six`` Signatures ``revision_id,`` -
41
``.rix`` Revisions ``revision_id,`` revision parents
42
``.iix`` Inventory ``revision_id,`` revision parents,
44
======== ========== ======================== ==========================
46
Indices are accessed through the ``bzrlib.index.GraphIndex`` class.
43
47
Indices are stored as sorted files on disk. Each line is one record,
58
62
or inventory is a forward delta from the referenced revision. The
59
63
compression base list must have length 0 or 1.
65
Like packs, indexes are written only once and then unmodified. A
66
GraphIndex builder is a mutable in-memory graph that can be sorted,
67
cross-referenced and written out when the write group completes.
69
There can also be index entries with a value of 'a' for absent. These
70
records exist just to be pointed to in a graph. This is used, for
71
example, to give the revision-parent pointer when the parent revision is
61
74
The data content for each record is a knit data chunk. The knits are
62
75
always unannotated - the annotations must be generated when needed.
63
76
(We'd like to cache/memoize the annotations.) The data hunks can be
64
77
moved between packs without needing to recompress them.
66
There can also be index entries with a value of 'a' for absent. These
67
records exist just to be pointed to in a graph. This is used, for
68
example, to give the revision-parent pointer when the parent revision is
71
79
It is not possible to regenerate an index from the body file, because it
72
80
contains information stored in the knit index that's not in the body.
73
81
(In particular, the per-file graph is only stored in the index.)
74
We would like to change this in a future format
82
We would like to change this in a future format.
76
84
The lock is a regular LockDir lock. The lock is only held for a much
77
85
reduced scope, while updating the pack-names file. The bulk of the
101
109
packs. (This should always be the same as the list of files in the
102
110
``packs/`` directory, but the file is needed for readonly http clients
103
111
that can't easily list directories, and it includes other information.)
104
The ``pack-names`` file can be recreated by looking at the packs on disk.
105
As well as the list of names, it also contains the size in bytes of th`d
112
The constraint on the ``pack-names`` list is that every file mentioned
113
must exist in the ``packs/`` directory.
115
In rare cases, when a writer is interrupted, about-to-be-removed packs
116
may still be present in the directory but removed from the list.
118
As well as the list of names, the pack-names file also contains the
119
size, in bytes, of each of the four indices. This is used to bootstrap
120
bisection search within the indices.
107
122
In normal use, one pack will be created for each commit to a repository.
108
123
This would build up to an inefficient number of files over time, so a
121
136
initially be created in a single-revision pack, then moved to a
122
137
ten-revision pack, then to a 100-pack, and so on.
127
Sketched notes on packs
128
~~~~~~~~~~~~~~~~~~~~~~~
132
- query and pull revs
133
- query and pull references
134
- query and pull further references
137
- query and pull revs and sigs
138
- query and pull inventory with cache on
139
- query and pull all of texts
143
- two modes - complete and incremental
146
- grab all revisions, regenerate deltas, etc.
147
- may want data grouping by recreation not by
148
origin. i.e. 'tip' pack has everything to construct
149
revision -1, older packs have data that is not needed
150
for current revisions.
154
- leave most data untouched most of the time
155
- therefore can't really promote old data to be current.
158
upper number of packs = sum of digits in the commit count
159
so we need to combine packs whenever a commit rolls over a power
160
of 10 - at most once every 10 commit, possibly much less.
161
exponential backoff of commits per pack file. So the number of
162
revisions per pack is log10(revisions in repository).
163
up to 10 commits - 1 revision per pack
164
up to 100 commits - 2 revisions per pack
165
10 commits - 1 packs, 100 commits - 2 packs
167
- count number of 1-revision packs, if 10 group them
168
into one larger pack.
169
- count number of 10-unit packs. if 10, group those
170
into one larger pack.
173
this approach reads 10 entire packs every 10 commits,
174
100 every 100, 1000 every 1000
139
As with other repositories, in normal use data is only inserted.
140
However, in some circumstances we may want to garbage-collect or prune
141
existing data, or reconcile indexes.
176
143
vim: tw=72 ft=rest expandtab