~bzr-pqm/bzr/bzr.dev

« back to all changes in this revision

Viewing changes to doc/developers/knitpack.txt

Committer: Martin Pool
Date: 2007-10-24 02:14:11 UTC
mto: This revision was merged to the branch mainline in revision 2933.
Revision ID: mbp@sourcefrog.net-20071024021411-nm686fr2koy4wao2

Review comments on knitpack docs

files modified:
doc/developers/knitpack.txt

Show diffs side-by-side

added added

removed removed

doc/developers/knitpack.txt

lock/ lockdir

==================== =============================================

Note that for consistency we always write "indices" not "indexes".

This is implemented on top of pack files, which are written once from

start to end, then left alone. A pack consists of a body file, plus

several index files. There are four index files for each pack, which

have the same basename and an extension indicating the purpose of the

index:

==== ========== ======================== ==========================

extn Purpose Key References

==== ========== ======================== ==========================

.tix File texts ``file_id, revision_id`` compression base,

per-file parents

.six Signatures ``revision_id,`` -

.rix Revisions ``revision_id,`` revision parents

.iix Inventory ``revision_id,`` compression base,

revision parents

==== ========== ======================== ==========================

======== ========== ======================== ==========================

extn Purpose Key References

======== ========== ======================== ==========================

``.tix`` File texts ``file_id, revision_id`` per-file parents,

compression basis

per-file parents

``.six`` Signatures ``revision_id,`` -

``.rix`` Revisions ``revision_id,`` revision parents

``.iix`` Inventory ``revision_id,`` revision parents,

compression base

======== ========== ======================== ==========================

Indices are accessed through the ``bzrlib.index.GraphIndex`` class.

Indices are stored as sorted files on disk. Each line is one record,

and contains:

or inventory is a forward delta from the referenced revision. The

compression base list must have length 0 or 1.

Like packs, indexes are written only once and then unmodified. A

GraphIndex builder is a mutable in-memory graph that can be sorted,

cross-referenced and written out when the write group completes.

There can also be index entries with a value of 'a' for absent. These

records exist just to be pointed to in a graph. This is used, for

example, to give the revision-parent pointer when the parent revision is

in a previous pack.

The data content for each record is a knit data chunk. The knits are

always unannotated - the annotations must be generated when needed.

(We'd like to cache/memoize the annotations.) The data hunks can be

moved between packs without needing to recompress them.

There can also be index entries with a value of 'a' for absent. These

records exist just to be pointed to in a graph. This is used, for

example, to give the revision-parent pointer when the parent revision is

in a previous pack.

It is not possible to regenerate an index from the body file, because it

contains information stored in the knit index that's not in the body.

(In particular, the per-file graph is only stored in the index.)

We would like to change this in a future format

We would like to change this in a future format.

The lock is a regular LockDir lock. The lock is only held for a much

reduced scope, while updating the pack-names file. The bulk of the

101

109

packs. (This should always be the same as the list of files in the

102

110

``packs/`` directory, but the file is needed for readonly http clients

103

111

that can't easily list directories, and it includes other information.)

104

The ``pack-names`` file can be recreated by looking at the packs on disk.

105

As well as the list of names, it also contains the size in bytes of th`d

112

The constraint on the ``pack-names`` list is that every file mentioned

113

must exist in the ``packs/`` directory.

114

115

In rare cases, when a writer is interrupted, about-to-be-removed packs

116

may still be present in the directory but removed from the list.

117

118

As well as the list of names, the pack-names file also contains the

119

size, in bytes, of each of the four indices. This is used to bootstrap

120

bisection search within the indices.

106

121

107

122

In normal use, one pack will be created for each commit to a repository.

108

123

This would build up to an inefficient number of files over time, so a

121

136

initially be created in a single-revision pack, then moved to a

122

137

ten-revision pack, then to a 100-pack, and so on.

123

138

124

125

126

127

Sketched notes on packs

128

~~~~~~~~~~~~~~~~~~~~~~~

129

130

fast pull:

131

132

- query and pull revs

133

- query and pull references

134

- query and pull further references

135

- etc

136

- for now:

137

- query and pull revs and sigs

138

- query and pull inventory with cache on

139

- query and pull all of texts

140

141

pack:

142

143

- two modes - complete and incremental

144

- complete:

145

146

- grab all revisions, regenerate deltas, etc.

147

- may want data grouping by recreation not by

148

origin. i.e. 'tip' pack has everything to construct

149

revision -1, older packs have data that is not needed

150

for current revisions.

151

152

- incremental:

153

154

- leave most data untouched most of the time

155

- therefore can't really promote old data to be current.

156

- possible approach:

157

158

upper number of packs = sum of digits in the commit count

159

so we need to combine packs whenever a commit rolls over a power

160

of 10 - at most once every 10 commit, possibly much less.

161

exponential backoff of commits per pack file. So the number of

162

revisions per pack is log10(revisions in repository).

163

up to 10 commits - 1 revision per pack

164

up to 100 commits - 2 revisions per pack

165

10 commits - 1 packs, 100 commits - 2 packs

166

167

- count number of 1-revision packs, if 10 group them

168

into one larger pack.

169

- count number of 10-unit packs. if 10, group those

170

into one larger pack.

171

- etc

172

173

this approach reads 10 entire packs every 10 commits,

174

100 every 100, 1000 every 1000

139

As with other repositories, in normal use data is only inserted.

140

However, in some circumstances we may want to garbage-collect or prune

141

existing data, or reconcile indexes.

175

142

176

143

vim: tw=72 ft=rest expandtab

Older »