.. Revision 1185.1.29 by Robert Collins: merge merge tweaks from aaron, which includes latest .dev

Patch pools
***********

*Patch pools* are an optimization for efficient storage of related
branches. They are not required for the first release.

Revisions, inventories, and file states are identified in Bazaar-NG by
universally unique hashes, and they are never modified once they are
created. Objects which are common between branches may therefore be
stored only once and referenced from each branch. Various strategies
are available for doing this.

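To illustrate why immutable, hash-identified objects need to be stored
only once, here is a minimal sketch. The ``ObjectPool`` class, the use of
SHA-1, and the flat one-file-per-object layout are assumptions made for
illustration; they are not Bazaar-NG's actual storage format.

```python
import hashlib
import os
import tempfile


class ObjectPool:
    """Hypothetical content-addressed store: one file per object, named by hash."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, object_id):
        return os.path.join(self.root, object_id)

    def add(self, data):
        """Store data and return its id; adding the same bytes again is a no-op."""
        object_id = hashlib.sha1(data).hexdigest()
        path = self._path(object_id)
        if not os.path.exists(path):
            # Objects are never modified, so an existing file is already correct.
            with open(path, "wb") as f:
                f.write(data)
        return object_id

    def get(self, object_id):
        with open(self._path(object_id), "rb") as f:
            return f.read()

    def __contains__(self, object_id):
        return os.path.exists(self._path(object_id))


if __name__ == "__main__":
    pool = ObjectPool(tempfile.mkdtemp())
    first = pool.add(b"file contents shared by two branches")
    second = pool.add(b"file contents shared by two branches")
    # Both branches get the same id, and the pool holds a single file.
    print(first == second, len(os.listdir(pool.root)))
```

Because the id is derived from the content, two branches that add the same
object arrive at the same name without any coordination.
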
Objects can be hard-linked between related branches on filesystems
that support hard links. This provides automatic reference counting
as branches are deleted.

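The reference counting that hard links provide can be observed directly
through the link count the filesystem keeps for each file. The sketch
below uses hypothetical file names and helper functions, and assumes a
filesystem where ``os.link`` is supported.

```python
import os
import tempfile


def share_object(existing_path, branch_path):
    """Hard-link an existing object into a branch instead of copying it."""
    os.link(existing_path, branch_path)


def link_count(path):
    """Number of directory entries that point at this file's data."""
    return os.stat(path).st_nlink


def demo():
    root = tempfile.mkdtemp()
    original = os.path.join(root, "object")
    with open(original, "wb") as f:
        f.write(b"revision data")
    branch_a = os.path.join(root, "branch-a-object")
    branch_b = os.path.join(root, "branch-b-object")
    share_object(original, branch_a)
    share_object(original, branch_b)
    before = link_count(original)
    os.remove(branch_b)  # deleting a branch drops its reference
    after = link_count(original)
    return before, after
```

The filesystem frees the data only when the last link is removed, so no
separate bookkeeping is needed.
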
It is common to have several branches of the same project on a
machine, with many objects in common. Such branches can be configured
to list each other on their pool paths. When a new branch is created,
its parent should be the default pool path.

Each user might also have a pool that acts as a cache of all remote
revisions. Such a cache might use some kind of least-recently-used
policy to limit its size.

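A least-recently-used policy could look roughly like the following. The
``LRUPool`` class is hypothetical, and it limits the number of objects
rather than their total size in bytes, which is a simplifying assumption.

```python
from collections import OrderedDict


class LRUPool:
    """Hypothetical size-limited cache of remote objects, keyed by object id."""

    def __init__(self, max_objects):
        self.max_objects = max_objects
        self._objects = OrderedDict()

    def get(self, object_id):
        data = self._objects[object_id]  # raises KeyError on a cache miss
        self._objects.move_to_end(object_id)  # mark as most recently used
        return data

    def put(self, object_id, data):
        self._objects[object_id] = data
        self._objects.move_to_end(object_id)
        while len(self._objects) > self.max_objects:
            self._objects.popitem(last=False)  # evict the least recently used

    def __contains__(self, object_id):
        return object_id in self._objects
```

Eviction is safe here only because the cache holds copies of objects that
remain available from their remote source.
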
The user might nominate a series or hierarchy of pools to be searched
for a patch; these might be progressively further away: first on the
local machine, then on the local network, and finally remote.

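Searching a series of pools in order can be sketched as below. Each pool
is modelled as a plain dictionary from object id to data, and the pool
names are invented for the example; a real pool would be a directory or a
network location.

```python
def find_object(object_id, pool_path):
    """Return (pool_name, data) from the first pool on the path holding object_id.

    pool_path is an ordered list of (name, pool) pairs, nearest pool first.
    """
    for name, pool in pool_path:
        if object_id in pool:
            return name, pool[object_id]
    raise KeyError(object_id)


# Hypothetical pools, ordered from cheapest to most expensive to reach.
local_pool = {"rev-1": b"local copy"}
network_pool = {"rev-1": b"network copy", "rev-2": b"network only"}
remote_pool = {"rev-3": b"remote only"}

pool_path = [
    ("local", local_pool),
    ("network", network_pool),
    ("remote", remote_pool),
]
```

With this path, ``find_object("rev-1", pool_path)`` is answered by the
local pool, and the remote pool is consulted only for objects that no
nearer pool holds.
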
A system like the supermirror might make good use of a pool that
gradually accumulates all public objects in the world, and stores
branches very cheaply.

One complication is garbage collection. Naive implementations that
store references from branches into the pool will not be able to
detect objects that are no longer referenced by any active branch; as
branches are created and deleted over time such objects will
accumulate. This may not be a problem in many cases, given the
relative abundance of disk compared to programmer time, and the
relatively small number of long branches that are discarded. There
are some partial solutions:

* Keep a reference count for each object. There is no problem of
  circular references. However, keeping the count accurate requires
  that branches are never lost or deleted other than through the
  correct mechanism.

* From time to time, build a new pool that includes only objects from
  active branches.

* Keep a pool that holds only patches known to be available from
  elsewhere, so the pool is only a cache and never the single source
  of a particular object. Such a pool can then be discarded at will
  and the objects will be re-fetched from their original source.

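The second option above, rebuilding the pool from active branches, can be
sketched as follows. The function name and the dictionary-based model of
pools and branches are assumptions for illustration only.

```python
def rebuild_pool(old_pool, active_branches):
    """Return a new pool containing only objects referenced by an active branch.

    old_pool maps object id -> data.
    active_branches maps branch name -> iterable of object ids it references.
    """
    live = set()
    for object_ids in active_branches.values():
        live.update(object_ids)
    # Copy only the live objects; everything else is dropped with the old pool.
    return {oid: data for oid, data in old_pool.items() if oid in live}


old_pool = {"a": b"1", "b": b"2", "c": b"3"}
active_branches = {
    "trunk": ["a", "b"],
    "feature": ["b"],
}
new_pool = rebuild_pool(old_pool, active_branches)
```

Here object ``c`` was referenced only by a branch that no longer exists, so
it is absent from ``new_pool``. This approach needs no reference counts,
but it does need a reliable list of the active branches.
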
This last point suggests that new objects should never be written
solely into a pool, because of the risk that they might be
accidentally lost.

Using the parent on the default pool path allows for varying
greed or laziness in fetching objects. By default, objects might be
read only as necessary, and then stored in the local cache. If the
user wants to keep the whole history available locally, that could be
specified with a ``--greedy`` option when making the branch, or
by pulling down the history later.

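
The difference between lazy and greedy fetching can be sketched as
follows. The ``CachingBranch`` class and its ``greedy`` argument are
hypothetical stand-ins for the behaviour described above, with the parent
modelled as a dictionary from object id to data.

```python
class CachingBranch:
    """Hypothetical branch that reads objects from its parent through a local cache."""

    def __init__(self, parent, greedy=False):
        self.parent = parent  # mapping of object id -> data
        self.cache = {}
        self.fetches = 0  # how many objects were read from the parent
        if greedy:
            self.pull_all()

    def pull_all(self):
        """Copy every object not yet cached, as a greedy branch or a later pull would."""
        for object_id in self.parent:
            self.get(object_id)

    def get(self, object_id):
        if object_id not in self.cache:
            # Lazy path: read from the parent only when first needed.
            self.cache[object_id] = self.parent[object_id]
            self.fetches += 1
        return self.cache[object_id]


parent = {"rev-1": b"one", "rev-2": b"two", "rev-3": b"three"}
lazy = CachingBranch(parent)
greedy = CachingBranch(parent, greedy=True)
```

The lazy branch starts with an empty cache and fills it on demand, while
the greedy branch holds the whole history as soon as it is created.
Calling ``lazy.pull_all()`` later brings the lazy branch to the same state.
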