2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
1 |
Planned changes to the bzr core |
2506.1.3
by Alexander Belchenko
merge bzr.dev; fix ReST formatting in planned-performance-changes.txt |
2 |
=============================== |
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
3 |
|
4 |
Delivering the best possible performance requires changing the bzr core design |
|
5 |
from that present in 0.16. Some of these changes are incremental and can be |
|
6 |
done with no impact on disk format. Many of them however do require changes to |
|
7 |
the disk format, and these can be broken into two sets of changes, those which |
|
8 |
are sufficiently close to the model bzr uses today to interoperate with the |
|
9 |
0.16 disk formats, and those that are not able to interoperate with the 0.16 |
|
10 |
disk formats - specifically some planned changes may result in data which |
|
11 |
cannot be exported to bzr 0.16's disk formats and then imported back to the new |
|
12 |
format without losing critical information. If/when this takes place it will be |
|
13 |
essentially a migration for users to switch from their bzr 0.16 repository to a |
|
14 |
bzr that supports them. We plan to batch all such changes into one large |
|
15 |
'experimental' repository format, which will be complete stable and usable |
|
16 |
before we migrate it to become a supported format. Getting new versions of bzr |
|
17 |
in widespread use at that time will be very important, otherwise the user base |
|
18 |
may be split in two - users that have upgraded and users that have not. |
|
19 |
||
20 |
The following changes are grouped according to their compatability impact: |
|
21 |
library only, disk format but interoperable, disk format interoperability |
|
22 |
unknown, and disk format, not interoperable. |
|
23 |
||
24 |
Library changes |
|
2506.1.3
by Alexander Belchenko
merge bzr.dev; fix ReST formatting in planned-performance-changes.txt |
25 |
--------------- |
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
26 |
|
27 |
These changes will change bzrlib's API but will not affect the disk format and |
|
28 |
thus do not pose a significant migration issue. |
|
29 |
||
30 |
* For our 20 core use cases, we plan to add targeted API's to bzrlib that are |
|
31 |
repository-representation agnostic. These will instead reflect the shape of |
|
32 |
data access most optimal for that case. |
|
33 |
||
34 |
* Deprecate 'versioned files' as a library concept. Instead of asking for |
|
35 |
information about a file-over-time as a special case, we will move to an API |
|
36 |
that assumes less coupling between the historical information and the |
|
37 |
ability to obtain texts/deltas etc. Specifically, we need to remove all |
|
38 |
API's that act in terms of on disk representation except those within a |
|
39 |
given repository implementation. |
|
40 |
||
41 |
* Create a validator for revisions that is more amenable to use by other parts |
|
42 |
of the code base than just the gpg signing facility. This can be done today |
|
43 |
without changing disk, possibly with a performance hit until the disk |
|
44 |
formats match the validatory logic. It will be hard to tell if we have the |
|
45 |
right routine for that until all the disk changes are complete, so while |
|
46 |
this is a library only change, its likely one that will be delayed to near |
|
47 |
the end of the process. |
|
48 |
||
49 |
* Add an explicit API for managing cached annotations. While annotations are |
|
50 |
considered a cache this is not exposed in such a way that cache operations |
|
51 |
like 'drop the cache' can be performed. On current disk formats the cache is |
|
52 |
mandatory, but an API to manage would allow refreshing of the cache (e.g. |
|
53 |
after ghosts are filled in in baz conversions). |
|
54 |
||
55 |
* Use the _iter_changes API to perform merges. This is a small change that may |
|
56 |
remove the need to use inventories in merge, making a dramatic difference to |
|
2485.4.10
by Robert Collins
Review feedback. |
57 |
merge performance once the tree shape comparison optimisations are |
58 |
implemented. |
|
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
59 |
|
60 |
* Create a network-efficient revision graph API. This is the logic at the |
|
61 |
start of push and pull operations, which currently scales O(graph size). |
|
62 |
Fixing the scaling can be done, but there are tradeoffs to latency and |
|
63 |
performance to consider, making it a little tricky to get right. |
|
64 |
||
65 |
* Working tree disk operation ordering. We plan to change the order in which |
|
66 |
some operations are done (specifically TreeTransform ones) to improve |
|
67 |
performance. There is already a 66% performance boost in that area going |
|
68 |
through review. |
|
69 |
||
70 |
* Stop requiring full memory copies of files. Currently bzr requires that it |
|
71 |
can hold 3 copies of any file its versioning in memory. Solving this is |
|
72 |
tricky, particularly without performance regressions on small files, but |
|
73 |
without solving it versioning of .iso and other large objects will continue |
|
74 |
to be extremely painful. |
|
75 |
||
76 |
* Add an API for per-file graph access that alllows incremental access and is |
|
77 |
suitable for on-demand generation if desired. |
|
78 |
||
79 |
* Repository stacking API. Allowing multiple databases to be stacked to give a |
|
80 |
single 'repository' will allow implementation of some long desired features |
|
81 |
like history horizons, and bundle usage where the bundle is not added to the |
|
82 |
local repository just to examine its contents. |
|
83 |
||
84 |
* Revision data manipulation API. We need a single streaming API for adding |
|
85 |
data to or getting it from a repository. This will need to allow hints such |
|
86 |
as 'optimise for size', or 'optimise for fast-addition' to meet the various |
|
87 |
users planned, but it is a core part of the library today, and its not |
|
88 |
sufficiently clean to let us simplify/remove a lot of related code today. |
|
89 |
||
90 |
Interoperable disk changes |
|
2506.1.3
by Alexander Belchenko
merge bzr.dev; fix ReST formatting in planned-performance-changes.txt |
91 |
-------------------------- |
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
92 |
|
93 |
* New container format to allow single-file description of multiple named |
|
94 |
objects. This will provide the basis for transmission of revisions over the |
|
95 |
network, the new bundle format, and possibly a new repository format as |
|
2522.3.1
by Robert Collins
Draft proposed integration order for performance changes. |
96 |
well. [Core implemented] |
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
97 |
|
98 |
* Separate the annotation cache from the storage of actual file texts and make |
|
99 |
the annotation style, and when to do it, configurable. This will reduce data |
|
100 |
sent over the wire when repositories have had 'needs-annotations' turned |
|
101 |
off, which very large trees may choose to do - generating just-in-time |
|
102 |
annotations may be desirable for those trees (even when performing |
|
103 |
annotation based merges). |
|
104 |
||
105 |
* Repository disk operation ordering. The order that tasks access data within |
|
106 |
the repository and the layout of the data should be harmonised. This will |
|
107 |
require disk format changes but does not inherently alter the model, so its |
|
108 |
straight forward to export from a repository that has been optimised in this |
|
109 |
way to a 0.16 based repository. |
|
110 |
||
111 |
* Inventory representation. An inventory is a logical description of the shape |
|
112 |
of a version controlled tree. Currently we operate on the whole inventory as |
|
113 |
a tree broken down per directory, but we store it as a flat file. This scale |
|
114 |
very poorly as even a minor change between inventories requires us to scan |
|
115 |
the entire file, and in large trees this is many megabytes of data to |
|
116 |
consider. We are investigating the exact form, but the intent is to change |
|
117 |
the serialisation of inventories so that comparing two inventories can be |
|
118 |
done in some smaller time - e.g. O(log N) scaling. Whatever form this takes, |
|
119 |
a repository that can export it directly will be able to perform operations |
|
120 |
between two historical trees much more efficiently than the current |
|
121 |
repositories. |
|
122 |
||
123 |
* Delta storage optimisation. We plan to change the delta storage logic to use |
|
2485.4.10
by Robert Collins
Review feedback. |
124 |
a binary delta like xdelta rather than using line based deltas from python. |
125 |
These binary deltas could be done along ancestry ordering, or other |
|
126 |
arbitrary patterns chosen for their intended use. Line based deltas will |
|
127 |
still be created for cached annotations. This is still under some discussion. |
|
128 |
http://bazaar-vcs.org/PerformanceRoadmap/Xdelta |
|
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
129 |
|
130 |
* Greatest distance from origin cache. This is a possible change to introduce, |
|
131 |
but it may be unnecessary - listed here for completeness till it has been |
|
132 |
established as [un]needed. |
|
133 |
||
134 |
Possibly non-interoperable disk changes |
|
2506.1.3
by Alexander Belchenko
merge bzr.dev; fix ReST formatting in planned-performance-changes.txt |
135 |
--------------------------------------- |
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
136 |
|
137 |
* Removing of derivable data from the core of bzr. Much of the data that bzr |
|
138 |
stores is derivable from the users source files. For instance the |
|
139 |
annotations that record who introduced a line. Given the full history for a |
|
140 |
repository we can recreate that at any time. We want to remove the |
|
141 |
dependence of the core of bzr on any data that is derivable, because doing |
|
142 |
this will give us the freedom to: |
|
143 |
||
144 |
* Improve the derivation algorithm over time. |
|
145 |
* Deal with bugs in the derivation algorithms without having 'corrupt |
|
146 |
repositories' or such things. |
|
147 |
||
148 |
However, some of the data that is technically derived, like the per-file |
|
149 |
merge graph, is both considered core, and can be generated differently when |
|
150 |
certain circumstances arive, by bzr 0.16. Any change to the 'core' status of |
|
151 |
that data will discard data that cannot be recreated and thus lead to the |
|
152 |
inability to export from a format where that is derived data to bzr 0.16's |
|
153 |
formats without errors occuring in those circumstances. Some of the data |
|
154 |
that may be considered for this includes: |
|
155 |
||
156 |
* Per file merge graphs |
|
157 |
* Annotations |
|
158 |
||
159 |
Non-interoperable disk changes |
|
2506.1.3
by Alexander Belchenko
merge bzr.dev; fix ReST formatting in planned-performance-changes.txt |
160 |
------------------------------ |
2485.4.9
by Robert Collins
Create the top level changes-list from the London sprint for reference. |
161 |
|
162 |
* Drop the per-file merge graph 'cache' currently held in the FILE-ID.kndx |
|
163 |
files. A specific case of removing derivable data, this may allow smaller |
|
164 |
inventory metadata and also make it easier to allow two different trees (in |
|
165 |
terms of last-change made, e.g. if one is a working tree) to be compared |
|
166 |
using a hash-tree style approach. |
|
167 |
||
168 |
* Use hash based names for some objects in the bzr database. Because it would force |
|
169 |
total-knowledge-of-history on the graph revision objects will not be namable |
|
170 |
via hash's and neither will revisio signatures. Other than that though we |
|
171 |
can in principle use hash's e.g. SHA1 for everything else. There are many |
|
172 |
unanswered questions about hash based naming related to locality of |
|
173 |
reference impacts, which need to be answered before this becomes a definite |
|
174 |
item. |