2481.1.5
by Robert Collins
Incremental push-pull performance anlysis draft. |
1 |
Analysing a specific use case |
2 |
----------------------------- |
|
3 |
||
4 |
The analysis of a use case needs to provide as outputs: |
|
5 |
* The functional requirements that the use case has to satisfy. |
|
6 |
* The file level operations and access patterns that will give the best |
|
7 |
performance. |
|
8 |
* A low friction API which will allow the use case to be implemented. |
|
9 |
* The release of bzr (and thus the supported features) for which the analysis |
|
10 |
was performed. The feature set of bzr defines the access patterns and data |
|
11 |
required to implement any use case. So when we add features, their design |
|
12 |
changes the requirements for the parts of the system they alter, so we need |
|
13 |
to re-analyse use cases when bzr's feature set changes. If future plans are |
|
14 |
considered in the analysis with the intention of avoiding rework, these |
|
15 |
should also be mentioned. |
|
16 |
||
17 |
Performing the analysis |
|
18 |
----------------------- |
|
19 |
||
20 |
The analysis needs to be able to define the characteristics of the |
|
21 |
involved disk storage and APIs. That means we need to examine the data |
|
22 |
required for the operation, in what order it is required, on both the |
|
23 |
read and write sides, and how that needs to be presented to be |
|
24 |
consistent with our layering. |
|
25 |
||
26 |
As a quick example: 'annotation of a file requires the file id looked up |
|
27 |
from the tree, the basis revision id from the tree, and then the text of |
|
28 |
that fileid-revisionid pair along with the creating revision id |
|
29 |
allocated to each line, and the dotted revision number of each of those |
|
30 |
revision ids.' All three of our key domain objects are involved here, |
|
31 |
but we haven't defined any characteristics of the api or disk facilities |
|
32 |
yet. We could then do that by saying something like 'the file-id lookup |
|
33 |
should degrade gracefully as trees become huge. The tree basis id should |
|
34 |
be constant time. Retrieval of the annotated text should be roughly |
|
35 |
constant for any text of the same size regardless of the number of |
|
36 |
revisions contributing to its content. Mapping of the revision ids to |
|
37 |
dotted revnos could be done as the text is retrieved, but its completely |
|
38 |
fine to post-process the annotated text to obtain dotted-revnos.' |
|
39 |
||
40 |
What factors should be considered? |
|
41 |
---------------------------------- |
|
42 |
||
43 |
Obviously, those that will make for an extremely fast system :). There |
|
44 |
are many possible factors, but the ones I think are most interesting to |
|
45 |
design with are: |
|
46 |
||
47 |
- baseline overhead: |
|
48 |
||
49 |
- The time to get bzr ready to begin the use case. |
|
50 |
||
51 |
- scaling: how does performance change when any of the follow aspects |
|
52 |
of the system are ratched massively up or down: |
|
53 |
||
54 |
- number of files/dirs/symlinks/subtrees in a tree (both working and |
|
55 |
revision trees) |
|
56 |
- size of any particular file |
|
57 |
- number of elements within a single directory |
|
58 |
- length of symlinks |
|
59 |
- number of changes to any file over time |
|
60 |
(subordinately also the number of merges of the file) |
|
61 |
- number of commits in the ancestry of a branch |
|
62 |
(subordinately also the number of merges) |
|
63 |
- number of revisions in a repository |
|
64 |
- number of fileids in a repository |
|
65 |
- number of ghosts in a given graph (revision or per-file) |
|
66 |
- number of branches in a repository |
|
67 |
- number of concurrent readers for a tree/branch/repository |
|
68 |
- number of concurrent writers for objects that support that. |
|
69 |
- latency to perform file operations (e.g. slow disks, network file systems, |
|
70 |
our VFS layer and FTP/SFTP/etc) |
|
71 |
- bandwidth to the disk storage |
|
72 |
- latency to perform semantic operations (hpss specific) |
|
73 |
- bandwidth when performing semantic operations. |
|
74 |
- locality of reference: If an operation requires data that is located |
|
75 |
within a small region at any point, we often get better performance |
|
76 |
than with an implementation of the same operation that requires the |
|
77 |
same amount of data but with a lower locality of reference. Its |
|
78 |
fairly tricky to add locality of reference after the fact, so I think |
|
79 |
its worth considering up front. |
|
80 |
||
81 |
Using these factors, to the annotate example we can add that its |
|
82 |
reasonable to do two 'semantic' round trips to the local tree, one to |
|
83 |
the branch object, and two to the repository. In file-operation level |
|
84 |
measurements, in an ideal world there would be no more than one round |
|
85 |
trip for each semantic operation. What there must not be is one round |
|
86 |
trip per revision involved in the revisionid->dotted number mapping, nor |
|
87 |
per each revision id attributed to a line in the text. |
|
88 |
||
89 |
Not all the items mentioned above are created equal. The analysis should |
|
90 |
include the parameters considered and the common case values for each - the |
|
91 |
optimisation should be around the common cases not around the exceptions. |
|
92 |
||
93 |
For instance, we have a smart server now; file level operations are relatively |
|
94 |
low latency and we should use that as the common case. At this point we intend |
|
95 |
to preserve the performance of the dumb protocol networking, but focus on |
|
96 |
improving network performance via the smart server and thus escape the |
|
97 |
file-level operation latency considerations. |
|
98 |
||
99 |
Many performance problems only become visible when changing the scaling knobs |
|
100 |
upwards to large trees. On small trees its our baseline performance that drives |
|
101 |
incremental improvements; on large trees its the amount of processing per item |
|
102 |
that drives performance. A significant goal therefore is to keep the amouont of |
|
103 |
data to be processed under control. Ideally we can scale in a sublinear fashion |
|
104 |
for all operations, but we MUST NOT scale even linearly for operations that |
|
105 |
invoke a latency multiplier. For example, reading a file on disk requires |
|
106 |
finding the inode for the file, then the block with the data and returning the |
|
107 |
contents. Due to directory grouping logic we pay a massive price to read files |
|
108 |
if we do not group the reads of files within the same directory. |