For a tree holding 2.4.18 (two copies), 2.4.19, 2.4.20

With gzip -9:

mbp@hope% du .bzr
195110 .bzr/text-store
20 .bzr/revision-store
12355 .bzr/inventory-store
216325 .bzr
mbp@hope% du -s .
523128 .

Without gzip:

This is actually a pretty bad example because of deleting and
re-importing 2.4.18, but still not totally unreasonable.

----

linux-2.4.0: 116399 kB
after adding everything: 119505 kB
bzr status 2.68s user 0.13s system 84% cpu 3.330 total
bzr commit 'import 2.4.0' 4.41s user 2.15s system 11% cpu 59.490 total

242446 .
122068 .bzr


----

Performance (2005-03-01)

To add all files from linux-2.4.18: about 70s, mostly inventory
serialization/deserialization.

To commit:
- finished, 6.520u/3.870s cpu, 33.940u/10.730s cum
- 134.040 elapsed

Interesting that it spends so long on external processing! I wonder
if this is for running uuidgen? Let's try generating things
internally.
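
Roughly the shape of what in-process generation could look like; this
is only a sketch, and the id format and names here are made up rather
than bzr's actual code::

    # Sketch only: build file ids in-process instead of forking uuidgen
    # for every file.  The id format is invented for illustration.
    import os, time, binascii

    def gen_file_id(name):
        # a few random bytes plus the current time should be unique
        # enough, with no external process per file
        rand = binascii.hexlify(os.urandom(8)).decode('ascii')
        return '%s-%d-%s' % (name, int(time.time()), rand)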

Great, this cuts it to 17.15s user 0.61s system 83% cpu 21.365 total
to add, with no external command time. The commit now seems to spend
most of its time copying to disk.

- finished, 6.550u/3.320s cpu, 35.050u/9.870s cum
- 89.650 elapsed

I wonder where the external time is now? We were also using uuids()
for revisions.

Let's remove everything and re-add. Detecting that everything was
removed takes
- finished, 2.460u/0.110s cpu, 0.000u/0.000s cum
- 3.430 elapsed

which may be mostly XML deserialization?

Just getting the previous revision takes about this long:

bzr invoked at Tue 2005-03-01 15:53:05.183741 EST +1100
by mbp@sourcefrog.net on hope
arguments: ['/home/mbp/bin/bzr', 'get-revision-inventory', 'mbp@sourcefrog.net-20050301044608-8513202ab179aff4-44e8cd52a41aa705']
platform: Linux-2.6.10-4-686-i686-with-debian-3.1
- finished, 3.910u/0.390s cpu, 0.000u/0.000s cum
- 6.690 elapsed

Now committing the revision which removes all files should be fast.

- finished, 1.280u/0.030s cpu, 0.000u/0.000s cum
- 1.320 elapsed

Now re-add with new code that doesn't call uuidgen:

- finished, 1.990u/0.030s cpu, 0.000u/0.000s cum
- 2.040 elapsed

16.61s user 0.55s system 74% cpu 22.965 total

Status::

- finished, 2.500u/0.110s cpu, 0.010u/0.000s cum
- 3.350 elapsed

And commit::

Now patch up to 2.4.19. There were some bugs in handling missing
directories, but with that fixed we do much better::

bzr status 5.86s user 1.06s system 10% cpu 1:05.55 total

This is slow because it's diffing every file; we should use mtimes etc
to make this faster. The cpu time is reasonable.
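
The mtime idea could look roughly like this; the cache record layout
is hypothetical, not what the code currently keeps::

    # Sketch: trust size and mtime recorded at the last commit and only
    # re-hash when they differ.
    import os

    def maybe_changed(path, cached):
        # cached: dict with 'size', 'mtime' (and 'sha1') from the last commit
        st = os.stat(path)
        if cached and st.st_size == cached['size'] and st.st_mtime == cached['mtime']:
            return False        # quick compare: treat as unchanged
        return True             # slow path: read and hash the file

The obvious trap is a file rewritten within the same second as the
recorded mtime; checking the size as well, or refusing to trust very
fresh mtimes, would cover that.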

I see difflib is pure Python; it might be faster to shell out to GNU
diff when we need it.
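
Shelling out might look roughly like this; a sketch only, assuming GNU
diff is on the path, not what bzr currently does::

    # Sketch: let GNU diff do the heavy lifting instead of difflib.
    import subprocess

    def external_diff(old_path, new_path, old_label, new_label):
        # GNU diff exits 1 when the files differ; that is not an error here
        proc = subprocess.run(
            ['diff', '-u', '--label', old_label, '--label', new_label,
             old_path, new_path],
            capture_output=True, text=True)
        return proc.stdout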

Export is very fast::

- finished, 4.220u/1.480s cpu, 0.010u/0.000s cum
- 10.810 elapsed

bzr export 1 ../linux-2.4.18.export1 3.92s user 1.72s system 21% cpu 26.030 total


Now to find and add the new changes::

- finished, 2.190u/0.030s cpu, 0.000u/0.000s cum
- 2.300 elapsed


::
bzr commit 'import 2.4.19' 9.36s user 1.91s system 23% cpu 47.127 total

And the result is exactly right. Try exporting::

mbp@hope% bzr export 4 ../linux-2.4.19.export4
bzr export 4 ../linux-2.4.19.export4 4.21s user 1.70s system 18% cpu 32.304 total

and the export is exactly the same as the tarball.

Now we can optimize the diff a bit more by not comparing files that
have the right SHA-1 from within the commit.
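
Roughly the idea, with an invented name for the recorded digest::

    # Sketch: commit already computes a SHA-1 for each new text, so any
    # file whose digest matches the previous inventory entry needs no diff.
    import hashlib

    def text_unchanged(new_text, previous_sha1):
        # new_text: bytes; previous_sha1: hex digest from the last inventory
        return hashlib.sha1(new_text).hexdigest() == previous_sha1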

For comparison::

patch -p1 < ../kernel.pkg/patch-2.4.20 1.61s user 1.03s system 13% cpu 19.106 total


Now status after applying the .20 patch. With full-text verification::

bzr status 7.07s user 1.32s system 13% cpu 1:04.29 total

with that turned off::

bzr status 5.86s user 0.56s system 25% cpu 25.577 total

After adding:

bzr status 6.14s user 0.61s system 25% cpu 26.583 total

Should add some kind of profile counter for quick compares vs slow
compares.
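
Something as crude as this would do for the counter; purely
illustrative::

    # Sketch: count how often the cheap stat-based compare is enough
    # versus a full content read, and print the totals at the end.
    compare_count = {'quick': 0, 'slow': 0}

    def note_compare(was_quick):
        compare_count['quick' if was_quick else 'slow'] += 1

    def report_compares():
        print('quick compares: %(quick)d, slow compares: %(slow)d'
              % compare_count)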

bzr commit 'import 2.4.20' 7.57s user 1.36s system 20% cpu 43.568 total

export: finished, 3.940u/1.820s cpu, 0.000u/0.000s cum, 50.990 elapsed

also exports correctly

now .21

bzr commit 'import 2.4.1' 5.59s user 0.51s system 60% cpu 10.122 total

265520 .
137704 .bzr

import 2.4.2
317758 .
183463 .bzr


With everything through to 2.4.29 imported, the .bzr directory is
1132MB, compared to 185MB for one tree. The .bzr.log is 100MB! So
the storage is 6.1 times larger, although we're holding 30 versions.
It's pretty large but I think not ridiculous. By contrast the tarball
for 2.4.0 is 104MB, and the tarball plus uncompressed patches are
315MB.

Uncompressed, the text store is 1041MB. So it is only three times
worse than patches, and could be compressed at presumably roughly
equal efficiency. It is large, but also a very simple design and
perhaps adequate for the moment. The text store with each file
individually gzipped is 264MB, which is also a very simple format and
makes it less than twice the size of the source tree.

This is actually rather pessimistic because I think there are some
orphaned texts in there.

Measured by du, the compressed full-text store is 363MB; also probably
tolerable.

The real fix is perhaps to use some kind of weave, not so much for
storage efficiency as for fast annotation and therefore possible
annotation-based merge.
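
Very roughly, the shape of annotation-carrying storage; this is just
the idea, not bzr's actual weave format::

    # Sketch: keep each stored line tagged with the revision that
    # introduced it, so annotate is a lookup rather than repeated diffs.
    from difflib import SequenceMatcher

    def annotate_new_version(old_annotated, new_lines, new_rev):
        # old_annotated: list of (revision_id, line) for the previous version
        old_lines = [line for _, line in old_annotated]
        matcher = SequenceMatcher(None, old_lines, new_lines)
        result = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == 'equal':
                result.extend(old_annotated[i1:i2])    # keep old attribution
            elif tag in ('replace', 'insert'):
                result.extend((new_rev, line) for line in new_lines[j1:j2])
            # deleted lines simply drop out
        return result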

-----


2005-03-25

Now that we have recursive add, add is much faster. Adding all of the
linux 2.4.19 kernel tree takes only

finished, 5.460u/0.610s cpu, 0.010u/0.000s cum, 6.710 elapsed


However, the store code currently flushes to disk after every write,
which is probably excessive. So a commit takes

finished, 8.740u/3.950s cpu, 0.010u/0.000s cum, 156.420 elapsed
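
The obvious experiment is to let writes go through ordinary buffered
I/O with no per-file flush; a sketch with an invented store layout and
function name::

    # Sketch: one gzipped file per text, no per-file flush()/fsync().
    import gzip, os

    def add_texts(store_dir, texts):
        # texts: iterable of (file_id, bytes)
        for file_id, data in texts:
            path = os.path.join(store_dir, file_id + '.gz')
            with gzip.open(path, 'wb') as f:
                f.write(data)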


Status is now also quite fast, despite still having to read all the
working copies:

mbp@hope% bzr status ~/work/linux-2.4.19
bzr status 5.51s user 0.79s system 99% cpu 6.337 total

strace shows much of this is in write(2), probably because of
logging. With more buffering on that file, removing all the explicit
flushes, that is reduced to

mbp@hope% time bzr status
bzr status 5.23s user 0.42s system 97% cpu 5.780 total

which is mostly opening, stat()ing and reading files, as it should be.
Still a few too many stat calls.

Now fixed up handling of root directory.

Without flushing everything to disk as it goes into the store:

mbp@hope% bzr commit -m 'import linux 2.4.19'
bzr commit -m 'import linux 2.4.19' 8.15s user 2.09s system 53% cpu 19.295 total

mbp@hope% time bzr diff
bzr diff 5.80s user 0.52s system 69% cpu 9.128 total
mbp@hope% time bzr status
bzr status 5.64s user 0.43s system 68% cpu 8.848 total

patch -p1 < ../linux.pkg/patch-2.4.20 1.67s user 0.96s system 90% cpu 2.905 total

The diff changes 3462 files according to diffstat.

branch format: Bazaar-NG branch, format 0.0.4

in the working tree:
  8674 unchanged
  2463 modified
   818 added
   229 removed
     0 renamed
     0 unknown
     4 ignored
   614 versioned subdirectories

That is, 3510 entries have changed (2463 modified + 818 added + 229
removed); diffstat counts only the 3462 files, and the other 48 are
changed directories, so the count is exactly right!

bzr commit -v -m 'import 2.4.20' 8.23s user 1.09s system 48% cpu 19.411 total

Kind of strange that this takes as much time as committing the whole
thing; I suppose it has to read every file.

This shows many files as being renamed; I don't know why that would
be.


Patch to 2.4.21:

2969 files changed, 366643 insertions(+), 147759 deletions(-)

After auto-add:

2969 files changed, 372168 insertions(+), 153284 deletions(-)

I wonder why it is not exactly the same? Maybe because the Python
diff algorithm is a bit different to GNU diff.

----

2005-03-29

Full check, retrieving all file texts once for the 2.4 kernel branch,
takes 10m elapsed, 1m cpu time. Lots of random IO and seeking.

----


mbp@hope% time python =bzr deleted --show-ids
README README-fa1d8447b4fd0140-adbf4342752f0fc3
python =bzr deleted --show-ids 1.55s user 0.09s system 96% cpu 1.701 total
mbp@hope% time python -O =bzr deleted --show-ids
README README-fa1d8447b4fd0140-adbf4342752f0fc3
python -O =bzr deleted --show-ids 1.47s user 0.10s system 101% cpu 1.547 total
mbp@hope% time python -O =bzr deleted --show-ids
README README-fa1d8447b4fd0140-adbf4342752f0fc3
python -O =bzr deleted --show-ids 1.49s user 0.07s system 99% cpu 1.565 total
mbp@hope% time python =bzr deleted --show-ids
README README-fa1d8447b4fd0140-adbf4342752f0fc3
python =bzr deleted --show-ids 1.55s user 0.08s system 99% cpu 1.637 total

A small but significant improvement from Python -O.

----

Loading a large inventory through cElementTree is pretty quick; only
about 0.117s. By contrast reading the inventory into our data
structure takes about 0.7s.

So I think the problem must be in converting everything to
InventoryEntries and back again every time.

Thought about that way it seems pretty inefficient: why create all
those objects when most of them aren't called on most invocations?
Instead perhaps the Inventory object should hold the ElementTree and
pull things out of it only as necessary? We can even have an index
pointing into the ElementTree by id, path, etc.
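
A sketch of that shape; the element and attribute names here are
guesses, not the real inventory schema::

    # Sketch: keep the parsed ElementTree and index it lazily, building
    # entry objects only for the ids actually asked for.
    import xml.etree.ElementTree as ElementTree   # cElementTree at the time

    class LazyInventory:
        def __init__(self, xml_file):
            self._tree = ElementTree.parse(xml_file)
            self._by_id = None                    # built on first use

        def _index(self):
            if self._by_id is None:
                self._by_id = dict((e.get('file_id'), e)
                                   for e in self._tree.getroot())
            return self._by_id

        def __getitem__(self, file_id):
            return self._index()[file_id]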


as of r148

bzr deleted 1.46s user 0.08s system 98% cpu 1.561 total


Alternatively maybe keep an id2path and path2id cache? Keeping it
coherent may be hard...
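
One way the caches could work is to build both maps lazily and throw
them away wholesale on any mutation, which sidesteps the coherency
question at the cost of a rebuild after each change. A sketch, using a
hypothetical walk_paths() helper that yields (path, file_id) pairs::

    # Sketch only: id<->path caches that are invalidated wholesale.
    class InventoryPathCache:
        def __init__(self, inventory):
            self._inv = inventory
            self._id2path = None
            self._path2id = None

        def _build(self):
            if self._id2path is None:
                self._id2path = dict((fid, path)
                                     for path, fid in self._inv.walk_paths())
                self._path2id = dict((p, f) for f, p in self._id2path.items())

        def id2path(self, file_id):
            self._build()
            return self._id2path[file_id]

        def path2id(self, path):
            self._build()
            return self._path2id[path]

        def invalidate(self):
            # call after add/remove/rename so stale entries never survive
            self._id2path = self._path2id = None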