Bazaar Windows Shell Extension Options
======================================

.. contents:: :local:

This document details the implementation strategy chosen for the
Bazaar Windows Shell Extensions, otherwise known as TortoiseBzr, or TBZR.
As justification for the strategy, it also describes the general architecture
of Windows Shell Extensions, then looks at the C++ implemented TortoiseSvn
and the Python implemented TortoiseBzr, and discusses alternative
implementation strategies, and the reasons they were not chosen.

The following points summarize the strategy:

* Main shell extension code will be implemented in C++, and be as thin as
  possible. It will not directly do any VCS work, but instead will perform

conflict badly with other Python implemented applications (and will certainly
kill them in some situations). A similar issue exists with GUI toolkits
used - using (say) PyGTK directly in the shell extension would need to be
avoided (which it currently is, as best I can tell). It should also be obvious
that the shell extension will be in many processes simultaneously, meaning use
of a simple log-file (for example) is problematic.

In practice, there is only 1 truly safe option - a low-level language (such
as C/C++) which makes use of only the win32 API, and a static version of the
C runtime library if necessary. Obviously, this sucks from our POV. :)

[1]: http://blogs.msdn.com/oldnewthing/archive/2006/12/18/1317290.aspx

Analysis of TortoiseSVN code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TortoiseSVN is implemented in C++. It relies on an external process to
perform most UI commands (such as diff, log, commit, etc.), but it appears to
directly embed the SVN C libraries for the purposes of obtaining status for
icon overlays, context menu, drag&drop, etc.

The use of an external process to perform commands is fairly simplistic in
terms of parent and modal windows. For example, when selecting "Commit", a
new process starts and *usually* ends up as the foreground window, but it may
occasionally be lost underneath the window which created it, and the user may
accidentally start many processes when they only need 1. Best I can tell, this

directly needed by the shell are part of the "shell extension" and the rest
of TortoiseSvn is "just" a fairly large GUI application implementing many
commands. The command-line to the app has even been documented for people who
wish to automate tasks using that GUI. This GUI is also implemented in C++
using Windows resource files.

TortoiseSvn has an option (enabled by default) which enables a cache using a
separate process, aptly named TSVNCache.exe. It uses a named pipe to accept
connections from other processes for various operations. When enabled, TSVN
fetches most (all?) status information from this process, but it also has the
option to talk directly to the VCS, along with options to disable functionality

There doesn't seem to be a good story for logging or debugging - which is
what you expect from C++ based apps. :( Most of the heavy lifting is done by
the external application, which might offer better facilities.

Analysis of existing TortoiseBzr code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The existing code is actually quite cool given its history (SoC student,
etc), so this should not be taken as criticism of the implementer nor of the
implementation. Indeed, many criticisms are also true of the TortoiseSvn
implementation - see above. However, I have attempted to list the bad things
rather than the good things so a clear future strategy can be agreed, with
all limitations understood.

and also error prone (it's possible the editor will check the file in,
meaning Windows Explorer will be showing stale data). This may be possible to
address via file-system notifications, but a shared cache would be preferred
(although clearly more difficult to implement).

One tortoise port recently announced a technique for all tortoise ports to
share the same icon overlays to help work around a limitation in Windows on
the total number of overlays (it's limited to 15, due to the number of bits
reserved in a 32-bit int for overlays). TBZR needs to take advantage of that
(but to be fair, this overlay sharing technique was probably done after the
TBZR implementation).

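The limit of 15 is just bit arithmetic. The sketch below assumes the overlay
index occupies a 4-bit field (bits 8-11) of the 32-bit draw flags, mirroring
the ``ILD_OVERLAYMASK`` constant and ``INDEXTOOVERLAYMASK`` macro from
``commctrl.h``; the Python names are illustrative, and the exact values should
be treated as an assumption to check against your SDK headers.

```python
# Assumed value of ILD_OVERLAYMASK: 4 bits (bits 8-11) of a 32-bit flags int.
OVERLAY_MASK = 0x00000F00
OVERLAY_SHIFT = 8


def index_to_overlay_mask(index):
    """Mirror of the INDEXTOOVERLAYMASK macro: pack an overlay index into the flags."""
    return index << OVERLAY_SHIFT


def max_overlays():
    """Index 0 means 'no overlay', so the usable indexes are 1..max_overlays()."""
    return OVERLAY_MASK >> OVERLAY_SHIFT


def fits(index):
    """True if a non-zero overlay index survives a round trip through the mask."""
    packed = index_to_overlay_mask(index) & OVERLAY_MASK
    return index >= 1 and (packed >> OVERLAY_SHIFT) == index
```

With a 4-bit field there are 16 possible values, and one of them is spent on
"no overlay", which is why every tortoise port has to share the remaining slots.
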
The current code appears to recursively walk a tree to check if *any* file in
the tree has changed, so it can reflect this in the parent directory status.
This is almost certainly an evil thing to do (Shell Extensions are optimized
so that a folder doesn't even need to look in its direct children for another
folder, let alone recurse for any reason at all. It may be a network mounted
drive that doesn't perform at all.)

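One way to avoid the recursive walk is to have a cache push "modified" state
up to the parent directories whenever a file's status changes, so answering a
query for a folder becomes a single lookup. The sketch below is illustrative
only: ``StatusCache`` and its methods are hypothetical names, not part of TBZR
or bzrlib, and paths use forward slashes purely to keep the example short.

```python
import posixpath


class StatusCache:
    """Hypothetical cache: tracks modified files and counts them per ancestor directory."""

    def __init__(self):
        self._modified = set()     # files currently known to be modified
        self._dirty_counts = {}    # directory -> number of modified files beneath it

    @staticmethod
    def _ancestors(path):
        """Yield each parent directory of path, nearest first."""
        parent = posixpath.dirname(path)
        while parent and parent != path:
            yield parent
            path, parent = parent, posixpath.dirname(parent)

    def set_modified(self, path, modified):
        """Record a status change and update the counters of every ancestor."""
        if modified == (path in self._modified):
            return  # no change, nothing to propagate
        if modified:
            self._modified.add(path)
        else:
            self._modified.discard(path)
        delta = 1 if modified else -1
        for directory in self._ancestors(path):
            count = self._dirty_counts.get(directory, 0) + delta
            if count:
                self._dirty_counts[directory] = count
            else:
                self._dirty_counts.pop(directory, None)

    def is_modified(self, path):
        """Lookup for a file or a directory; never touches the file system."""
        return path in self._modified or path in self._dirty_counts
```

The cost moves from every folder query to the (much rarer) status change, which
is the trade a shared cache process would be making anyway.
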
Although somewhat dependent on bzr itself, we need a strategy for binary
releases (ie, it assumes python.exe, etc) and integration into an existing

external GUI apps themselves, etc) and see if a path forward does emerge for
Vista. We can re-evaluate this based on user feedback and more information
about features of the Vista property system.

The RPC mechanism and the tasks performed by the RPC server (rpc, file system
crawling and watching, device notifications, caching) are very similar to
those already implemented for TSVN and analysis of that code shows that
it is not particularly tied to any VCS model. As a result, consideration
should be given to making the best use of this existing debugged and
optimized technology.

Discussions with the TSVN developers have indicated that they would prefer us
to fork their code rather than introduce complexity and instability into
their code by attempting to share it. See the follow-ups to
http://thread.gmane.org/gmane.comp.version-control.subversion.tortoisesvn.devel/32635/focus=32651

For background, the TSVNCache process is fairly sophisticated - but mainly in
areas not related to source control. It has had various performance tweaks
and is smart in terms of minimizing its use of resources when possible. The
'cloc' utility counts ~5000 lines of C++ code and weighs in just under 200KB
on disk (not including headers), so this is not a trivial application.
However, the code that is of most interest (the crawlers, watchers and cache)
is roughly 2500 lines of C++. Most of the source files only depend lightly
on SVN specifics, so it would not be a huge job to make the existing code
talk to Bazaar. The code is thread-safe, but not particularly thread-friendly
(ie, fairly coarse-grained locks are taken in most cases).

In practice, this gives us 2 options - "fork" or "port":

* Fork the existing C++ code, replacing the existing source-control code with
  code that talks to Bazaar. This would involve introducing a Python layer,
  but only at the layers where we need to talk to bzrlib. The bulk of the
  code would remain in C++.

  This would have the following benefits:

  - May offer significant performance advantages in some cases (eg, a
    cache-hit would never enter Python at all.)

  - Quickest time to a prototype working - the existing working code can be

  And the following drawbacks:

  - More complex to develop. People wishing to hack on it must be on Windows,
    know C++ and own the most recent MSVC8.

  - More complex to build and package: people making binaries must be on
    Windows and have the most recent MSVC8.

  - Is tied to Windows - it would be impractical for this to be
    cross-platform, even just for test purposes (although parts of it

* Port the existing C++ code to Python. We would do this almost
  "line-for-line", and attempt to keep many optimizations in place (or at
  least document what the optimizations were for ones we consider dubious).
  For the Windows versions, pywin32 and ctypes would be leaned on - there
  would be no C++ at all.

  This would have the following benefits:

  - Only need Python and Python skills to hack on it.

  - No C++ compiler needed means easier to cut releases.

  - Python makes it easier to understand and maintain - it should appear much
    less complex than the C++ version.

  And the following drawbacks:

  - Will be slower in some cases - eg, a cache-hit will involve executing

  - Will take longer to get a minimal system working. In practice this
    probably means the initial versions will not be as sophisticated.

Given the above, there are two issues which prevent Python being the clear
winner: (1) will it perform OK? (2) How much longer to a prototype?

My gut feeling on (1) is that it will perform fine, given a suitable Python
implementation. For example, Python code that simply looked up a dictionary
would be fast enough - so it all depends on how fast we can make our cache.
Re (2), it should be possible to have a "stub" process (one that does almost
nothing in terms of caching or crawling, but can be connected to by the shell)
in 8 hours, and some crawling and caching in 40. Note that this is separate
from the work included for the shell extension itself (the implementation of
which is largely independent of the TBZRCache implementation). So given the
lack of a deadline for any particular feature and the better long-term fit of
using Python, the conclusion is that we should "port" TSVN for bazaar.

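To make the "stub" idea concrete, here is a rough sketch of a process that
answers status queries from a plain dictionary and does no crawling at all.
Everything in it is an assumption made for illustration: a loopback TCP socket
stands in for whatever transport is eventually chosen (TSVNCache uses a named
pipe), the one-line text protocol is invented, and ``STATUS_CACHE``,
``StatusHandler``, ``start_stub_server`` and ``query_status`` are hypothetical
names.

```python
import socket
import socketserver
import threading

# Hypothetical in-memory cache: path -> status. A real server would fill
# this from bzrlib; here it is just a dictionary with canned entries.
STATUS_CACHE = {
    "C:/wt/README.txt": "modified",
    "C:/wt/setup.py": "unchanged",
}


class StatusHandler(socketserver.StreamRequestHandler):
    """Serve one request per connection: a UTF-8 path terminated by a newline."""

    def handle(self):
        path = self.rfile.readline().decode("utf-8").rstrip("\n")
        # A cache hit is nothing more than a dictionary lookup.
        status = STATUS_CACHE.get(path, "unknown")
        self.wfile.write((status + "\n").encode("utf-8"))


def start_stub_server():
    """Start the stub on a free localhost port; return (server, port)."""
    server = socketserver.TCPServer(("127.0.0.1", 0), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def query_status(port, path):
    """What the shell-side shim would do: send a path, read back a status."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall((path + "\n").encode("utf-8"))
        return conn.makefile("r", encoding="utf-8").readline().rstrip("\n")
```

A stub like this is enough to let the shell side be developed and timed
independently; crawling, watching and a real cache can then replace the
dictionary without the client noticing.
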
Reuse of this code by Mercurial or other Python based VCS systems?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Incidentally, the hope is that this work can be picked up by the Mercurial
project (or anyone else who thinks it is of use). However, we will limit
ourselves to attempting to find a clean abstraction for the parts that talk
to the VCS (as good design would dictate regardless) and then try and assist
other projects in providing patches which work for both of us. In other
words, supporting multiple VCS systems is not an explicit goal at this stage,
but we would hope it is possible in the future.

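As a rough illustration of what such an abstraction might look like, the
sketch below keeps every VCS-specific call behind one small interface. The
names (``VCSBackend``, ``FakeBackend``, ``overlay_for``) and the two-method
shape are hypothetical; nothing here is an existing bzrlib or Mercurial API,
and a real interface would likely need more than this.

```python
from abc import ABC, abstractmethod


class VCSBackend(ABC):
    """Hypothetical interface: the only place the cache would touch a VCS."""

    @abstractmethod
    def find_tree_root(self, path):
        """Return the working-tree root containing path, or None if unversioned."""

    @abstractmethod
    def status(self, path):
        """Return a status string such as 'unchanged' or 'modified'."""


class FakeBackend(VCSBackend):
    """In-memory stand-in used here instead of a real bzrlib or Mercurial binding."""

    def __init__(self, root, statuses):
        self._root = root
        self._statuses = statuses

    def find_tree_root(self, path):
        inside = path == self._root or path.startswith(self._root + "/")
        return self._root if inside else None

    def status(self, path):
        return self._statuses.get(path, "unchanged")


def overlay_for(backend, path):
    """Cache-side code depends only on the interface, never on a specific VCS."""
    if backend.find_tree_root(path) is None:
        return None  # not in a working tree: no overlay at all
    return backend.status(path)
```

If the crawler, watcher and cache only ever call something like
``overlay_for``, another project would need to supply a backend class rather
than patch the server itself.
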
The following is a high-level set of milestones for the implementation:

* Design the RPC mechanism used for icon overlays (ie, binary format used for

* Create Python prototype of the C++ "shim": modify the existing TBZR Python
  code so that all references to "bzrlib" are removed. Implement the client

* Create initial implementation of RPC server in Python. This will use
  bzrlib, but will also maintain a local cache to achieve the required
  performance. File crawling and watching will not be implemented at this
  stage, but caching will (although cache persistence might be skipped).

* Analyze performance of prototype. Verify that technique is feasible and
  will offer reasonable performance and user experience.

* Implement file watching, crawling etc by "porting" TSVNCache code to
  Python, as described above.

* Implement C++ shim: replace the Python prototype with a light-weight C++
  version. We will fork the current TSVN sources, including its new
  support for sharing icon overlays (although advice on how to set up this

* Implement property pages and context menus in C++. Expand RPC server as