*****************************************
Opportunities for improvement on GNU Arch
*****************************************

Bazaar-NG is based on the GNU Arch system, and inherits a lot of its
design from Arch. However, there are several things we will change in
Baz to (we hope) improve the user experience.

The core design of Arch is good, brilliant even. It can scale from
small projects to large ones, and is a good foundation for building
tools on top. However, the design is far too complex, both in
concepts and execution. So the plan is to cut out as many things as
we can, add a few other good concepts from other systems, and try to
make it into a whole that is consistent and understandable.

No other system is able to express this valuable idea: "I merged all
these changes from other people; here is the result."

However, it should *also* be possible to bring in perfect-fit
patches without creating a new commit.

Find a common ancestor on diverged and cross-merged branches.

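The common-ancestor search can be sketched in Python. This sketch assumes, hypothetically, that the revision history is available as a dict that maps each revision id to its list of parent ids. That is not Arch's or Bazaar-NG's storage format. The sketch keeps only those shared ancestors that are not themselves an ancestor of another shared ancestor.

```python
from collections import deque

def ancestors(graph, rev):
    """Return rev plus every revision reachable through parent links."""
    seen = {rev}
    queue = deque([rev])
    while queue:
        for parent in graph.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def common_ancestors(graph, a, b):
    """Return the 'best' common ancestors of a and b: shared ancestors
    that are not an ancestor of some other shared ancestor."""
    shared = ancestors(graph, a) & ancestors(graph, b)
    best = set(shared)
    for rev in shared:
        best -= ancestors(graph, rev) - {rev}  # drop strict ancestors of rev
    return best

# Hypothetical history: two branches fork from "base" and then
# cross-merge ("criss-cross"): a2 merges b1, and b2 merges a1.
history = {
    "base": [],
    "a1": ["base"],
    "b1": ["base"],
    "a2": ["a1", "b1"],
    "b2": ["b1", "a1"],
}

print(sorted(common_ancestors(history, "a1", "b1")))  # ['base']
print(sorted(common_ancestors(history, "a2", "b2")))  # ['a1', 'b1']
```

A plain fork gives a single answer, the fork point. In the cross-merged history both `a1` and `b1` qualify, so a merge tool still has to choose between them or combine them. This sketch does not decide how that is done.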
* Apply isolated changesets.

We should extend this by having a good way to send changesets by
email, preferably readable even by people who are not using Arch.

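As a rough illustration of email-friendly changesets, this sketch renders a change as a plain unified diff with Python's standard `difflib` and wraps it in an `email.message.EmailMessage`. The function name, the `a/` and `b/` path prefixes and the subject line are illustrative assumptions, not a defined Bazaar-NG format.

```python
import difflib
from email.message import EmailMessage

def changeset_email(path, old_text, new_text, subject):
    """Hypothetical helper: wrap one file's change in a plain-text email
    as a unified diff, so readers need no version control tool at all."""
    diff = "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    ))
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(diff)  # the diff is the whole plain-text body
    return msg, diff

msg, diff = changeset_email("hello.py", "print('hi')\n", "print('hello')\n",
                            "[PATCH] friendlier greeting")
print(diff)
```

Anyone can read a unified diff in a message body, and ordinary tools such as `patch` can usually apply it.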
* GPG signing of commits.

Open source hackers almost all have GPG keys already, and GPG deals
with a lot of PKI functions to do with propagating, signing and

Signed commits are interesting in many ways, not least for
detecting intrusions into code servers.

* Anonymous downloads can be done without an active server.

Good for security; also very good for people who do not have a
permanently connected machine on which they can install their own
software, or whose machine is very tightly secured.

It's neat that you can upload over only sftp/ftp, but I'm not sure
it's really worth the hassle; getting properly atomic operations
over remote-file protocols is hard.

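For comparison, the usual way to update a file atomically on a local POSIX filesystem is to write a temporary file in the same directory and then rename it over the target. Here is a minimal sketch; the `atomic_write` name is hypothetical. The difficulty over sftp/ftp is that these protocols do not necessarily provide an equivalent "rename over an existing file" step.

```python
import os
import tempfile

def atomic_write(path, data):
    """Hypothetical helper: replace path with data so readers see either
    the old or the new contents, never a partly written file.
    The atomicity relies on os.replace semantics on a local POSIX filesystem."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem (same directory)
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the bytes onto disk before renaming
        os.replace(tmp_path, path)  # atomic rename over any existing file
    except BaseException:
        os.unlink(tmp_path)  # do not leave a half-written temp file behind
        raise

demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "inventory")
atomic_write(demo_path, b"first version\n")
atomic_write(demo_path, b"second version\n")
with open(demo_path, "rb") as f:
    print(f.read())  # b'second version\n'
```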
* Clean and transparent storage format.

This is a neat hack, and gives people assurance that they can get
their data back out again even if the tool disappears. Very nice.
(Bazaar-NG won't keep the exact same format, but the ideas will be

* Relatively easily parseable/scriptable shell interface. Good for
  people writing web/emacs/editor/IDE interfaces, or scripts based on it.

* Automatically build (and hardlink) revision libraries, with

I don't know how many people want *every* revision in a library, but
it can be handy to have a few key ones.

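A hardlinked revision library can be sketched as a tree copy that links each file and falls back to a real copy when linking fails, for example across filesystems. This minimal illustration uses hypothetical names (`link_or_copy_tree`, the `rev-1` directory). It stays safe only as long as nothing edits either copy in place, because both names share one inode.

```python
import os
import shutil
import tempfile

def link_or_copy_tree(src, dst):
    """Hypothetical helper: fill dst with the files of src, hardlinking
    where the OS allows it and copying otherwise.
    Returns (number_linked, number_copied)."""
    linked = copied = 0
    for root, _dirs, files in os.walk(src):
        target_dir = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            try:
                os.link(source, target)  # same inode: no extra disk space
                linked += 1
            except OSError:
                # e.g. a different filesystem, or no hardlink support
                shutil.copy2(source, target)
                copied += 1
    return linked, copied

# Demo: snapshot a tiny working tree into a hypothetical library entry.
tree = tempfile.mkdtemp()
with open(os.path.join(tree, "README"), "w") as f:
    f.write("hello\n")
library_copy = os.path.join(tempfile.mkdtemp(), "rev-1")
result = link_or_copy_tree(tree, library_copy)
print(result)
```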
In general making use of hardlinks when they are available and safe

* Rely on ssh for remote access, authentication, and confidentiality.

* Patch headers separate from patch bodies. (Sometimes you only want

* Autogeneration of ChangeLogs -- but should be in GNU format, at
  least optionally. I'm not convinced auto-updating them in the tree
  is worthwhile; it makes merges weird.

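Here is a minimal sketch of emitting a GNU-format ChangeLog entry: a header line with date, author and address, followed by tab-indented `* file: description` items. The function and its arguments are hypothetical. In practice the values would come from commit metadata.

```python
import textwrap
from datetime import date

def changelog_entry(when, author, email, changes):
    """Hypothetical helper: format one GNU-style ChangeLog entry.

    changes is a list of (filename, description) pairs. Each pair becomes a
    tab-indented "* file: description" item wrapped at about 72 characters
    (textwrap counts the leading tab as a single character)."""
    lines = [f"{when.isoformat()}  {author}  <{email}>", ""]
    for filename, description in changes:
        lines.extend(textwrap.wrap(f"* {filename}: {description}", width=72,
                                   initial_indent="\t", subsequent_indent="\t"))
    return "\n".join(lines) + "\n"

entry = changelog_entry(date(2005, 1, 18), "J. Random Hacker", "jrh@example.org",
                        [("hello.c", "Print a friendlier greeting.")])
print(entry)
```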
It seems useful to prevent accidental commits to things that are
meant to be stable. However, the set-once nature of sealing is
undesirable, because people can make mistakes or want to seal more

One possibility is to have a voluntary write-protect flag set on
branches that should not normally be updated. One can remove the
flag if it turns out it was set wrongly.

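A voluntary write-protect flag could be as simple as a marker file that the commit command checks and that anyone can delete again. In this sketch the file name `.write-protected` and all function names are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical marker file name, invented for this sketch.
PROTECT_FLAG = ".write-protected"

class BranchProtected(Exception):
    """Raised when committing to a branch that carries the flag."""

def set_protected(branch_dir, protected=True):
    flag = os.path.join(branch_dir, PROTECT_FLAG)
    if protected:
        open(flag, "w").close()
    elif os.path.exists(flag):
        os.remove(flag)  # unlike sealing, the flag can simply be taken off again

def check_can_commit(branch_dir):
    """Refuse commits while the flag is present."""
    if os.path.exists(os.path.join(branch_dir, PROTECT_FLAG)):
        raise BranchProtected(f"{branch_dir} is write-protected")

branch = tempfile.mkdtemp()
set_protected(branch)
try:
    check_can_commit(branch)
    blocked = False
except BranchProtected:
    blocked = True
set_protected(branch, False)  # the flag was set wrongly: remove it
check_can_commit(branch)      # commits are allowed again
print(blocked)  # True
```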
* ``resolved`` command in Bazaar-1.1

Good for preventing accidental breakage.

* Multi-level undo -- though it could be more understandable,
  perhaps through ``undo-history``.

One lesson from usability design is that it does not always work to
have a complex model and then try to hide complexity in the user
interface. If you want something to be a joy to use, that must be
designed in from the bottom up.

(Some developers may react to tla by thinking "eww, how gross" on
particular points. As much as possible we might like to fix these.)

* General impression that the tool is telling you how to run your life.

* Non-standard terminology

Arch uses terms like "version" and "category" in ways that are
confusing to people accustomed to other version control systems.

Therefore: development proceeds on a *branch*, which is a series of
*revisions*. Simple and obvious.

* Command-line options are weirdly inconsistent with other systems,
  with each other, and with what people would like to do. For
  example, I would think the obvious usage is ``bzr diff [FILE]``,
  but ``tla diff`` does not let you specify a file at all.

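The `diff [FILE]` usage can be sketched with Python's `argparse`: an optional, repeatable filename argument, where giving no filename means the whole tree. This only illustrates the shape of the arguments. It is not how bzr or tla actually parse their options, and `parse_diff_args` is a hypothetical name.

```python
import argparse

def parse_diff_args(argv):
    """Hypothetical parser for `diff [FILE...]`: no FILE means the whole
    tree, otherwise only the named files are compared."""
    parser = argparse.ArgumentParser(prog="bzr diff")
    parser.add_argument("files", nargs="*", metavar="FILE",
                        help="files to diff (default: the whole tree)")
    return parser.parse_args(argv).files

print(parse_diff_args([]))                     # []
print(parse_diff_args(["hello.c", "README"]))  # ['hello.c', 'README']
```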
Most commands should take filenames as their argument: log, diff,

* Despite having too many commands, there are massive and glaring
  gaps, such as reverting a single file or a tree.

* Commands are too different from what people are used to in CVS, and
  often not for a good reason.

* Identifiers are too long. In part this is because Arch tries to
  have identifiers which are both human-assigned and universally unique.

* Archive names are probably unnecessary.

* Part of the reason for complexity in archives is that the Arch
  design wants to be able to go and find patches on other branches at
  a later time. (This is not really implemented or used at the

I think the complexity is unjustified: changesets and revisions have
universally unique names so they can simply be archived, either on
the machine of the person who wants them or on a central site like

* The tool is *unforgiving*; if people create a branch with the wrong
  name it will be around forever.

* Branches are heavyweight; a record always persists in the archive.
  Sometimes it is good to create micro-branches, try something out,
  and then discard them. If nobody wants the changes, there is no
  reason for the tool to keep them.

* Working offline requires creating a new branch and merging back and
  forth. This is both more work than it should be, and also pollutes
  the "story" told by branching.

As much as possible, the *accidental* difference of the location of
the repository should not affect the *semantics* of branches.

(However, some merging may obviously be necessary when there is

* Archive registration. This causes confusion and is unnecessary.

Proposed solutions such as archive aliases or an additional command
to register-and-get make it worse.

* Weird file names (``++`` and ``,,``), which persist in user
  directories and break many tools. They give a bad
  impression, and it's even worse when people have to interact with

* Overly-long identifiers. (One advantage of pointing to branches
  using filenames or URLs is that the length of the path depends on
  how close it is to the user's location, and they can more easily use

* Too slow by default.

Arch can be made fast, but in the hands of a nonexpert user it is
often slow. For most users, disk is cheaper than CPU time, which is
cheaper than network roundtrips. The performance model should be
transparent -- users should not be surprised that something is slow.

* Tagging onto branches.

Unifying tags and commits is interesting, but the result is hard to
mentally model; even Arch maintainers can't say exactly how it is
supposed to work in some cases.

* Reinventing the world from scratch in libhackerlab/frob/pika/xl.

Those are all fine projects and may be useful in the future, but
they are totally unnecessary for writing a great version control
system. It is not an enormous project; it is not CPU-cycle
critical; something like Python will be fine.

* Lack (for the moment) of an active server.

Given that network traffic is the most expensive thing, we can
possibly get a better solution by having intelligence on both sides
of the link. Suppose we want to get just one file from a previous

* Poor Windows/Mac support.

Even though many developers only work on Linux, this still holds a
tool back. The reason is this: at least some projects have some
developers on Windows some of the time. Those projects can't switch
to Arch. Most people want to only learn one tool deeply, so it

Don't make any overly Unixy assumptions. Avoid too-cute filesystem

Being in Python should help with portability: people do need to
install it, but many developers will already have it and the total
burden is possibly less than that of installing C requisite

* Quirky filename support.

Files with non-ASCII names, or names containing whitespace, tend to
be handled poorly, perhaps partly because of Arch's shell heritage.

By swallowing XML we do at least get automatic quoting of weird
strings, and we will always use UTF-8 for internal storage.

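To illustrate automatic quoting, this minimal sketch stores awkward file names as XML attributes with Python's `xml.etree.ElementTree` and serializes them as UTF-8. The `inventory` and `file` element names are hypothetical, not Bazaar-NG's actual inventory format.

```python
import xml.etree.ElementTree as ET

# Sample file names that are awkward for shell-based tools.
names = ["plain.txt", "with space.txt", "naïve <draft> & \"notes\".txt"]

# Hypothetical element and attribute names, not a real Bazaar-NG format.
root = ET.Element("inventory")
for name in names:
    ET.SubElement(root, "file", name=name)  # escaping happens automatically

data = ET.tostring(root, encoding="utf-8")  # bytes, UTF-8 encoded
restored = [elem.get("name") for elem in ET.fromstring(data).iter("file")]
print(restored == names)  # True
```

Spaces, markup characters and non-ASCII letters all survive the round trip without any hand-written escaping code.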
* Complex file-id-tagging

Nobody should be expected to understand this. There are two basic
cases: people who want to auto-add everything, and people who want
to add by hand. Both can be reasonably accommodated in a simpler system.

* Complex naming-convention regexps in ``.arch-inventory`` and
  ``{arch}/id-tagging-method``. (The fact that there are two
  overlapping mechanisms with very different names is also bad.)

All this complexity basically just comes down to versioned, ignored,
unknown, the same as in every other system. So we might as well

There are relatively few cases where regexps help more than globs,
and people do find them more complex. Even experienced users can
forget to escape ``\.``. We can have a bit of flexibility with
(say) zsh-style extended globs like ``*.(pyo|pyc)``.

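Reducing the rules to versioned, ignored and unknown takes only a few lines. Python's `fnmatch` does not understand zsh-style `(a|b)` groups, so this sketch expands them into plain globs first. The function names and the sample patterns are assumptions made for illustration.

```python
import fnmatch
import re

def expand_alternatives(pattern):
    """Hypothetical helper: expand zsh-style groups,
    e.g. '*.(pyo|pyc)' -> ['*.pyo', '*.pyc']."""
    match = re.search(r"\(([^()]*)\)", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split("|"):
        expanded.extend(expand_alternatives(head + alternative + tail))
    return expanded

def classify(path, versioned, ignore_patterns):
    """Return 'versioned', 'ignored' or 'unknown' for a tree-relative path."""
    if path in versioned:
        return "versioned"
    basename = path.rsplit("/", 1)[-1]
    for pattern in ignore_patterns:
        if any(fnmatch.fnmatchcase(basename, glob)
               for glob in expand_alternatives(pattern)):
            return "ignored"
    return "unknown"  # a CVS/Svn-style tool would warn about these

# Sample data for the demo.
versioned = {"hello.py"}
ignore_patterns = ["*.(pyo|pyc)", "*~"]
for path in ["hello.py", "hello.pyc", "notes.txt~", "new.py"]:
    print(path, classify(path, versioned, ignore_patterns))
```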
* Some files inside ``{arch}`` are meant to be edited by the user, and
  some are not. This is a flaw common to other systems, including
  BitKeeper. The user should be clear on whether they should touch
  things in a directory or not.

* Source-librarian function works poorly.

It is not the place of a tool to force people to stay organized; it
should just facilitate it. In any case, a library without
descriptive text is of little use. So Bazaar-NG does not force
three-level naming but rather lets people arrange their own trees,
and put on their own descriptions (either within the tree, or by
e.g. having a wiki page listing branches, descriptions and URLs).

* Whining about inode mismatches on pristines/revlibs.

It's fine that there is validation, but the tool should not show off
its limitations. Just do the right thing.

* More generally, not quite enough consistency/safety checking.

* Unclear what commands work on subdirs and what works on the whole

* Hard to share work on a single branch -- though still not really too

* Lack of partial commits of added/deleted files.

* Separate id tags for each file; simple implementation but probably
  costs too much disk space.

* Way too many deeply-nested directories; should be just one.

* ``.listing`` files are ugly and a point of failure. They can cause
  trouble on some servers which limit access to dot files.

Isn't it possible to have the top-level file be predictable and find
everything else needed from there?

* Summary separate from log message.

Simpler to just have one message, and let people extract the first
line/sentence if they wish.

Rather than 'keywords', let arbitrary properties be attached to the
revision at the time of commit.

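One way to model this is sketched here with a hypothetical `Revision` class. The summary is not stored separately; it is derived from the first line of the single message, and free-form properties travel with the revision. Taking the first line rather than the first sentence is an arbitrary choice in this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Revision:
    """Hypothetical revision record, for illustration only."""
    message: str
    # Arbitrary key/value pairs recorded at commit time, instead of 'keywords'.
    properties: dict = field(default_factory=dict)

    @property
    def summary(self):
        """First line of the message, derived on demand for one-line logs."""
        lines = self.message.strip().splitlines()
        return lines[0] if lines else ""

rev = Revision("Fix crash on empty input files.\n\n"
               "The reader assumed every file had at least one line.",
               properties={"reviewed-by": "jane"})
print(rev.summary)  # Fix crash on empty input files.
```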
Simpler disconnected operation
------------------------------

A basic distributed VCS operation is to make it easy to work on an
offline laptop. Arch can do this in a few ways, but none of them are

http://wiki.gnuarch.org/moin.cgi/mini_5fTravellingOftenWithArch

Yaron Minsky writes (2005-01-18):

  I was wondering what people considered to be a good setup for using
  Arch on a laptop. Here's the basic situation. I have a few projects
  that reside in arch repositories on my desktop computer. Basically,
  I'd like to be able to do commits from my laptop, and have those
  commits eventually migrate up to the main repository. I understand
  that the right way of doing this is to set up archives on the laptop.
  But what's the cleanest way of doing this? And is there some way of
  making the commits I do on the laptop show up cleanly and individually
  on the desktop once they are merged in?

baz default is much less strict.

Much of tla depends on being able to categorize files. Some hangovers
from larch -- e.g. precious and backup are essentially the same. junk
is never deleted today.

Automatic version control with 'untagged-source source'. But this is

- having the feature at all
- complex way to define it

Default of 166 lines.

Remove id-tagging-method command or at most make it read-only. If
people really want to use deprecated methods they can just edit the

So we can ship a default id-tagging which works the same as CVS/Svn:
give warnings for files that are not known to be junk. This is the
default in baz right now.

Also we have .arch-inventory, which is per-directory.

Why not have 'baz ignore FILENAME'? To remove ignores, perhaps you
have to edit the .arch-inventory. Print "FILTER added to
PATH/.arch-inventory"; create and baz-add this file if it doesn't
exist.

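Here is a rough sketch of the proposed `baz ignore FILENAME`, with the `baz add` step passed in as a callback. Everything in it is hypothetical, including the file contents, which are written as one plain pattern per line. The real `.arch-inventory` uses Arch's regexp-based syntax, and this sketch does not reproduce it.

```python
import os
import tempfile

def ignore(filename, add_to_vcs, list_name=".arch-inventory"):
    """Sketch of a hypothetical `baz ignore FILENAME` command.

    ASSUMPTION: one plain pattern per line; the real .arch-inventory
    uses Arch's own regexp-based syntax instead."""
    directory, pattern = os.path.split(os.path.abspath(filename))
    list_path = os.path.join(directory, list_name)
    is_new = not os.path.exists(list_path)
    with open(list_path, "a", encoding="utf-8") as f:
        f.write(pattern + "\n")
    if is_new:
        add_to_vcs(list_path)  # version the newly created list
    print(f"{pattern} added to {list_path}")

tree = tempfile.mkdtemp()
added = []  # stands in for a real "baz add"
ignore(os.path.join(tree, "build.log"), added.append)
ignore(os.path.join(tree, "core"), added.append)
```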
Docs should perhaps emphasize .arch-inventory as the basic method and
only mention =tagging-method as an advanced topic.

Should this really be regexps, or just file globs?