*****************************************
Opportunities for improvement on GNU Arch
*****************************************

[note that this document is rather out of date in 2005-08]

GNU Arch is one influence on bazaar-ng. There are several things we
would change from Arch in Bazaar to (we hope) improve the user
experience.

The core design of Arch is good, brilliant even. It can scale from
small projects to large ones, and is a good foundation for building
tools on top. However, the design is far too complex, both in
concepts and execution. So the plan is to cut out as many things as
we can, add a few other good concepts from other systems, and try to
make it into a whole that is consistent and understandable.


Good bits to keep
-----------------

* Roll-up changesets

  No other system is able to express this valuable idea: "I merged all
  these changes from other people; here is the result."

  However, it should *also* be possible to bring in perfect-fit
  patches without creating a new commit.

* Star-merge

  Find a common ancestor on diverged and cross-merged branches.

* Apply isolated changesets.

  We should extend this by having a good way to send changesets by
  email, preferably readable even by people who are not using Arch.

* GPG signing of commits.

  Open source hackers almost all have GPG keys already, and GPG deals
  with a lot of PKI functions to do with propagating, signing and
  revoking keys.

  Signed commits are interesting in many ways, not least for
  detecting intrusions on code servers.

* Anonymous downloads can be done without an active server.

  Good for security; also very good for people who do not have a
  permanently-connected machine on which they can install their own
  software, or whose machine is very tightly secured.

  It's neat that you can upload over only sftp/ftp, but I'm not sure
  it's really worth the hassle; getting properly atomic operations
  over remote-file protocols is hard.

* Clean and transparent storage format.

  This is a neat hack, and gives people assurance that they can get
  their data back out again even if the tool disappears. Very nice.
  (Bazaar-NG won't keep the exact same format, but the ideas will be
  similar.)

* Relatively easily parseable/scriptable shell interface. Good for
  people writing web/emacs/editor/IDE interfaces, or scripts based on it.

* Automatically build (and hardlink) revision libraries, with
  consistency checks.

  I don't know how many people want *every* revision in a library, but
  it can be handy to have a few key ones.

  In general making use of hardlinks when they are available and safe
  is nice.

* Rely on ssh for remote access, authentication, and confidentiality.

* Patch headers separate from patch bodies. (Sometimes you only want
  one.)

* Autogeneration of Changelogs -- but should be in GNU format, at
  least optionally. I'm not convinced auto-updating them in the tree
  is worthwhile; it makes merges weird.

* Sealing branches.

  It seems useful to prevent accidental commits to things that are
  meant to be stable. However, the set-once nature of sealing is
  undesirable, because people can make mistakes or want to seal more
  than once.

  One possibility is to have a voluntary write-protect flag set on
  branches that should not normally be updated. One can remove the
  flag if it turns out it was set wrongly.

* ``resolved`` command in Bazaar-1.1

  Good for preventing accidental breakage.

* Multi-level undo -- though could perhaps be more understandable,
  perhaps through ``undo-history``.


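The voluntary write-protect flag suggested under "Sealing branches"
could behave roughly as in the following Python sketch. The ``Branch``
class, the ``write_protected`` attribute and ``BranchSealedError`` are
hypothetical names chosen for illustration; nothing here is taken from
Arch or bazaar-ng itself.

```python
class BranchSealedError(Exception):
    """Raised when committing to a write-protected branch (hypothetical)."""


class Branch:
    """Toy branch: an ordered list of commit messages plus a flag."""

    def __init__(self):
        self.revisions = []
        self.write_protected = False  # voluntary flag, not a set-once seal

    def commit(self, message):
        if self.write_protected:
            raise BranchSealedError(
                "branch is write-protected; clear the flag to commit")
        self.revisions.append(message)


stable = Branch()
stable.commit("release 1.0")
stable.write_protected = True       # guard against accidental commits
try:
    stable.commit("oops")
except BranchSealedError as error:
    print("refused:", error)
stable.write_protected = False      # the flag was set wrongly: just clear it
stable.commit("release 1.0.1")
```

Because the flag is ordinary mutable state rather than a one-way seal,
a branch can be protected and unprotected as often as needed.
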
Bits to cut out
---------------

One lesson from usability design is that it does not always work to
have a complex model and then try to hide complexity in the user
interface. If you want something to be a joy to use, that must be
designed in from the bottom up.

(Some developers may react to tla by thinking "eww, how gross" on
particular points. As much as possible we might like to fix these.)

* General impression that the tool is telling you how to run your life.

* Non-standard terminology

  Arch uses terms like "version" and "category" in ways that are
  confusing to people accustomed to other version control systems.
  This is not helpful.

  Therefore: development proceeds on a *branch*, which is a series of
  *revisions*. Simple and obvious.

* Too many commands.

* Command-line options are weirdly inconsistent with other systems,
  with each other, and with what people would like to do.
  For example, I would think the obvious usage is ``bzr diff [FILE]``,
  but ``tla diff`` does not let you specify a file at all.

  Most commands should take filenames as their arguments: log, diff,
  add, commit, etc.

* Despite having too many commands, there are massive and glaring
  gaps, such as reverting a single file or a tree.

* Commands are too different from what people are used to in CVS, and
  often not for a good reason.

* Identifiers are too long. In part this is because Arch tries to
  have identifiers which are both human-assigned and universally unique.

* Archive names are probably unnecessary.

* Part of the reason for complexity in archives is that the Arch
  design wants to be able to go and find patches on other branches at
  a later time. (This is not really implemented or used at the
  moment.)

  I think the complexity is unjustified: changesets and revisions have
  universally unique names so they can simply be archived, either on
  the machine of the person who wants them or on a central site like
  supermirror.

* The tool is *unforgiving*; if people create a branch with the wrong
  name it will be around forever.

* Branches are heavyweight; a record always persists in the archive.
  Sometimes it is good to create micro-branches, try something out,
  and then discard them. If nobody wants the changes, there is no
  reason for the tool to keep them.

* Working offline requires creating a new branch and merging back and
  forth. This is both more work than it should be, and also pollutes
  the "story" told by branching.

  As much as possible, the *accidental* difference of the location of
  the repository should not affect the *semantics* of branches.

  (However, some merging may obviously be necessary when there is
  divergence.)

* Archive registration. This causes confusion and is unnecessary.

  Proposed solutions such as archive aliases or an additional command
  to register-and-get make it worse.

* Weird file names (``++`` and ``,,``), which persist in user
  directories and cause breakage of many tools. Gives a bad
  impression, and it's even worse when people have to interact with
  them.

* Overly-long identifiers. (One advantage of pointing to branches
  using filenames or URLs is that the length of the path depends on
  how close it is to the user's location, and they can more easily use

* Too slow by default.

  Arch can be made fast, but in the hands of a nonexpert user it is
  often slow. For most users, disk is cheaper than CPU time, which is
  cheaper than network roundtrips. The performance model should be
  transparent -- users should not be surprised that something is slow.

* Tagging onto branches.

  Unifying tags and commits is interesting, but the result is hard to
  mentally model; even Arch maintainers can't say exactly how it is
  supposed to work in some cases.

* Reinventing the world from scratch in libhackerlab/frob/pika/xl.

  Those are all fine projects and may be useful in the future, but
  they are totally unnecessary to write a great version control
  system. It is not an enormous project; it is not CPU-cycle
  critical; something like Python will be fine.

* Lack (for the moment) of an active server.

  Given that network traffic is the most expensive thing, we can
  possibly get a better solution by having intelligence on both sides
  of the link. Suppose we want to get just one file from a previous
  revision...

* Poor Windows/Mac support.

  Even though many developers only work on Linux, this still holds a
  tool back. The reason is this: at least some projects have some
  developers on Windows some of the time. Those projects can't switch
  to Arch. Most people want to only learn one tool deeply, so it
  won't be Arch.

  Don't make any overly Unixy assumptions. Avoid too-cute filesystem
  dependencies.

  Being in Python should help with portability: people do need to
  install it, but many developers will already have it and the total
  burden is possibly less than that of installing C requisite
  libraries.

* Quirky filename support.

  Files with non-ASCII names, or names containing whitespace, tend to
  be handled poorly, perhaps partly because of arch's shell heritage.

  By swallowing XML we do at least get automatic quoting of weird
  strings, and we will always use UTF-8 for internal storage.

* Complex file-id-tagging

  Nobody should be expected to understand this. There are two basic
  cases: people want to auto-add everything, and want to add by hand.
  Both can be reasonably accommodated in a simpler system.

* Complex naming-convention regexps in ``.arch-inventory`` and
  ``{arch}/id-tagging-method``. (The fact that there are two
  overlapping mechanisms with very different names is also bad.)

  All this complexity basically just comes down to versioned, ignored,
  unknown, the same as in every other system. So we might as well
  just have that.

  There are relatively few cases where regexps help more than globs,
  and people do find them more complex. Even experienced users can
  forget to escape ``\.``. We can have a bit of flexibility with
  (say) zsh-style extended globs like ``*.(pyo|pyc)``.

* Some files inside ``{arch}`` are meant to be edited by the user, and
  some are not. This is a flaw common to other systems, including
  Bitkeeper. The user should be clear on whether they should touch
  things in a directory or not.

* Source-librarian function works poorly.

  It is not the place of a tool to force people to stay organized; it
  should just facilitate it. In any case, a library without
  descriptive text is of little use. So bazaar-ng does not force
  three-level naming but rather lets people arrange their own trees,
  and put on their own descriptions (either within the tree, or by
  e.g. having a wiki page listing branches, descriptions and URLs.)

* Whining about inode mismatches on pristines/revlibs.

  It's fine that there is validation, but the tool should not show off
  its limitations. Just do the right thing.

* More generally, not quite enough consistency/safety checking.

* Unclear which commands work on subdirectories and which work on the
  whole tree.

* Hard to share work on a single branch -- though still not really too
  bad.

* Lack of partial commits of added/deleted files.

* Separate id tags for each file; simple implementation but probably
  costs too much disk space.

* Way too many deeply-nested directories; should be just one.

* ``.listing`` files are ugly and a point of failure. They can cause
  trouble on some servers which limit access to dot files.

  Isn't it possible to have the top-level file be predictable and find
  everything else needed from there?

* Summary separate from log message.

  Simpler to just have one message, and let people extract the first
  line/sentence if they wish.

  Rather than 'keywords', let arbitrary properties be attached to the
  revision at the time of commit.

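The single-message idea from the last item can be sketched in a few
lines of Python. This is only an illustration under assumptions: the
``Revision`` class, its ``summary`` method and the ``reviewer``
property are hypothetical names, not part of Arch or bazaar-ng. The
summary is derived from the one stored message instead of being kept
as a separate field, and properties are a free-form mapping set at
commit time.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """A commit with one message and arbitrary properties (hypothetical)."""

    message: str
    properties: dict = field(default_factory=dict)

    def summary(self):
        """Return the first non-blank line of the message, or ''."""
        for line in self.message.splitlines():
            if line.strip():
                return line.strip()
        return ""


rev = Revision(
    "Let diff take a file argument\n\nLonger explanation goes here.",
    properties={"reviewer": "someone@example.com"},  # hypothetical property
)
print(rev.summary())
```

Tools that want a one-line log can call ``summary()``; nothing extra
has to be stored or kept in sync with the full message.
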
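The remark under "Quirky filename support" about XML giving automatic
quoting can be checked with Python's standard library. This is a
minimal sketch: the ``entry`` element and ``name`` attribute are
made-up names for illustration and say nothing about the format
bazaar-ng actually uses. It shows only that a name containing spaces,
markup characters and non-ASCII letters survives a round trip through
UTF-8 encoded XML without any hand-written escaping.

```python
import xml.etree.ElementTree as ET


def entry_xml(name):
    """Serialize a file name as a UTF-8 encoded XML element (bytes)."""
    element = ET.Element("entry", {"name": name})
    return ET.tostring(element, encoding="utf-8")


def entry_name(data):
    """Recover the file name from the serialized element."""
    return ET.fromstring(data).get("name")


weird = 'r\u00e9sum\u00e9 & "notes" <draft>.txt'
data = entry_xml(weird)
print(data.decode("utf-8"))
print(entry_name(data) == weird)
```
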
Simpler disconnected operation
------------------------------

A basic distributed VCS operation is to make it easy to work on an
offline laptop. Arch can do this in a few ways, but none of them are
really simple.

http://wiki.gnuarch.org/moin.cgi/mini_5fTravellingOftenWithArch

Yaron Minsky writes (2005-01-18):

    I was wondering what people considered to be a good setup for using
    Arch on a laptop. Here's the basic situation. I have a few projects
    that reside in arch repositories on my desktop computer. Basically,
    I'd like to be able to do commits from my laptop, and have those
    commits eventually migrate up to the main repository. I understand
    that the right way of doing this is to set up archives on the laptop.
    But what's the cleanest way of doing this? And is there some way of
    making the commits I do on the laptop show up cleanly and individually
    on the desktop once they are merged in?


Tagging-method
--------------

baz default is much less strict.

Much of tla depends on being able to categorize files. Some hangovers
from larch -- e.g. precious and backup are essentially the same. junk
is never deleted today.

Automatic version control with 'untagged-source source'. But this is
deprecated for baz?

Annoyed by

- defaults
- having the feature at all
- complex way to define it

Default of 166 lines.

Remove id-tagging-method command or at most make it read-only. If
people really want to use deprecated methods they can just edit the
file.

So we can ship a default id-tagging which works the same as CVS/Svn:
give warnings for files that are not known to be junk. This is the
default in baz right now.

Also we have .arch-inventory, which is per-directory.


Why not have 'baz ignore FILENAME'? To remove ignores, perhaps you
have to edit the .arch-inventory. Print "FILTER added to
PATH/.arch-inventory"; create and baz-add this file if it doesn't
exist.

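A rough Python sketch of what such an ignore command might do, under
loud assumptions: the function name ``ignore`` is hypothetical, and
the line written to the inventory (a ``junk`` keyword followed by the
filter) is an assumed format used only for illustration, not a
statement of what baz reads. The sketch appends the filter, reports
where it went, and tells the caller whether the file is new and so
still needs to be added to version control.

```python
import os
import tempfile


def ignore(directory, pattern, inventory_name=".arch-inventory"):
    """Append an ignore rule for PATTERN to DIRECTORY's inventory file.

    Returns (message, created); 'created' is True when the file did not
    exist before, so the caller knows it still has to be added.
    """
    path = os.path.join(directory, inventory_name)
    created = not os.path.exists(path)
    with open(path, "a", encoding="utf-8") as inventory:
        # Assumed line format: a category keyword followed by the filter.
        inventory.write("junk %s\n" % pattern)
    return "%s added to %s" % (pattern, path), created


with tempfile.TemporaryDirectory() as tree:
    message, created = ignore(tree, "*.pyc")
    print(message)
    _, created_again = ignore(tree, "*.o")
    with open(os.path.join(tree, ".arch-inventory"), encoding="utf-8") as f:
        inventory_lines = f.read().splitlines()
```

Removing an ignore is deliberately left out, matching the suggestion
that users edit the file by hand for that.
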
Docs should perhaps emphasize .arch-inventory as the basic method and
only mention =tagging-method as an advanced topic.


Should this really be regexps, or just file globs?
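
To make the glob option concrete, here is a small Python sketch of the
three-way split mentioned earlier (versioned, ignored, unknown) driven
by plain globs. The ``classify`` function and the ``IGNORE_GLOBS``
list are hypothetical, and Python's ``fnmatch`` has no zsh-style
alternation, so ``*.(pyo|pyc)`` is written as two separate patterns.
The last two lines show the regexp pitfall of an unescaped ``.``.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical defaults; plain globs, so "." needs no escaping.
IGNORE_GLOBS = ["*.pyo", "*.pyc", "*~"]


def classify(path, versioned, ignore_globs=IGNORE_GLOBS):
    """Return 'versioned', 'ignored' or 'unknown' for a relative path."""
    if path in versioned:
        return "versioned"
    basename = path.rsplit("/", 1)[-1]
    if any(fnmatchcase(basename, glob) for glob in ignore_globs):
        return "ignored"
    return "unknown"


versioned = {"README", "src/main.py"}
for path in ["README", "src/main.pyc", "notes.txt"]:
    print(path, classify(path, versioned))

# An unescaped "." in a regexp matches any character, so this pattern
# also accepts a name that has no ".pyc" suffix; the glob does not.
assert re.match(r"^.*.pyc$", "notapyc")
assert not fnmatchcase("notapyc", "*.pyc")
```

Unknown files are the ones a tool would warn about, which matches the
CVS/Svn-like default described above.
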