~bzr-pqm/bzr/bzr.dev : contents of doc/todo-from-arch.txt at revision 1185.16.74

***************************************** 
Opportunities for improvement on GNU Arch
***************************************** 

[Note that this document is rather out of date as of 2005-08.]

GNU Arch is one influence on bazaar-ng.  There are several things we
would change from Arch in Bazaar to (we hope) improve the user
experience.

The core design of Arch is good, brilliant even.  It can scale from
small projects to large ones, and is a good foundation for building
tools on top.  However, the design is far too complex, both in
concepts and execution.  So the plan is to cut out as many things as
we can, add a few other good concepts from other systems, and try to
make it into a whole that is consistent and understandable.


Good bits to keep
-----------------

* Roll-up changesets

  No other system is able to express this valuable idea: "I merged all
  these changes from other people; here is the result."

  However, it should *also* be possible to bring in perfect-fit
  patches without creating a new commit.

* Star-merge

  Find a common ancestor on diverged and cross-merged branches.

* Apply isolated changesets.

  We should extend this by having a good way to send changesets by
  email, preferably readable even by people who are not using Arch.

* GPG signing of commits.

  Open source hackers almost all have GPG keys already, and GPG deals
  with a lot of PKI functions to do with propagating, signing and
  revoking keys.

  Signed commits are interesting in many ways, not least for
  detecting intrusions into code servers.

* Anonymous downloads can be done without an active server.

  Good for security; also very good for people who do not have a
  permanently-connected machine on which they can install their own
  software, or whose machine is very tightly secured.

  It's neat that you can upload using only sftp/ftp, but I'm not sure
  it's really worth the hassle; getting properly atomic operations
  over remote-file protocols is hard.

* Clean and transparent storage format.

  This is a neat hack, and gives people assurance that they can get
  their data back out again even if the tool disappears.  Very nice.
  (Bazaar-NG won't keep the exact same format, but the ideas will be
  similar.) 

* Relatively easily parseable/scriptable shell interface.  Good for
  people writing web/emacs/editor/IDE interfaces, or scripts based on it.

* Automatically build (and hardlink) revision libraries, with
  consistency checks.

  I don't know how many people want *every* revision in a library, but
  it can be handy to have a few key ones.

  In general making use of hardlinks when they are available and safe
  is nice.

* Rely on ssh for remote access, authentication, and confidentiality.

* Patch headers separate from patch bodies.  (Sometimes you only want
  one.)

* Autogeneration of Changelogs -- but should be in GNU format, at
  least optionally.  I'm not convinced auto-updating them in the tree
  is worthwhile; it makes merges weird.

* Sealing branches.

  It seems useful to prevent accidental commits to things that are
  meant to be stable.  However, the set-once nature of sealing is
  undesirable, because people can make mistakes or want to seal more
  than once.

  One possibility is to have a voluntary write-protect flag set on
  branches that should not normally be updated.  One can remove the
  flag if it turns out it was set wrongly.
 
* ``resolved`` command in Bazaar-1.1

  Good for preventing accidental breakage.

* Multi-level undo -- though it could be made more understandable,
  perhaps through ``undo-history``.
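Star-merge, listed above, needs a merge base for branches that have
diverged and then merged from each other.  As a rough illustration
only (not Arch's actual implementation), here is a minimal Python
sketch that collects the common ancestors of two revisions in a
revision graph and keeps only those that are not themselves ancestors
of another common ancestor.  The graph layout and the revision names
are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical revision graph: each revision id maps to its parent ids.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, rev: str) -> Set[str]:
    """Return rev and every revision reachable through parent links."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def best_common_ancestors(graph: Graph, a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    best = set(common)
    for rev in common:
        # Strict ancestors of rev are older than rev, so rev is the better base.
        best -= ancestors(graph, rev) - {rev}
    return best


if __name__ == "__main__":
    # Two branches fork at "base"; "mine-2" later merges "theirs-1" (a cross-merge).
    graph: Graph = {
        "base": [],
        "mine-1": ["base"],
        "theirs-1": ["base"],
        "mine-2": ["mine-1", "theirs-1"],
        "theirs-2": ["theirs-1"],
    }
    print(sorted(best_common_ancestors(graph, "mine-2", "theirs-2")))  # ['theirs-1']
```

After the cross-merge the useful base is ``theirs-1`` rather than the
original fork point, so changes already merged are not applied twice.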


Bits to cut out
---------------

One lesson from usability design is that it does not always work to
have a complex model and then try to hide complexity in the user
interface.  If you want something to be a joy to use, that must be
designed in from the bottom up.

  (Some developers may react to tla by thinking "eww, how gross" on
  particular points.  As much as possible we might like to fix these.)

* General impression that the tool is telling you how to run your life.

* Non-standard terminology

  Arch uses terms like "version" and "category" in ways that are
  confusing to people accustomed to other version control systems.
  This is not helpful.

  Therefore: development proceeds on a *branch*, which is a series of
  *revisions*.  Simple and obvious.

* Too many commands.

* Command-line options are weirdly inconsistent with other systems,
  with each other, and with what people would like to do.
  For example, I would think the obvious usage is ``bzr diff [FILE]``,
  but ``tla diff`` does not let you specify a file at all.

  Most commands should take filenames as their argument: log, diff,
  add, commit, etc.

* Despite having too many commands, there are massive and glaring
  gaps, such as reverting a single file or a tree.

* Commands are too different from what people are used to in CVS, and
  often not for a good reason.

* Identifiers are too long.  In part this is because Arch tries to
  have identifiers which are both human-assigned and universally unique. 

* Archive names are probably unnecessary.

* Part of the reason for complexity in archives is that the Arch
  design wants to be able to go and find patches on other branches at
  a later time.  (This is not really implemented or used at the
  moment.)

  I think the complexity is unjustified: changesets and revisions have
  universally unique names so they can simply be archived, either on
  the machine of the person who wants them or on a central site like
  supermirror.

* The tool is *unforgiving*; if people create a branch with the wrong
  name it will be around forever.

* Branches are heavyweight; a record always persists in the archive.
  Sometimes it is good to create micro-branches, try something out,
  and then discard them.  If nobody wants the changes, there is no
  reason for the tool to keep them.

* Working offline requires creating a new branch and merging back and
  forth.  This is both more work than it should be, and also pollutes
  the "story" told by branching.

  As much as possible, the *accidental* difference of the location of
  the repository should not affect the *semantics* of branches.
  
  (However, some merging may obviously be necessary when there is
  divergence.) 
 
* Archive registration.  This causes confusion and is unnecessary.

  Proposed solutions such as archive aliases or an additional command
  to register-and-get make it worse.

* Weird file names (``++`` and ``,,``), which persist in user
  directories and cause breakage of many tools.  They give a bad
  impression, and it's even worse when people have to interact with
  them.

* Overly-long identifiers.  (One advantage of pointing to branches
  using filenames or URLs is that the length of the path depends on
  how close it is to the user's location, and they can more easily use ...)

* Too slow by default.

  Arch can be made fast, but in the hands of a nonexpert user it is
  often slow.  For most users, disk is cheaper than CPU time, which is
  cheaper than network roundtrips.  The performance model should be
  transparent -- users should not be surprised that something is slow.

* Tagging onto branches.

  Unifying tags and commits is interesting, but the result is hard to
  mentally model; even Arch maintainers can't say exactly how it is
  supposed to work in some cases.

* Reinventing the world from scratch in libhackerlab/frob/pika/xl.

  Those are all fine projects and may be useful in the future, but
  they are totally unnecessary to write a great version control
  system.  It is not an enormous project; it is not CPU-cycle
  critical; something like Python will be fine.

* Lack (for the moment) of an active server.

  Given that network traffic is the most expensive thing, we can
  possibly get a better solution by having intelligence on both sides
  of the link.  Suppose we want to get just one file from a previous
  revision...

* Poor Windows/Mac support.

  Even though many developers only work on Linux, this still holds a
  tool back.  The reason is this: at least some projects have some
  developers on Windows some of the time.  Those projects can't switch
  to Arch.  Most people only want to learn one tool deeply, so it
  won't be Arch.

  Don't make any overly Unixy assumptions.  Avoid too-cute filesystem
  dependencies.

  Being in Python should help with portability: people do need to
  install it, but many developers will already have it and the total
  burden is possibly less than that of installing the requisite C
  libraries.

* Quirky filename support.

  Files with non-ASCII names, or names containing whitespace, tend to
  be handled poorly, perhaps partly because of Arch's shell heritage.

  By swallowing XML we do at least get automatic quoting of weird
  strings, and we will always use UTF-8 for internal storage.

* Complex file-id-tagging 

  Nobody should be expected to understand this.  There are two basic
  cases: people who want to auto-add everything, and people who want to
  add files by hand.  Both can be reasonably accommodated in a simpler
  system.

* Complex naming-convention regexps in ``.arch-inventory`` and 
  ``{arch}/id-tagging-method``.  (The fact that there are two
  overlapping mechanisms with very different names is also bad.)

  All this complexity basically just comes down to versioned, ignored,
  unknown, the same as in every other system.  So we might as well
  just have that.

  There are relatively few cases where regexps help more than globs,
  and people do find them more complex.  Even experienced users can
  forget to escape ``\.``.  We can have a bit of flexibility with
  (say) zsh-style extended globs like ``*.(pyo|pyc)``.

* Some files inside ``{arch}`` are meant to be edited by the user, and
  some are not.  This is a flaw common to other systems, including
  BitKeeper.  The user should be clear on whether they should touch
  things in a directory or not.

* Source-librarian function works poorly.

  It is not the place of a tool to force people to stay organized; it
  should just facilitate it.  In any case, a library without
  descriptive text is of little use.  So bazaar-ng does not force
  three-level naming but rather lets people arrange their own trees,
  and put on their own descriptions (either within the tree, or by
  e.g. having a wiki page listing branches, descriptions and URLs.)

* Whining about inode mismatches on pristines/revlibs.

  It's fine that there is validation, but the tool should not show off
  its limitations.  Just do the right thing.

* More generally, not quite enough consistency/safety checking.

* Unclear what commands work on subdirs and what works on the whole
  tree.

* Hard to share work on a single branch -- though still not really too
  bad.

* Lack of partial commits of added/deleted files.

* Separate id tags for each file; simple implementation but probably
  costs too much disk space.

* Way too many deeply-nested directories; should be just one.

* ``.listing`` files are ugly and a point of failure.  They can cause
  trouble on some servers which limit access to dot files.

  Isn't it possible to have the top-level file be predictable and find
  everything else needed from there?

* Summary separate from log message.

  Simpler to just have one message, and let people extract the first
  line/sentence if they wish.

  Rather than 'keywords', let arbitrary properties be attached to the
  revision at the time of commit.
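The single-message idea in the last item can be sketched as follows:
each revision stores one log message plus a free-form mapping of
properties, and the summary is derived on demand from the first
non-blank line.  This is a minimal Python sketch; the ``Revision``
class and its field names are hypothetical, not Bazaar's actual data
model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Revision:
    """Hypothetical revision record: one message plus arbitrary properties."""
    revision_id: str
    message: str
    # Free-form key/value pairs attached at commit time (instead of 'keywords').
    properties: Dict[str, str] = field(default_factory=dict)

    def summary(self) -> str:
        """Derive a one-line summary from the first non-blank line of the message."""
        for line in self.message.splitlines():
            if line.strip():
                return line.strip()
        return ""


if __name__ == "__main__":
    rev = Revision(
        revision_id="example-rev-1",
        message="Fix quoting of filenames with spaces\n\nLonger explanation here.",
        properties={"bug": "1234"},
    )
    print(rev.summary())  # Fix quoting of filenames with spaces
    print(rev.properties["bug"])  # 1234
```

Because the summary is computed rather than stored, there is no
separate field that can drift out of sync with the message.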



Simpler disconnected operation
------------------------------

A basic requirement for a distributed VCS is to make it easy to work
on an offline laptop.  Arch can do this in a few ways, but none of
them are really simple.

http://wiki.gnuarch.org/moin.cgi/mini_5fTravellingOftenWithArch

Yaron Minsky writes (2005-01-18):

    I was wondering what people considered to be a good setup for using
    Arch on a laptop.  Here's the basic situation.  I have a few projects
    that reside in arch repositories on my desktop computer.    Basically,
    I'd like to be able to do commits from my laptop, and have those
    commits eventually migrate up to the main repository.  I understand
    that the right way of doing this is to set up archives on the laptop. 
    But what's the cleanest way of doing this?  And is there some way of
    making the commits I do on the laptop show up cleanly and individually
    on the desktop once they are merged in?


Tagging-method
--------------

baz default is much less strict.  

Much of tla depends on being able to categorize files.  Some hangovers
from larch -- e.g. precious and backup are essentially the same.  junk
is never deleted today.  

Automatic version control with 'untagged-source source'.  But this is
deprecated for baz?

Annoyed by

 - defaults
 - having the feature at all
 - complex way to define it

The default definition is 166 lines long.

Remove id-tagging-method command or at most make it read-only.  If
people really want to use deprecated methods they can just edit the
file.

So we can ship a default id-tagging which works the same as CVS/Svn:
give warnings for files that are not known to be junk.  This is the
default in baz right now.
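The CVS/Subversion-like default above comes down to three states,
versioned, ignored and unknown, with a warning only for unknown files.
A minimal Python sketch, assuming the set of versioned paths and a
list of shell-style ignore globs are already known; ``classify`` and
``unknown_file_warnings`` are hypothetical names, and matching uses
Python's standard ``fnmatch`` module rather than Arch's inventory code.

```python
import fnmatch
import posixpath
from typing import List, Sequence, Set


def classify(path: str, versioned: Set[str], ignore_globs: Sequence[str]) -> str:
    """Return 'versioned', 'ignored' or 'unknown' for a slash-separated relative path."""
    if path in versioned:
        # Explicitly versioned files win even if they also match an ignore glob.
        return "versioned"
    name = posixpath.basename(path)
    # Simplifying assumption: globs are matched against the base name only.
    if any(fnmatch.fnmatchcase(name, pattern) for pattern in ignore_globs):
        return "ignored"
    return "unknown"


def unknown_file_warnings(paths: Sequence[str], versioned: Set[str],
                          ignore_globs: Sequence[str]) -> List[str]:
    """Produce one warning per file that is neither versioned nor ignored."""
    return [
        f"warning: unknown file {path}"
        for path in paths
        if classify(path, versioned, ignore_globs) == "unknown"
    ]


if __name__ == "__main__":
    tree = ["hello.c", "hello.o", "notes.txt", "src/util.pyc"]
    for line in unknown_file_warnings(tree, {"hello.c"}, ["*.o", "*.pyc", "*~"]):
        print(line)  # warning: unknown file notes.txt
```

Only ``notes.txt`` is reported here; build products are silently
ignored, and nothing beyond a short glob list is needed to configure it.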

Also we have .arch-inventory, which is per-directory.



Why not have 'baz ignore FILENAME'?  To remove ignores, perhaps you
have to edit the .arch-inventory.  Print "FILTER added to
PATH/.arch-inventory"; create and baz-add this file if it doesn't exist.
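The proposed ``baz ignore FILENAME`` behaviour could work roughly like
this: append the pattern to the per-directory file, create the file if
it does not exist, and report what was done.  A minimal Python sketch;
``add_ignore`` is hypothetical, the file is assumed to hold one glob
per line (the real ``.arch-inventory`` syntax is richer), and the step
of adding a newly created file to version control is only signalled
through the return value, since no API for it is specified here.

```python
import os
import tempfile
from typing import Tuple

# File name taken from the text; the one-glob-per-line format is a
# simplification, not the real .arch-inventory syntax.
IGNORE_FILE = ".arch-inventory"


def add_ignore(directory: str, pattern: str) -> Tuple[str, bool]:
    """Append pattern to the ignore file in directory.

    Returns the message to print and whether the file had to be created;
    a newly created file would still need to be added to version control.
    """
    path = os.path.join(directory, IGNORE_FILE)
    created = not os.path.exists(path)
    content = ""
    if not created:
        with open(path, encoding="utf-8") as f:
            content = f.read()
    if pattern in content.splitlines():
        return f"{pattern} already in {path}", created
    with open(path, "a", encoding="utf-8") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # keep the new pattern on its own line
        f.write(pattern + "\n")
    return f"{pattern} added to {path}", created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        message, created = add_ignore(tmp, "*.pyc")
        print(message)
        if created:
            # Hypothetical follow-up: a real tool would version the new file here.
            print("created new ignore file; it still needs to be added")
```

Removing an ignore stays a matter of editing the file by hand, as the
note above suggests.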

Docs should perhaps emphasize .arch-inventory as the basic method and
only mention =tagging-method as an advanced topic.



Should this really be regexps, or just file globs?
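One possible answer is the zsh-style extended-glob approach mentioned
earlier: accept a small glob language and translate it to a regular
expression internally, so users never have to remember to escape
``\.`` themselves.  A minimal Python sketch, assuming only ``*``,
``?`` and a single level of ``(a|b)`` alternation with literal
alternatives; ``glob_to_regex`` and ``glob_match`` are hypothetical
and support far less than zsh's real extended globbing.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate a glob with *, ? and one level of (a|b) alternation into a regex.

    Alternatives inside parentheses are treated as literal text; nesting and
    wildcards inside alternatives are not supported in this sketch.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append("[^/]*")  # stay within one path component
        elif ch == "?":
            out.append("[^/]")
        elif ch == "(":
            end = pattern.find(")", i)
            if end == -1:
                raise ValueError(f"unclosed '(' in {pattern!r}")
            alternatives = pattern[i + 1:end].split("|")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end  # jump to ')'; the increment below moves past it
        else:
            # Every other character is literal, so '.' never acts as a wildcard.
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def glob_match(pattern: str, name: str) -> bool:
    """True if name matches the whole extended glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), name) is not None


if __name__ == "__main__":
    for name in ["mod.pyc", "mod.pyo", "mod.py", "modXpyc"]:
        print(name, glob_match("*.(pyo|pyc)", name))
```

``modXpyc`` does not match, which is exactly the mistake an unescaped
``.`` in a hand-written regexp would let through.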
