*****************************************
Opportunities for improvement on GNU Arch
*****************************************

[note that this document is rather out of date as of 2005-08]

GNU Arch is one influence on bazaar-ng.  There are several things we
would change from Arch in Bazaar to (we hope) improve the user
experience.

The core design of Arch is good, brilliant even.  It can scale from
small projects to large ones, and is a good foundation for building
tools on top.

However, the design is far too complex, both in concepts and
execution.  So the plan is to cut out as many things as we can, add a
few other good concepts from other systems, and try to make it into a
whole that is consistent and understandable.

Good bits to keep
-----------------

* Roll-up changesets

  No other system is able to express this valuable idea: "I merged
  all these changes from other people; here is the result."

  However, it should *also* be possible to bring in perfect-fit
  patches without creating a new commit.

* Star-merge

  Find a common ancestor on diverged and cross-merged branches.

* Apply isolated changesets.

  We should extend this by having a good way to send changesets by
  email, preferably readable even by people who are not using Arch.

* GPG signing of commits.

  Open source hackers almost all have GPG keys already, and GPG
  deals with a lot of PKI functions to do with propagating, signing
  and revoking keys.

  Signed commits are interesting in many ways, not least for
  detecting intrusions on code servers.

* Anonymous downloads can be done without an active server.

  Good for security; also very good for people who do not have a
  permanently-connected machine on which they can install their own
  software, or whose machine is very tightly secured.

  It's neat that you can upload over only sftp/ftp, but I'm not sure
  it's really worth the hassle; getting properly atomic operations
  over remote-file protocols is hard.

* Clean and transparent storage format.
  This is a neat hack, and gives people assurance that they can get
  their data back out again even if the tool disappears.  Very nice.
  (Bazaar-NG won't keep the exact same format, but the ideas will be
  similar.)

* Relatively easily parseable/scriptable shell interface.

  Good for people writing web/emacs/editor/IDE interfaces, or
  scripts based on it.

* Automatically build (and hardlink) revision libraries, with
  consistency checks.

  I don't know how many people want *every* revision in a library,
  but it can be handy to have a few key ones.  In general, making use
  of hardlinks when they are available and safe is nice.

* Rely on ssh for remote access, authentication, and confidentiality.

* Patch headers separate from patch bodies.  (Sometimes you only
  want one.)

* Autogeneration of Changelogs -- but should be in GNU format, at
  least optionally.  I'm not convinced auto-updating them in the tree
  is worthwhile; it makes merges weird.

* Sealing branches.

  It seems useful to prevent accidental commits to things that are
  meant to be stable.

  However, the set-once nature of sealing is undesirable, because
  people can make mistakes or want to seal more than once.  One
  possibility is to have a voluntary write-protect flag set on
  branches that should not normally be updated.  One can remove the
  flag if it turns out it was set wrongly.

* ``resolved`` command in Bazaar-1.1

  Good for preventing accidental breakage.

* Multi-level undo -- though it could be more understandable, perhaps
  through ``undo-history``.

Bits to cut out
---------------

One lesson from usability design is that it does not always work to
have a complex model and then try to hide complexity in the user
interface.  If you want something to be a joy to use, that must be
designed in from the bottom up.

(Some developers may react to tla by thinking "eww, how gross" on
particular points.  As much as possible we might like to fix these.)

* General impression that the tool is telling you how to run your
  life.
* Non-standard terminology

  Arch uses terms like "version" and "category" in ways that are
  confusing to people accustomed to other version control systems.
  This is not helpful.

  Therefore: development proceeds on a *branch*, which is a series
  of *revisions*.  Simple and obvious.

* Too many commands.

* Command-line options are weirdly inconsistent with other systems,
  with each other, and with what people would like to do.

  For example, I would think the obvious usage is ``bzr diff
  [FILE]``, but ``tla diff`` does not let you specify a file at all.

  Most commands should take filenames as their argument: log, diff,
  add, commit, etc.

* Despite having too many commands, there are massive and glaring
  gaps, such as reverting a single file or a tree.

* Commands are too different from what people are used to in CVS,
  and often not for a good reason.

* Identifiers are too long.

  In part this is because Arch tries to have identifiers which are
  both human-assigned and universally unique.

* Archive names are probably unnecessary.

* Part of the reason for complexity in archives is that the Arch
  design wants to be able to go and find patches on other branches
  at a later time.  (This is not really implemented or used at the
  moment.)  I think the complexity is unjustified: changesets and
  revisions have universally unique names so they can simply be
  archived, either on the machine of the person who wants them or on
  a central site like supermirror.

* The tool is *unforgiving*; if people create a branch with the
  wrong name it will be around forever.

* Branches are heavyweight; a record always persists in the archive.

  Sometimes it is good to create micro-branches, try something out,
  and then discard them.  If nobody wants the changes, there is no
  reason for the tool to keep them.

* Working offline requires creating a new branch and merging back
  and forth.

  This is both more work than it should be, and also pollutes the
  "story" told by branching.
  As much as possible, the *accidental* difference of the location
  of the repository should not affect the *semantics* of branches.
  (However, some merging may obviously be necessary when there is
  divergence.)

* Archive registration.

  This causes confusion and is unnecessary.  Proposed solutions such
  as archive aliases or an additional command to register-and-get
  make it worse.

* Weird file names (``++`` and ``,,``), which persist in user
  directories and break many tools.  Gives a bad impression, and
  it's even worse when people have to interact with them.

* Overly-long identifiers.

  (One advantage of pointing to branches using filenames or URLs is
  that the length of the path depends on how close it is to the
  user's location, and they can more easily use

* Too slow by default.

  Arch can be made fast, but in the hands of a nonexpert user it is
  often slow.  For most users, disk is cheaper than CPU time, which
  is cheaper than network roundtrips.

  The performance model should be transparent -- users should not be
  surprised that something is slow.

* Tagging onto branches.

  Unifying tags and commits is interesting, but the result is hard
  to mentally model; even Arch maintainers can't say exactly how it
  is supposed to work in some cases.

* Reinventing the world from scratch in libhackerlab/frob/pika/xl.
  Those are all fine projects and may be useful in the future, but
  they are totally unnecessary to write a great version control
  system.  It is not an enormous project; it is not CPU-cycle
  critical; something like Python will be fine.

* Lack (for the moment) of an active server.

  Given that network traffic is the most expensive thing, we can
  possibly get a better solution by having intelligence on both
  sides of the link.  Suppose we want to get just one file from a
  previous revision...

* Poor Windows/Mac support.

  Even though many developers only work on Linux, this still holds a
  tool back.
  The reason is this: at least some projects have some developers on
  Windows some of the time.  Those projects can't switch to Arch.
  Most people want to only learn one tool deeply, so it won't be
  Arch.

  Don't make any overly Unixy assumptions.  Avoid too-cute
  filesystem dependencies.

  Being in Python should help with portability: people do need to
  install it, but many developers will already have it and the total
  burden is possibly less than that of installing the requisite C
  libraries.

* Quirky filename support.

  Files with non-ASCII names, or names containing whitespace, tend
  to be handled poorly, perhaps partly because of Arch's shell
  heritage.

  By swallowing XML we do at least get automatic quoting of weird
  strings, and we will always use UTF-8 for internal storage.

* Complex file-id-tagging

  Nobody should be expected to understand this.  There are two basic
  cases: people want to auto-add everything, and want to add by
  hand.  Both can be reasonably accommodated in a simpler system.

* Complex naming-convention regexps in ``.arch-inventory`` and
  ``{arch}/id-tagging-method``.  (The fact that there are two
  overlapping mechanisms with very different names is also bad.)

  All this complexity basically just comes down to versioned,
  ignored, unknown, the same as in every other system.  So we might
  as well just have that.

  There are relatively few cases where regexps help more than globs,
  and people do find them more complex.  Even experienced users can
  forget to escape ``\.``.  We can have a bit of flexibility with
  (say) zsh-style extended globs like ``*.(pyo|pyc)``.

* Some files inside ``{arch}`` are meant to be edited by the user,
  and some are not.  This is a flaw common to other systems,
  including Bitkeeper.  The user should be clear on whether they
  should touch things in a directory or not.

* Source-librarian function works poorly.

  It is not the place of a tool to force people to stay organized;
  it should just facilitate it.  In any case, a library without
  descriptive text is of little use.
  So bazaar-ng does not force three-level naming but rather lets
  people arrange their own trees, and put on their own descriptions
  (either within the tree, or by e.g. having a wiki page listing
  branches, descriptions and URLs.)

* Whining about inode mismatches on pristines/revlibs.  It's fine
  that there is validation, but the tool should not show off its
  limitations.  Just do the right thing.

* More generally, not quite enough consistency/safety checking.

* Unclear which commands work on subdirectories and which work on
  the whole tree.

* Hard to share work on a single branch -- though still not really
  too bad.

* Lack of partial commits of added/deleted files.

* Separate id tags for each file; simple implementation but probably
  costs too much disk space.

* Way too many deeply-nested directories; should be just one.

* ``.listing`` files are ugly and a point of failure.  They can
  cause trouble on some servers which limit access to dot files.
  Isn't it possible to have the top-level file be predictable and
  find everything else needed from there?

* Summary separate from log message.  Simpler to just have one
  message, and let people extract the first line/sentence if they
  wish.  Rather than 'keywords', let arbitrary properties be
  attached to the revision at the time of commit.

Simpler disconnected operation
------------------------------

A basic requirement for a distributed VCS is to make it easy to work
on an offline laptop.  Arch can do this in a few ways, but none of
them are really simple.

http://wiki.gnuarch.org/moin.cgi/mini_5fTravellingOftenWithArch

Yaron Minsky writes (2005-01-18):

    I was wondering what people considered to be a good setup for
    using Arch on a laptop.  Here's the basic situation.  I have a
    few projects that reside in arch repositories on my desktop
    computer.  Basically, I'd like to be able to do commits from my
    laptop, and have those commits eventually migrate up to the main
    repository.  I understand that the right way of doing this is to
    set up archives on the laptop.
    But what's the cleanest way of doing this?  And is there some
    way of making the commits I do on the laptop show up cleanly and
    individually on the desktop once they are merged in?

Tagging-method
--------------

baz default is much less strict.

Much of tla depends on being able to categorize files.

Some hangovers from larch -- e.g. precious and backup are
essentially the same.  junk is never deleted today.

Automatic version control with 'untagged-source source'.  But this
is deprecated for baz?

Annoyed by

- defaults
- having the feature at all
- complex way to define it

Default of 166 lines.

Remove id-tagging-method command or at most make it read-only.  If
people really want to use deprecated methods they can just edit the
file.

So we can ship a default id-tagging which works the same as CVS/Svn:
give warnings for files that are not known to be junk.  This is the
default in baz right now.

Also we have .arch-inventory, which is per-directory.

Why not have 'baz ignore FILENAME'?  To remove ignores, perhaps you
have to edit the .arch-inventory.  Print "FILTER added to
PATH/.arch-inventory"; create and baz-add this file if it doesn't
exist.

Docs should perhaps emphasize .arch-inventory as the basic method
and only mention =tagging-method as an advanced topic.

Should this really be regexps, or just file globs?
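
To make the glob option concrete, here is a minimal Python sketch of
how an ``ignore`` command and the versioned/ignored/unknown split
might look if patterns were plain file globs.  Everything in it is an
assumption for illustration: the file name ``.ignore-sketch``, the
function names and the exact message are hypothetical, and none of it
is existing baz or bzr behaviour.  It also leaves out the "add the
file to version control" step.

```python
import fnmatch
import os

# Hypothetical per-directory ignore file; stands in for .arch-inventory.
IGNORE_FILE = ".ignore-sketch"


def read_ignores(directory):
    """Return the glob patterns stored in DIRECTORY's ignore file."""
    path = os.path.join(directory, IGNORE_FILE)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def add_ignore(directory, pattern):
    """Record PATTERN in DIRECTORY's ignore file, creating it if needed.

    A pattern that is already present is not written a second time.
    Returns a message in the style "FILTER added to PATH/<ignore file>".
    """
    path = os.path.join(directory, IGNORE_FILE)
    if pattern not in read_ignores(directory):
        with open(path, "a", encoding="utf-8") as f:
            f.write(pattern + "\n")
    return "%s added to %s" % (pattern, path)


def classify(name, versioned, patterns):
    """Classify a file name as 'versioned', 'ignored' or 'unknown'.

    VERSIONED is the set of names already under version control;
    PATTERNS is a list of shell-style globs such as '*.pyc'.
    """
    if name in versioned:
        return "versioned"
    if any(fnmatch.fnmatchcase(name, p) for p in patterns):
        return "ignored"
    return "unknown"
```

With this, ``add_ignore(".", "*.pyc")`` would create the ignore file
on first use, and ``classify("foo.pyc", {"foo.py"}, ["*.pyc"])``
would report the file as ignored.  Plain ``fnmatch`` globs do not
cover zsh-style alternation such as ``*.(pyo|pyc)``; that would need
either two patterns or a small extension.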