~bzr-pqm/bzr/bzr.dev


Viewing changes to doc/compared-codeville.txt

Committer: mbp at sourcefrog
Date: 2005-04-04 09:50:24 UTC
Revision ID: mbp@sourcefrog.net-20050404095024-4646dbcc42eada9e

workaround for python2.3 difflib bug

files added:
.bzrignore

NEWS

README

build-api

bzrlib

bzrlib/__init__.py

bzrlib/add.py

bzrlib/branch.py

bzrlib/check.py

bzrlib/commands.py

bzrlib/diff.py

bzrlib/errors.py

bzrlib/info.py

bzrlib/inventory.py

bzrlib/newinventory.py

bzrlib/osutils.py

bzrlib/revision.py

bzrlib/store.py

bzrlib/tests.py

bzrlib/textui.py

bzrlib/trace.py

bzrlib/tree.py

bzrlib/xml.py

doc/Makefile

doc/adoption.txt

doc/bitkeeper.txt

doc/changelogs.txt

doc/cherry-picking.txt

doc/cmdref.txt

doc/common-format.txt

doc/compared-aegis.txt

doc/compared-codeville.txt

doc/compared-cvsnt.txt

doc/compared-opencm.txt

doc/compared-prcs.txt

doc/compared-teamware.txt

doc/compression.txt

doc/config-specs.txt

doc/conflicts.txt

doc/costs.txt

doc/darcs.txt

doc/deadly-sins.txt

doc/default.css

doc/design.txt

doc/extra-commands.txt

doc/faq.txt

doc/formats.txt

doc/hashes.txt

doc/ignore.txt

doc/index.txt

doc/interrupted.txt

doc/intro.txt

doc/inventory.txt

doc/join-branches.txt

doc/kill-version.txt

doc/layers.txt

doc/library-interface.txt

doc/merge.txt

doc/mirroring.txt

doc/monotone.txt

doc/news.txt

doc/optional-edit.txt

doc/partial-commit.txt

doc/pool.txt

doc/purpose.txt

doc/python.txt

doc/quickref.txt

doc/quilt.txt

doc/random.txt

doc/requirements.txt

doc/revision-syntax.txt

doc/roadmap.txt

doc/rollup.txt

doc/scalability.txt

doc/security.txt

doc/shared-branches.txt

doc/short-demo.txt

doc/supportability.txt

doc/svk.txt

doc/tagging.txt

doc/taxonomy.txt

doc/testing.txt

doc/thanks.txt

doc/todo-from-arch.txt

doc/unchanged.txt

doc/unrelated-merge.txt

doc/usability.txt

doc/use-cases.txt

doc/web-interface.txt

doc/work-order.txt

doc/workflow.txt

doc/yaml.txt

elementtree

elementtree/ElementTree.py

elementtree/__init__.py

notes

notes/new-inventory-sample.xml

notes/performance.txt

setup.py

files removed:
.bzrignore

knit.py

testknit.py

testsweet.py

woolyweave.py

doc/compared-codeville.txt

Codeville
*********

Documentation on how this actually works is pretty scarce to say the

least.

I *think* I understand their merge algorithm though, and it's pretty

clever. Basically we do a two-way merge between annotated forms of

the two files: that is, with each line marked with the revision in

which it last changed. (I am simplifying here by speaking of lines

and changes, but I don't think it causes any essential problem.)

Now we walk through each file, line by line. If the change that

introduced the line state in branch A is already merged into branch B,

then we can just take B.
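The walk described above can be sketched roughly as follows. This is only
an illustration under several assumptions, not Codeville's actual code:
the function name ``annotated_merge``, the ``(revision_id, text)`` line
representation, and the per-branch sets of already-merged revision ids are
all hypothetical. Deletions are not tracked, so a region where neither
side carries a revision unknown to the other is reported as a conflict
rather than resolved.

```python
from difflib import SequenceMatcher


def annotated_merge(a_lines, a_seen, b_lines, b_seen):
    """Two-way merge of annotated files (hypothetical sketch).

    a_lines, b_lines: lists of (revision_id, text) pairs.
    a_seen, b_seen:   sets of revision ids already merged into each branch.
    Returns (merged_lines, conflicts).
    """
    merged, conflicts = [], []
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        a_chunk, b_chunk = a_lines[a1:a2], b_lines[b1:b2]
        if tag == "equal":
            merged.extend(a_chunk)
            continue
        # A side is "new" if it carries a revision the other branch lacks.
        a_new = any(rev not in b_seen for rev, _ in a_chunk)
        b_new = any(rev not in a_seen for rev, _ in b_chunk)
        if a_new and not b_new:
            merged.extend(a_chunk)      # B has nothing A has not seen
        elif b_new and not a_new:
            merged.extend(b_chunk)      # A's change is already in B: take B
        else:
            conflicts.append((a_chunk, b_chunk))
    return merged, conflicts


# Branch B merged A's change a1 and then overwrote the line in b1.
a = [("r1", "a"), ("a1", "B2"), ("r1", "c")]
b = [("r1", "a"), ("b1", "B3"), ("r1", "c")]
merged, conflicts = annotated_merge(a, {"r1", "a1"}, b, {"r1", "a1", "b1"})
print([text for _, text in merged], conflicts)
```

Here B's version of the middle line wins without a conflict, because the
revision that produced A's line is already in B's history.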

Now is this actually better?

It may be better in several ways:

* Do not need to choose just a single ancestor, but rather can
  take advantage of all possible previous changes.

* Can handle OTHER containing changes which have been merged into
  THIS, but have then been overwritten.

* Can handle cherrypicks(!) by remembering which lines came in from
  that cherrypick; then we don't need to merge them again.

Some questions:

* Do we actually need to store the annotations, or can we just infer
  them at the time we do the merge?

* Can this be accommodated in something like an SCCS weave format?  I
  think something like a weave may work, in as much as it is basically
  a concatenation of annotations, but I don't know if it represents
  merges well enough to cope.

Can this handle binaries or type-specific merges, and if so how?

Unmergeable binaries are easy: just get the user to pick one. Things

like XML are harder; we probably need to punt out to a type-specific

three-way merge. Of course this approach does not forbid also

offering a 3-way merge.

----

I suppose this could be accommodated by an annotation cache on top of
plain history storage, or by using a storage format such as a weave
that can efficiently produce annotation information.
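As a rough illustration of inferring annotations from plain history, the
sketch below recomputes them by diffing each version against its
predecessor. It assumes a strictly linear history supplied as a list of
``(revision_id, lines)`` pairs, oldest first; the function name
``annotate`` is hypothetical, and handling merges would need more than
this. The result could be kept in a cache keyed by revision id.

```python
from difflib import SequenceMatcher


def annotate(history):
    """Return [(revision_id, text)] for the newest version in `history`.

    history: list of (revision_id, lines), oldest first (assumed linear).
    """
    annotated = []
    for rev, lines in history:
        old_texts = [text for _, text in annotated]
        matcher = SequenceMatcher(None, old_texts, lines, autojunk=False)
        new_annotated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision that introduced them.
                new_annotated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines are attributed to this revision;
                # for a pure deletion lines[j1:j2] is empty.
                new_annotated.extend((rev, text) for text in lines[j1:j2])
        annotated = new_annotated
    return annotated


history = [
    ("r1", ["a", "b", "c"]),
    ("r2", ["a", "B", "c", "d"]),
]
print(annotate(history))
```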

That is to say there is nothing inherently necessary about remembering
the line history at the point when it is committed, except that it
might be more efficient to do this once and remember it than to
recompute it for every merge.

----

There is another interesting approach that can be used even in a tool

that does not inherently remember annotations:

Given two files to merge, find all regions of difference. For each

such, try to find a common ancestor having the same content for the

region. Subdivide the region if necessary.

This naive approach is probably infeasible, since it would mean

checking every possible predecessor.
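To make the cost concrete, here is a minimal sketch of the naive search.
The helper names and the list of candidate ancestors are hypothetical.
It checks only whether an ancestor contains a region's content as a
contiguous run, ignoring position, and it gives up on pure insertions or
deletions, since matching an empty region would need positional context.
It does not subdivide regions.

```python
from difflib import SequenceMatcher


def difference_regions(this_lines, other_lines):
    """Return [(this_chunk, other_chunk)] for every differing region."""
    matcher = SequenceMatcher(None, this_lines, other_lines, autojunk=False)
    return [
        (this_lines[i1:i2], other_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def contains_run(lines, run):
    """True if `run` occurs as a contiguous slice of `lines`."""
    n = len(run)
    return any(lines[k:k + n] == run for k in range(len(lines) - n + 1))


def resolve_region(this_chunk, other_chunk, ancestors):
    """Pick the side that changed, or None if no ancestor settles it.

    Every candidate ancestor is scanned, which is what makes this naive.
    """
    for ancestor in ancestors:
        if this_chunk and contains_run(ancestor, this_chunk):
            return other_chunk   # THIS still matches the ancestor
        if other_chunk and contains_run(ancestor, other_chunk):
            return this_chunk    # OTHER still matches the ancestor
    return None


this = ["a", "b", "c"]
other = ["a", "B", "c"]
for this_chunk, other_chunk in difference_regions(this, other):
    print(resolve_region(this_chunk, other_chunk, [["a", "b", "c"]]))
```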

----

Rather than storing or calculating annotations, we could try using a

complex weave, which allows one file version to be represented as a

weave of multiple disjoint previous versions. It sounds complex but

it might work.

Essentially we store each file as a selection of lines that should be
turned on in that file.  These lines might come from any of the
predecessors that were merged into that file.  Complex to get right
but it might work.
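One way to picture "a selection of lines turned on" is the toy structure
below. It is a sketch under assumptions, not an existing format: the
``Weave`` class, its methods, and the use of per-line ids are all
hypothetical. Every line ever stored lives in one ordered list, and each
version is just the set of line ids that are active in it, so a merged
version can turn on lines inherited from several parents.

```python
from difflib import SequenceMatcher
from itertools import count


class Weave:
    """Toy weave: one ordered line store plus an active-line set per version."""

    def __init__(self):
        self._entries = []    # [(line_id, text)] in weave order
        self._versions = {}   # version id -> frozenset of active line ids
        self._ids = count()

    def get_text(self, version):
        active = self._versions[version]
        return [text for line_id, text in self._entries if line_id in active]

    def add_version(self, version, parents, lines):
        # Lines active in any parent are candidates for reuse.
        inherited = set().union(*(self._versions[p] for p in parents))
        basis = [(pos, line_id, text)
                 for pos, (line_id, text) in enumerate(self._entries)
                 if line_id in inherited]
        matcher = SequenceMatcher(None, [t for _, _, t in basis], lines,
                                  autojunk=False)
        active, inserts = set(), {}
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                active.update(line_id for _, line_id, _ in basis[i1:i2])
            elif j2 > j1:
                # New lines go just before the basis line they displace;
                # displaced lines stay stored but are not turned on.
                pos = basis[i1][0] if i1 < len(basis) else len(self._entries)
                new = [(next(self._ids), text) for text in lines[j1:j2]]
                active.update(line_id for line_id, _ in new)
                inserts.setdefault(pos, []).extend(new)
        rebuilt = []
        for pos, entry in enumerate(self._entries):
            rebuilt.extend(inserts.get(pos, []))
            rebuilt.append(entry)
        rebuilt.extend(inserts.get(len(self._entries), []))
        self._entries = rebuilt
        self._versions[version] = frozenset(active)


w = Weave()
w.add_version("v1", [], ["a", "b", "c"])
w.add_version("v2", ["v1"], ["a", "B", "c"])
w.add_version("v3", ["v1"], ["a", "b", "c", "d"])
w.add_version("v4", ["v2", "v3"], ["a", "B", "c", "d"])  # merge of v2 and v3
print(w.get_text("v4"))
```

In this example the merged version adds no new stored lines; it only
selects lines already introduced by its two parents.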

This is written in terms of lines, but it might make more sense to

just use byte ranges: perhaps more efficient when handling long files,

and makes binaries less of a special case.

Codeville in fact does *not* seem to do this, though to me it seems
like a fairly natural corollary of their design.

This seems to imply holding the file text and ancestry of every branch

that ever merged into this one, rather than just finding them if we

later want them. Hm. That is nice in terms of doing smart merges.

That possibly causes trouble in terms of having a name for these

branches floating around inside our space, and making sure we don't

clash with them. It may make sense in terms of having a working

directory be just a view into a shared database, looking at a

particular line of development.

Indeed the main difficulty seems to be of naming branches in this

space. Perhaps we should move back to using repositories and named

branches within them, but not rely on branch names being unique out of

the context of a single repository.

Wow, this seems to open a big can of worms.

----

So the conclusion is that this is very cool, but it does not require a
fundamental change of model and can be implemented later.
