~bzr-pqm/bzr/bzr.dev


Viewing changes to doc/design.txt

Committer: Martin Pool
Date: 2006-03-06 11:20:10 UTC
mfrom: (1593 +trunk)
mto: This revision was merged to the branch mainline in revision 1611.
Revision ID: mbp@sourcefrog.net-20060306112010-17c0170dde5d1eea

[merge] large merge to sync with bzr.dev

files added:
bzrlib/bzrdir.py

bzrlib/doc/api/branch.txt

bzrlib/lockdir.py

bzrlib/reconcile.py

bzrlib/sign_my_commits.py

bzrlib/tests/blackbox/test_aliases.py

bzrlib/tests/blackbox/test_bound_branches.py

bzrlib/tests/blackbox/test_break_lock.py

bzrlib/tests/blackbox/test_checkout.py

bzrlib/tests/blackbox/test_commit.py

bzrlib/tests/blackbox/test_conflicts.py

bzrlib/tests/blackbox/test_help.py

bzrlib/tests/blackbox/test_info.py

bzrlib/tests/blackbox/test_log.py

bzrlib/tests/blackbox/test_logformats.py

bzrlib/tests/blackbox/test_re_sign.py

bzrlib/tests/blackbox/test_reconcile.py

bzrlib/tests/blackbox/test_sign_my_commits.py

bzrlib/tests/blackbox/test_update.py

bzrlib/tests/branch_implementations/test_bound_sftp.py

bzrlib/tests/branch_implementations/test_permissions.py

bzrlib/tests/branch_implementations/test_update.py

bzrlib/tests/bzrdir_implementations

bzrlib/tests/bzrdir_implementations/__init__.py

bzrlib/tests/bzrdir_implementations/test_bzrdir.py

bzrlib/tests/interrepository_implementations

bzrlib/tests/interrepository_implementations/__init__.py

bzrlib/tests/interrepository_implementations/test_interrepository.py

bzrlib/tests/repository_implementations

bzrlib/tests/repository_implementations/__init__.py

bzrlib/tests/repository_implementations/test_reconcile.py

bzrlib/tests/repository_implementations/test_repository.py

bzrlib/tests/test_bzrdir.py

bzrlib/tests/test_errors.py

bzrlib/tests/test_lockdir.py

bzrlib/tests/test_reconcile.py

bzrlib/tests/test_repository.py

bzrlib/tests/test_transform.py

bzrlib/tests/workingtree_implementations

bzrlib/tests/workingtree_implementations/__init__.py

bzrlib/tests/workingtree_implementations/test_is_control_filename.py

bzrlib/tests/workingtree_implementations/test_pull.py

bzrlib/tests/workingtree_implementations/test_workingtree.py

bzrlib/transform.py

files removed:
bzrlib/_changeset.py

bzrlib/_merge_core.py

doc/Makefile

doc/adoption.txt

doc/bitkeeper.txt

doc/changelogs.txt

doc/cherry-picking.txt

doc/cmdref.txt

doc/common-format.txt

doc/compared-aegis.txt

doc/compared-codeville.txt

doc/compared-cvsnt.txt

doc/compared-opencm.txt

doc/compared-prcs.txt

doc/compared-teamware.txt

doc/compression.txt

doc/config-specs.txt

doc/conflicts.txt

doc/costs.txt

doc/darcs.txt

doc/deadly-sins.txt

doc/default.css

doc/design.txt

doc/extra-commands.txt

doc/formats.txt

doc/hashes.txt

doc/ignore.txt

doc/index.txt

doc/interrupted.txt

doc/intro.txt

doc/inventory.txt

doc/join-branches.txt

doc/kill-version.txt

doc/layers.txt

doc/library-interface.txt

doc/merge.txt

doc/mirroring.txt

doc/monotone.txt

doc/news.txt

doc/optional-edit.txt

doc/partial-commit.txt

doc/pool.txt

doc/purpose.txt

doc/python.txt

doc/quilt.txt

doc/quotes.txt

doc/random.txt

doc/requirements.txt

doc/revfile-annotation.txt

doc/revfile.txt

doc/revision-syntax.txt

doc/rollup.txt

doc/scalability.txt

doc/security.txt

doc/shared-branches.txt

doc/short-demo.txt

doc/split-join-files.txt

doc/supportability.txt

doc/svk.txt

doc/switch-in-branch.txt

doc/tagging.txt

doc/taxonomy.txt

doc/thanks.txt

doc/todo-from-arch.txt

doc/unchanged.txt

doc/unrelated-merge.txt

doc/usability.txt

doc/use-cases.txt

doc/web-interface.txt

doc/workflow.txt

doc/yaml.txt

notes/inventory-v2-sample.xml

notes/inventory-v2.rnc

notes/new-inventory-sample.xml

notes/performance.txt

notes/revfile.txt

notes/schemas.xml

patches

files renamed:
bzrlib/tests/test_parent.py => bzrlib/tests/branch_implementations/test_parent.py

bzrlib/tests/test_fileid_involved.py => bzrlib/tests/repository_implementations/test_fileid_involved.py

files modified:
BRANCH.TODO

NEWS

bzrlib/__init__.py

bzrlib/add.py

bzrlib/branch.py

bzrlib/builtins.py

bzrlib/check.py

bzrlib/commands.py

bzrlib/commit.py

bzrlib/config.py

bzrlib/conflicts.py

bzrlib/decorators.py

bzrlib/diff.py

bzrlib/errors.py

bzrlib/externalcommand.py

bzrlib/fetch.py

bzrlib/hashcache.py

bzrlib/info.py

bzrlib/inventory.py

bzrlib/lock.py

bzrlib/lockable_files.py

bzrlib/log.py

bzrlib/merge.py

bzrlib/missing.py

bzrlib/msgeditor.py

bzrlib/option.py

bzrlib/osutils.py

bzrlib/patch.py

bzrlib/progress.py

bzrlib/repository.py

bzrlib/revision.py

bzrlib/revisionspec.py

bzrlib/rio.py

bzrlib/status.py

bzrlib/store/__init__.py

bzrlib/store/weave.py

bzrlib/symbol_versioning.py

bzrlib/testament.py

bzrlib/tests/HTTPTestUtil.py

bzrlib/tests/__init__.py

bzrlib/tests/blackbox/__init__.py

bzrlib/tests/blackbox/test_ancestry.py

bzrlib/tests/blackbox/test_diff.py

bzrlib/tests/blackbox/test_outside_wt.py

bzrlib/tests/blackbox/test_pull.py

bzrlib/tests/blackbox/test_revision_info.py

bzrlib/tests/blackbox/test_status.py

bzrlib/tests/blackbox/test_too_much.py

bzrlib/tests/blackbox/test_upgrade.py

bzrlib/tests/blackbox/test_versioning.py

bzrlib/tests/branch_implementations/__init__.py

bzrlib/tests/branch_implementations/test_branch.py

bzrlib/tests/test_bad_files.py

bzrlib/tests/test_branch.py

bzrlib/tests/test_commit.py

bzrlib/tests/test_commit_merge.py

bzrlib/tests/test_config.py

bzrlib/tests/test_fetch.py

bzrlib/tests/test_hashcache.py

bzrlib/tests/test_http.py

bzrlib/tests/test_inv.py

bzrlib/tests/test_lockable_files.py

bzrlib/tests/test_log.py

bzrlib/tests/test_merge.py

bzrlib/tests/test_merge_core.py

bzrlib/tests/test_osutils.py

bzrlib/tests/test_permissions.py

bzrlib/tests/test_revision.py

bzrlib/tests/test_revisionnamespaces.py

bzrlib/tests/test_rio.py

bzrlib/tests/test_selftest.py

bzrlib/tests/test_sftp_transport.py

bzrlib/tests/test_source.py

bzrlib/tests/test_symbol_versioning.py

bzrlib/tests/test_testament.py

bzrlib/tests/test_transport_implementations.py

bzrlib/tests/test_tsort.py

bzrlib/tests/test_ui.py

bzrlib/tests/test_uncommit.py

bzrlib/tests/test_upgrade.py

bzrlib/tests/test_weave.py

bzrlib/tests/test_workingtree.py

bzrlib/transactions.py

bzrlib/transport/__init__.py

bzrlib/transport/http/__init__.py

bzrlib/transport/http/_pycurl.py

bzrlib/transport/http/_urllib.py

bzrlib/transport/local.py

bzrlib/transport/memory.py

bzrlib/transport/sftp.py

bzrlib/tsort.py

bzrlib/ui/__init__.py

bzrlib/uncommit.py

bzrlib/upgrade.py

bzrlib/util/configobj/configobj.py

bzrlib/util/configobj/docs/configobj.txt

bzrlib/util/configobj/docs/validate.txt

bzrlib/util/configobj/validate.py

bzrlib/weave.py

bzrlib/workingtree.py

contrib/pwk


doc/design.txt

****************
Bazaar-NG design
****************

:Author: Martin Pool <mbp@sourcefrog.net>
:Date: December 2004, Noosa.

.. sectnum::

.. contents::

Abstract
--------

*Bazaar-NG should be a joy to use.*

What if we started from scratch and tried to take the best features
from darcs, svn, arch, quilt, and bk?

Don't get the sum of all features; rather get the minimum features
that make it a joy to use. Choose simplicity, in both interface and
model. Do not multiply entities beyond necessity.

*Make it work; make it correct; make it fast* -- Ritchie(?)

Design model
------------

* Unify archives and branches; one archive holds one branch. If you
  want to publish multiple branches, just put up multiple directories.

* Explicitly add/remove files only; no names or tagline tagging. If
  someone wants to do heuristic detection of renames that's fine, but
  it's not in the core model.

Quilt indicates an interesting approach: patches themselves are the
thing we're trying to build. We don't just want a record of what
happened, but we want to build up a good description of the change
that will be implied when it's integrated. This implies that we want
to be able to change history quite a lot before merging upstream; or
at least change our description of what will go up.

Principles
----------

* Unix design philosophy (via Peter Miller), tempered by modern
  expectations:

  - least unnecessary output

  - little dependence on *specific* external tools

  - short command lines

  - least overlap with cooperating tools

* `Worse is better`__

  __ http://www.jwz.org/doc/worse-is-better.html

  - *Simplicity: the design must be simple, both in implementation and
    interface. It is more important for the implementation to be
    simple than the interface. Simplicity is the most important
    consideration in a design.*

  - *Correctness: the design must be correct in all observable
    aspects. It is slightly better to be simple than correct.*

  - *Consistency: the design must not be overly inconsistent. Consistency
    can be sacrificed for simplicity in some cases, but it is better to
    drop those parts of the design that deal with less common
    circumstances than to introduce either implementational complexity
    or inconsistency.*

  - *Completeness: the design must cover as many important situations as
    is practical. All reasonably expected cases should be
    covered. Completeness can be sacrificed in favor of any other
    quality. In fact, completeness must be sacrificed whenever
    implementation simplicity is jeopardized. Consistency can be
    sacrificed to achieve completeness if simplicity is retained;
    especially worthless is consistency of interface.*

* Try to get a reasonably tasteful balance between having something
  that works out of the box but also has composable parts. Provide
  mechanism rather than policy but not to excess.

* Files have ids to let us detect renames without having to walk the
  whole path.

  If there are conflicts in ids they can in principle be resolved.
  There might be a ``merge --by-name`` to allow you to force two trees
  into agreement on IDs. If the merge sees two files with the same
  name and text then it should conclude that the files merged.

  It would be nice if there were some way to make repeated imports of
  the same tree give the same ids, but I don't think there is a safe
  feasible way. Sometimes files start out the same but really should
  diverge; boilerplate files are one example.

* Archives are just directories; if you can read/write the files in
  them you can do what you need. This works even over http/sftp/etc.
  Or at least this should work for read-only access; perhaps for
  writing it is reasonable to require a svn+ssh style server invoked
  over a socket.

  Of course people should not edit the files in there by hand but in
  an emergency it should be possible.

* Storing archives in plain directories means making some special
  effort to make sure they can be rolled back if the commit is
  interrupted at any time. On truly malicious filesystems (NFS) this
  may be quite difficult, but at a minimum it should be possible to
  roll back whatever was uncommitted and get to a reasonable state. It
  should also be reasonably possible to mirror branches using rsync,
  which may transfer files in arbitrary order and cannot handle files
  changing while in flight.

  Recovering from an interrupted commit may require a special ``bzr
  fix`` command, which should write the results to a new branch to
  avoid losing anything.

* Branches carry enough information to recreate any previous state of
  the branch (including its ancestors).

  This does not necessarily mean holding the complete text of all
  those patches, but we do store at least a globally unique identifier
  so that we can retrieve them.

* Commands should correspond to svn or cvs as much as possible: add,
  get, copy, commit, diff, status, log, merge.

* We have all the power of mirroring, but without needing to introduce
  special concepts or commands. If you want somebody's branch
  available offline just copy it and keep updating to pull in their
  changes; if you never make any changes the updates will always
  succeed.

* It is useful to be able to easily undo a previous change by
  committing the opposite. I had previously imagined requiring all
  patches to be stored in a reversible form but it's enough to just do
  backwards three-way merges.

* Patches have globally unique IDs which uniquely identify them.

* As a general principle we separate identification (which must be
  globally unique) from naming (which must be meaningful to users).
  Arch fuses them, which makes the human names long and prevents them
  ever being reused. Monotone doesn't have human-friendly names.

* Users are identified by something like an email address;
  ``user@domain``. This need not actually be a working email
  address; the point is just to piggyback on domain names to get
  human-readable globally unique names.

* Everything will be designed from the beginning to be safe and
  reasonable on Windows and Unix.

* History is append-only. Patches are recorded along with the time at
  which they were committed; if time steps backwards then we give a
  warning (but probably commit anyhow.) This means we can reliably
  reproduce the state of the branch at any previous point, just by
  backing out patches until we get back there.

  This is also true at a physical level as much as possible; once a
  patch is committed we do not overwrite it. This should make it less
  likely that a failure will corrupt past history. However, we may
  need some indexes which are updated rather than replaced; they
  should probably be atomically updated.

* Storage should be reasonably transparent, as much as possible. (ie
  don't use SQLite or BDB.) At the same time it should be reasonably
  efficient on a wide range of systems (ie don't require reiserfs to
  work well.)

  Programmers who look behind the covers should feel comfortable that
  their data is safe, and hopefully pleased that the design is
  elegant.

* Unrecognized files cause a warning when you try to commit, but you
  can still commit. (Same behavior as CVS/Subversion; less discipline
  than Arch.)

  If you wish, you can change this to fail rather than just warn; this
  can be done as tree policy or as an option (eg ``commit --strict``)

* Files may be ignored by a glob; this can be applied globally (across
  the whole tree) or for a particular directory. As a special
  convenience there is ``bzr ignore``.

* If branches move location (e.g. to a new host or a different
  directory), then everyone who uses them needs to know the new URL by
  some out-of-band method.

* All operations on a branch or pair of branches can be done entirely
  with the information stored in those branches. Bazaar-NG never needs to
  go and look at another branch, so we don't need unique branch names
  or to remember the location of branches.

* Store SHA-1 hashes of all patches, also store hashes of the tree
  state in each revision. (We need some defined way to make a hash of
  a tree of files; for a start we can just cat them together in order
  by filename.)

  Hashes are stored in such a way that we can switch hash algorithms
  later if SHA-1 proves insecure.

* You can also sign the hashes of patches or trees.

* All branches carry all the patches leading up to their current
  state, so you can recreate any previous state of that branch,
  including the branches leading up to it.

* A branch has an append-only history of patches committed on this
  branch, and also an append-only history of patches that have been
  merged in.

* A commit log message file is present in .bzr-log all the time; you
  can add notes to it as you go along. Some commands automatically
  add information to this file, such as when merging or reversing
  changes.

  The first line of the message is used as the summary.

* Commands that make changes to the working copy will by default baulk
  if you have any uncommitted changes. Such commands include ``merge``
  and ``reverse``. This is done for two reasons: to avoid losing your
  changes in the case where the merge causes problems, and to try to
  keep merges relatively pure. You can force it if you wish.

  (*pull* is possibly a special case; perhaps it should set aside
  local changes, update, and then reapply them/remerge them?)

* Within a branch, you can refer to commits by their sequence number;
  it's nice and friendly for the common case of looking at your
  commits in order.

* You can generate a changelog any time by looking at only local
  files. Automatically including a changelog in every commit is
  redundant and so can be eliminated. Of course if you want to
  manually maintain a changelog you can do that too.

* At the very least we should have ``undo`` as a reversible
  ``revert``. It might be even better to have a totally general undo
  which will undo any operation; this is possible by keeping a journal
  of all changes.

* Perhaps eventually move to storing changesets in single text files,
  containing file diffs and also information on renames, etc. The
  format should be similar to that of ``tla show-changeset``, but
  lossless.

* Pristines are kept in the control directory; pristines are
  relatively expensive to recreate so we might want to keep more than
  one.

  (Robert says that keeping pristines under there can cause trouble
  with people running recursive commands across the source tree, so
  there should probably be some other way to do it. If pristines are
  identified by their hash then we can have a revlib without needing
  unique branch names.)

* Can probably still have cacherevs for revisions; ideally
  autogenerated in some sensible way. We know the tree checksum for
  each revision and can make sure we cached the right thing.

* Bazaar-NG should ideally combine the best merging features of
  Bitkeeper and Arch: both cherry-picking and arbitrary merging within
  a graph. The metaphor of a bazaar or souk is appropriate: many
  independent agents, exchanging selected patches at will.

* Code should be structured as a library plus a command-line client;
  the library could be called from any other client. Therefore
  communication with the user should go through a layer, the library
  should not arbitrarily exit() or abort(), etc.

* Any of these details are open to change. If you disagree, write and
  say so, sooner rather than later. There will be a day in the future
  where we commit to compatibility, but that is a while off.

* Timestamps obviously need to be in UTC to be meaningful on the
  network. I guess they should be displayed in localtime by default
  and you can change that by setting $TZ or perhaps some option like
  ``--utc``. It might be cool to also capture the local time as an
  indicator of what the committer was doing.

* Should probably have some kind of progress indicator like --showdots
  that is easy to ignore when run from a program (especially an
  editor); that probably means avoiding tricks with carriage return.
  (That might be a problem on Windows too.)

* What date should be present on restored files? We don't remember
  the date of individual files, but we could set the date for the
  entire commit.

* One important layer is concerned with reproducing a previous
  revision from a given branch; either the whole thing or just a
  particular file or subdirectory. This is used in many different
  places. We can potentially plug in different storage mechanisms
  that can do this; either a very transparent and simple file-based
  store as in darcs and arch, or perhaps a more tricky/fast
  database-based system.
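The file-id principle in the list above can be illustrated with a small sketch. It assumes a tree is represented as a plain dictionary mapping file id to path; that representation and the name ``detect_renames`` are hypothetical and chosen only for illustration, not taken from Bazaar-NG itself.

```python
def detect_renames(old_tree, new_tree):
    """Return {file_id: (old_path, new_path)} for files whose path changed.

    Both arguments are hypothetical inventories: dicts of file id -> path.
    """
    renames = {}
    for file_id, old_path in old_tree.items():
        new_path = new_tree.get(file_id)
        # Same id under a different path means the file was renamed or moved.
        if new_path is not None and new_path != old_path:
            renames[file_id] = (old_path, new_path)
    return renames


old = {"id-1": "README", "id-2": "src/main.c"}
new = {"id-1": "README.txt", "id-2": "src/main.c", "id-3": "NEWS"}
renamed = detect_renames(old, new)
```

Because the lookup is keyed on the id, no comparison of file contents or directory walking is needed to notice the rename.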

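For the indexes that "should probably be atomically updated", one common technique is to write a temporary file in the same directory and rename it over the old one. The sketch below shows that technique under the assumption of a local filesystem with atomic rename; it is not necessarily what Bazaar-NG does, and, as noted in the list above, filesystems such as NFS may give weaker guarantees. The name ``write_atomically`` is hypothetical.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace renames over an existing file in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Roll back: an interrupted write leaves the old file untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

An interruption before the rename leaves only a stray temporary file, which a recovery command could safely delete.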

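The tree-hash idea from the list above ("cat them together in order by filename") can be sketched as follows. The in-memory tree (a dict of filename to bytes), the function name ``tree_hash`` and the ``algorithm:digest`` output format are all assumptions made for this example; the prefix is one possible way to leave room for switching hash algorithms later.

```python
import hashlib


def tree_hash(files, algorithm="sha1"):
    """Hash a tree given as a dict mapping filename -> content bytes."""
    digest = hashlib.new(algorithm)
    # Sorting by filename makes the result independent of dict order.
    for name in sorted(files):
        digest.update(files[name])
    # Hypothetical format: record which algorithm produced the digest.
    return algorithm + ":" + digest.hexdigest()


tree_a = {"b.txt": b"world", "a.txt": b"hello "}
tree_b = {"a.txt": b"hello ", "b.txt": b"world"}
same = tree_hash(tree_a) == tree_hash(tree_b)
```

Plain concatenation ignores file names and file boundaries, so two different trees can hash alike; a fuller scheme would presumably feed the names and sizes into the digest as well.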

Entities and terminology
------------------------

The name of the project is *Bazaar-NG*; the top-level command is
``bzr``.

Branch
''''''

Development in Bazaar-NG takes place on branches. A branch records
the progress of a *tree* through various *revisions* by the
accumulation of a series of *patches*.

We can point to a branch by specifying its *location*. At first this
will be just a local directory name but it might grow to allow remote
URLs with various schemes.

Branches have a *name* which is for human convenience only; changesets
are permanently labelled with the name of the branch on which they
originated. Branch names complement change descriptions by providing
a broader context for the purpose of the change. Typically the branch
name will be the same as the last component of the directory or path.

There is no higher-level grouping than branches. (Nothing that
corresponds to repositories in CVS or Subversion, or
archives/categories/versions in Arch.) Of course it may be a good
practice to keep your branches organized into directories for each
project, just as you might do with tarballs or cvs working
directories.

Bazaar-NG makes forking branches very easy and common.

Revision
''''''''

The tree in a branch at a particular moment, after applying all the
patches up to that point.


File id
'''''''

A UUID for a versioned file, assigned by ``bzr add``.

Delta
'''''

A smart diff, containing:

* unidiff hunks for textual changes

* for each affected file, the file id and the name of that file before
  and after the delta (they will be the same if the file was not renamed)

* in future, possibly other information describing symlinks,
  permissions, etc

A delta can be generated by comparing two trees without needing any
additional input.

Although deltas have some diff context that would allow fuzzy
application they are (almost?) always exactly applied to the correct
predecessor.
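As a rough illustration of the delta described above, the following sketch compares two trees and records, for each changed file, its id, its name before and after, and unified-diff hunks. The ``FileDelta`` class, the ``make_delta`` function and the tree representation (a dict of file id to ``(name, text)``) are hypothetical; only files present in both trees are handled, to keep the example short.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class FileDelta:
    file_id: str
    old_name: str
    new_name: str
    hunks: list = field(default_factory=list)  # unified-diff lines


def make_delta(old_tree, new_tree):
    """Compare two trees given as {file_id: (name, text)} dicts."""
    delta = []
    for file_id in sorted(old_tree.keys() & new_tree.keys()):
        old_name, old_text = old_tree[file_id]
        new_name, new_text = new_tree[file_id]
        # unified_diff yields nothing at all when the texts are identical.
        hunks = list(difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(),
            fromfile=old_name, tofile=new_name, lineterm=""))
        if hunks or old_name != new_name:
            delta.append(FileDelta(file_id, old_name, new_name, hunks))
    return delta


old = {"id-1": ("hello.txt", "hello\nworld\n"), "id-2": ("same.txt", "x\n")}
new = {"id-1": ("greeting.txt", "hello\nthere\n"), "id-2": ("same.txt", "x\n")}
delta = make_delta(old, new)
```

Nothing beyond the two trees is consulted, which matches the claim that a delta needs no additional input.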

Changeset
'''''''''

(also known as a patch)

A changeset represents a commit to a particular branch; it
incorporates a *delta* plus some header information such as the name
of the committer, the date of the commit, and the commit message.


Tree
''''

A tree of files and directories. A branch minus the Bazaar-NG control files.

Syntax
------

Branches
''''''''

Branches are identified by their directory name or URL::

    bzr branch http://kernel.org/bzr/linux/linux-2.6
    bzr branch ./linux-2.6 ./linux-2.6-mbp-partitions

Branches have human-specified names used for tracing patches to their
origin. By default this is the last component of the directory name.

Revisions
'''''''''

Revisions within a branch may be identified by their sequence number
on that branch, or by a tag name::

    bzr branch ./linux-2.6@43 ./linux-2.6-old
    bzr branch ./linux-2.6@rel6.8.1 ./linux-2.6.8.1

You may also use the UUID of the patch or the hash of that
revision, though sane humans should never (need to) use these::

    bzr log ./linux-2.6@uuid:6eaa1c41-34b8-4e0e-8819-acb5dfcabb78
    bzr log ./linux-2.6@hash:4bf00930372cce9716411b266d2e03494f7fe7aa

Revision ranges are given as two revisions separated by a colon (same
as Svn)::

    bzr merge ../distcc-doc@4:10
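The revision syntax above could be parsed along the following lines. This is only a sketch: the function names and the returned tuples are hypothetical, a range whose endpoints are ``uuid:`` or ``hash:`` identifiers is not handled, and a location that itself contains ``@`` (for example a URL with a user name) would need extra care.

```python
def parse_revision(text):
    """Classify a single revision identifier."""
    if text.startswith("uuid:"):
        return ("uuid", text[len("uuid:"):])
    if text.startswith("hash:"):
        return ("hash", text[len("hash:"):])
    if text.isdigit():
        return ("revno", int(text))
    return ("tag", text)


def parse_branch_spec(spec):
    """Split 'location@revision' or 'location@start:end' into parts.

    Returns (location, revisions); revisions holds zero, one or two items.
    """
    location, sep, rev = spec.rpartition("@")
    if not sep:
        return spec, []
    # The colon in 'uuid:' and 'hash:' is a prefix, not a range separator.
    if rev.startswith(("uuid:", "hash:")):
        return location, [parse_revision(rev)]
    return location, [parse_revision(part) for part in rev.split(":", 1)]


single = parse_branch_spec("./linux-2.6@43")
tagged = parse_branch_spec("./linux-2.6@rel6.8.1")
ranged = parse_branch_spec("../distcc-doc@4:10")
```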

Authors
'''''''

Authors are identified by their email address, taken from ``$EMAIL``
or ``$BZR_EMAIL``.
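A lookup of the author identity might look like the sketch below. The text does not say which variable takes precedence; this example assumes the tool-specific ``$BZR_EMAIL`` overrides the generic ``$EMAIL``, and the function name ``author_email`` is hypothetical.

```python
import os


def author_email(environ=None):
    """Return the committer's email from the environment, or None."""
    environ = os.environ if environ is None else environ
    # Assumption: the tool-specific variable wins over the generic one.
    return environ.get("BZR_EMAIL") or environ.get("EMAIL")


both = author_email({"EMAIL": "a@example.com", "BZR_EMAIL": "b@example.com"})
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.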

Tree inventory
--------------

When a revision is committed, Bazaar-NG records an "inventory" which
essentially says which version of each file should be assembled into
which location in the tree. It also includes the SHA-1 hash and the
size of each file.
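One way to picture an inventory is as a list of records, one per file, as in the sketch below. The ``InventoryEntry`` fields, the input format (a dict of file id to ``(path, version, content)``) and both function names are assumptions made for illustration rather than the actual on-disk format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    file_id: str
    path: str      # location of the file in the tree
    version: str   # which version of the file text to use
    sha1: str
    size: int


def make_inventory(files):
    """Build an inventory from {file_id: (path, version, content_bytes)}."""
    entries = []
    for file_id in sorted(files):
        path, version, content = files[file_id]
        entries.append(InventoryEntry(
            file_id, path, version,
            hashlib.sha1(content).hexdigest(), len(content)))
    return entries


def verify(entry, content):
    """Check reassembled content against the recorded size and hash."""
    return (len(content) == entry.size
            and hashlib.sha1(content).hexdigest() == entry.sha1)


inventory = make_inventory({"id-1": ("README", "rev-3", b"hello\n")})
```

Recording the size alongside the hash allows a cheap first check before hashing the whole file.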

Merging
-------

Merges are carried out in Bazaar-NG by a three-way merge of trees. Users
can choose to merge all changes from another branch, or a particular
subset of changes. In either case Bazaar-NG chooses an appropriate
common base, although there should perhaps also be an option to
specify a different base.

I have not solved all the merge problems here. I do think that this
design preserves as much information as possible about the history of
the code and so gives a good foundation for smart merging.

The basic merge operation is a 3-way diff: we have three files *BASE*,
*OTHER* and *MINE* and want to produce a result. There are many
different tools that could be used to resolve this interactively or
automatically.

There are some cases where the best base is not a state that ever
occurred on the two branches. One such case is when there are two
branches that have both tracked an upstream branch but have never
previously synced with each other. In this case we suggest that
people manually specify the base::

    bzr merge --base linus-2.6 my-2.6

Merges most commonly happen on files, but can also occur on metadata.
For example we may need to resolve a conflict between file ids to
decide what name a file should have, or conversely which id it should
have.

When merging an entire branch, the base is chosen as the last revision
in which the trees' manifests were identical.

If merging only selected revisions from a branch (ie cherry picking)
then the base is set just before the revisions to be merged.

A three-way merge operates on three inputs: THIS (the tree called MINE
above), OTHER, and BASE. Any regions which have been changed in only
one of THIS and OTHER, or changed the same way in both, will be carried
across automatically. Regions which THIS and OTHER have both changed
from BASE, in different ways, are conflicts and must be manually
resolved.
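The rule above can be written down directly for a single region. The sketch below then applies it to whole files, treating each file's text as one region; a real merge would first align regions within a file using a diff. The function names and the tree representation (a dict of path to text, with a missing path meaning the file is absent) are hypothetical.

```python
def merge_region(base, this, other):
    """Three-way merge of one region; returns (result, is_conflict)."""
    if this == other:    # unchanged, or changed the same way in both
        return this, False
    if this == base:     # only OTHER changed
        return other, False
    if other == base:    # only THIS changed
        return this, False
    return None, True    # changed differently on both sides


def merge_trees(base, this, other):
    """Merge trees given as {path: text}; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(this) | set(other)):
        result, conflict = merge_region(
            base.get(path), this.get(path), other.get(path))
        if conflict:
            conflicts.append(path)
        elif result is not None:   # None here means the file was removed
            merged[path] = result
    return merged, conflicts


base = {"a": "1", "b": "2", "c": "3"}
this = {"a": "1x", "b": "2", "c": "3t"}
other = {"a": "1", "b": "2y", "c": "3o"}
merged, conflicts = merge_trees(base, this, other)
```

Only the three end states are consulted, never the intermediate revisions.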

The merge does not depend upon any states the trees may have
passed through in between the revisions that are merged.

After the merge, the destination tree incorporates all the patches
from the branch region that was merged in.

Sending patches by email
------------------------

Patches can be sent to someone else by email, just by posting the
string representation of the changeset. Could also post the GPG
signature.

The changeset cannot itself contain its uniquely-identifying hash.
Therefore I suppose it needs some kind of super-header which says what
the patch id is; this can be verified by comparing it to the hash of
the actual changeset. This in turn implies that the text must be
exactly preserved in email, so possibly we need some kind of
quoted-printable or ascii-armoured form.
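The super-header idea can be sketched as below: the id is carried outside the changeset body and checked by re-hashing the body on receipt. The header name ``Patch-Id``, the ``sha1:`` prefix and the function names are invented for this example and are not a defined Bazaar-NG format.

```python
import hashlib

PREFIX = b"Patch-Id: sha1:"  # hypothetical super-header


def wrap_changeset(body):
    """Prefix a changeset (bytes) with a super-header naming its SHA-1."""
    patch_id = hashlib.sha1(body).hexdigest().encode("ascii")
    return PREFIX + patch_id + b"\n\n" + body


def unwrap_changeset(message):
    """Return the changeset body, or raise ValueError on a mismatch."""
    header, sep, body = message.partition(b"\n\n")
    if not sep or not header.startswith(PREFIX):
        raise ValueError("missing Patch-Id super-header")
    claimed = header[len(PREFIX):].decode("ascii")
    # Any change to the body in transit changes its hash.
    if hashlib.sha1(body).hexdigest() != claimed:
        raise ValueError("changeset text does not match its patch id")
    return body


message = wrap_changeset(b"diff text\n")
```

The check fails if a mail system rewraps lines or alters whitespace, which is why an armoured encoding may be needed.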

Another approach would be to not use the hash as the id, but rather
something else which allows us to check the patch is actually what it
claims to be. For example giving a GPG key id and a UUID would do
just as well, and *would* allow the id to be included within the
patch, as would giving an arch-style revision ID, assuming we can
either map the userid to a GPG key and/or check against a trusted
archive.

There are two ways to apply such a received patch. Ideally it tells
us a revision of our branch from which it was based, probably by
specifying the content hash. We can use that as the base, make a
branch there, apply the patch perfectly, and then merge that branch
back in through a 3-way merge. This gives a clean reconciliation of
changes in the patch against any local changes in the branch since the
base.

If we do not have the base for the patch we can try to apply it using
a similar mechanism to regular patch, which might cause conflicts. Or
maybe it is not worth special-casing this; we could just require
people to have the right basis to accept a patch.

Rewriting history
-----------------

History is generally append-only; once something is committed it
cannot be undone. We need this to make several important guarantees
about being able to reconstruct previous versions, about patches being
consistent, and so on and on.

However, pragmatically, there are a few cases where people will insist
on being able to fudge it. We need to accommodate those as best we
can, within the limits of causality. In other words, what is
physically and logically possible should not be arbitrarily forbidden
by the software (though it might be enormously discouraged).

The basic transaction is a changeset/patch/commit. There is little
value and hellish complexity in introducing meta-changesets or trying
to update already-committed changes.

Wrong commit message
''''''''''''''''''''

*Oops, I pressed Save too soon, and the commit message is wrong.* This
happens all the time.

If no other branch has taken that change, there is no harm in fixing
the message. Noticing the problem right away is probably a very
common case.

Therefore, you can change the descriptive text (but not any other
metadata) of a changeset in your tree. This will not propagate to
anyone else who has already accepted the change. Nothing will break,
but they'll still see the original (incorrect/incomplete) commit.

Committed confidential information
''''''''''''''''''''''''''''''''''

If you just added a file you didn't mean to add then you can simply
commit a second changeset to remove it again. However, sometimes
people will accidentally commit sensitive/confidential information,
and they need to remove it from the history.

If anyone else has already taken the changeset we can't prevent them
seeing or keeping the information. You need to find them and ask them
nicely to remove it as well. Similarly, if you've mirrored your
branch elsewhere you need to fix it up by hand. This additional
manual work is a feature because it gives you some protection against
accidentally destroying the wrong thing.

A related case is accidentally committing an enormous
file; you don't want it to hang around in the archive for ever. (In
fact, it would need to be stored twice, once for the original commit
and again for a reversible remove changeset.)

Here is our suggestion for how to fix this: make a second branch from
just before the undesired commit, typically by specifying a timestamp.
If there are any later commits that need to be preserved, they can be
merged in too. Possibly that will cause conflicts if they depended on
the removed changeset, and those conflicts then need to be resolved.

History truncation
------------------

(I don't think we should implement this soon, if at all, but people
might want to know it's possible.)

Bazaar-NG relies on each branch being able to recreate any of its
predecessor states. This is needed to do really intelligent merging.

However, you might eventually get sick of keeping all the history
around forever. Therefore, we can set a history horizon, ignoring all
patches before that point.

The patches are still recorded as being merged but we do not keep the
text of the patches. Perhaps we add them to a special list.

Merges with a tree that has no history in common since the horizon
will be somewhat harder.

626

A development path
------------------

**See also work-log.txt for what I'm currently doing.**

* Start by still using Arch changeset format, do-changeset and delta
  commands, possibly also for merge.

* Don't do any merges automatically at first but rather just build
  some trees and let the user run dirdiff or something.

* Don't handle renames at first.

* Don't worry about actually being distributed yet; just work between
  local directories. There are no conceptual problems with accessing
  remote directories.

Compared to others
------------------

* History cannot be rewritten, aside from a couple of special
  pragmatic cases.

* Allows cherry-picking, which is not possible in bk or most others.

* Allows merges within an arbitrary graph (rather than a line, star or
  tree), which can be done by bk but not by arch or others.

* History-sensitive merges allow safe repeated merges and mutual
  merges between parallel lines.

* Patches are labelled with the history of branches they traversed to
  their current location, which was previously unique to Arch.

* Would aim to be almost as small and simple as Quilt.

* Does not need archives to be registered.

* Like darcs and bk, remembers the last archive you pulled from and
  uses this as the default. Also, as a bonus, remembers all branches
  you previously pulled and their names, so that it is as if they were
  registered.

* Because patches do not change when they move around (as they do in
  Darcs), they can be cryptographically signed.

* Recognizes that textually non-conflicting merges may not be a
  correct merge and may not work, and so should not be auto-committed.
  The developer must have a chance to intervene after the merge and
  before a commit. (I think Monotone is wrong on this.)

Best practices
--------------

We recommend that people using Bazaar-NG follow these practices and
protocols:

* Develop independent features in separate branches. It's easier to
  keep them separate and merge later than to mix things together and
  then try to separate them. Although cherry picking is possible,
  it's generally harder than keeping the code separate in the first
  place.

* Although you can merge in a graph, it can be easier to understand
  things if you keep them roughly sorted into a star of downstream and
  upstream branches.

* Merge off your laptop/workstation into a personal stable tree at
  regular intervals; this protects against accidentally losing your
  development branch for any reason.

* Try to have relatively "pure" merges: a single changeset that merges
  changes should make only those merges and any edits needed to fix
  them up.

* You can use reStructuredText (like this document) for commit
  messages to allow nicer formatting and automatic detection of URLs,
  email addresses, lists, etc. Nothing requires this.

Mechanics
---------

Patch format
''''''''''''

A patch (i.e. commit to a branch) exists at three levels:

* the hash of the patch, which is used as its globally-unique name

* the headers of the patch, including:

  - the human-readable name of the branch to which the changeset was committed

  - free-form comments about the changeset

  - the email address and name of the user who committed the changeset

  - the date when the changeset was committed to the branch

  - the UUIDs of any patches merged by this change

  - the hash of the before and after trees

  - the IDs of any files affected by the change, and their names
    before and after the change, and their hash before and after the
    change

* the actual text of the patch, which may include

  - unidiff hunks

  - xdeltas (in reversible pairs?)

  - complete files for adds/deletes, or for binaries
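
The three levels listed above could be modelled roughly as follows. This
is only a minimal Python sketch: the class and field names are
hypothetical, and hashing ``repr(header)`` plus the text is an
assumption made for illustration, not a decided format.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileChange:
    """One affected file: stable ID, plus name and hash before and after."""
    file_id: str
    old_name: Optional[str]   # None when the file is being added
    new_name: Optional[str]   # None when the file is being deleted
    old_sha1: Optional[str]
    new_sha1: Optional[str]


@dataclass
class PatchHeader:
    """The small, structured part of a patch (hypothetical field names)."""
    branch_name: str
    comment: str
    committer: str            # "Name <email>"
    date: str                 # ISO 8601 text
    tree_sha1_before: str
    tree_sha1_after: str
    merged_patch_ids: List[str] = field(default_factory=list)
    files: List[FileChange] = field(default_factory=list)


@dataclass
class Patch:
    header: PatchHeader
    text: bytes               # unidiff hunks, xdeltas or whole files

    def patch_id(self) -> str:
        """Globally unique name: SHA-1 over the header and the text.

        Hashing repr(header) is an assumption of this sketch; a real
        format would hash a canonical serialization instead.
        """
        digest = hashlib.sha1()
        digest.update(repr(self.header).encode("utf-8"))
        digest.update(self.text)
        return digest.hexdigest()


header = PatchHeader(
    branch_name="samba-main",
    comment="Fix typo in README",
    committer="Jane Doe <jane@example.com>",
    date="2006-03-06T11:20:10+00:00",
    tree_sha1_before="0" * 40,
    tree_sha1_after="1" * 40,
    files=[FileChange("file-1", "README", "README", "a" * 40, "b" * 40)],
)
patch = Patch(header, b"--- README\n+++ README\n")
print(patch.patch_id())
```

Keeping the header as its own object mirrors the idea that headers and
text can be stored and fetched separately.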

At the simplest level a branch knows just the IDs of all of the
patches committed to it. More usually it will also have all their
logs or all their text.

Using the IDs, it can retrieve the patches when necessary from a
shared or external store. By this means we can have many checkouts,
each of which looks like it holds all of its history, without actually
using a lot of space. When pulling down a remote branch, by default
everything will be mirrored, but there might be an option to only get
the inventory or only the logs.

Keeping the relatively small header separate from the text makes it
easy to get only the header information from a remote machine. When
offline, one might also like to see only the logs without necessarily
having the text.

Only the basic policy (keep everything everywhere) needs to be done in
the first release, of course.

The headers need to be stored in some format that allows moderately
structured data. Ideally it would be both human readable and
accessible from various languages. In the prototype I think I'll use
Python data format, but that's probably not good in the long term. It
may be better to use XML (tasteless though that is) or perhaps YAML or
RFC-2822 style. Python data is probably not secure in the face of
untrusted patches.

The date should probably be shown in ISO form (suboptimal though that
is in some ways).
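
To make the RFC-2822 option concrete, here is a small sketch using
Python's standard ``email`` package. The header names (``Branch``,
``Committer``, ``Merged``) and values are invented for the example.
Parsing this text never evaluates it, which is the property that plain
Python data lacks when patches are untrusted.

```python
from datetime import datetime, timezone
from email.message import Message
from email.parser import Parser


def format_header(fields):
    """Serialize (name, value) pairs as RFC-2822 style header lines."""
    msg = Message()
    for name, value in fields:
        msg[name] = value     # repeated names are kept as separate lines
    return msg.as_string()


def parse_header(text):
    """Parse header text back into (name, value) pairs without evaluating it."""
    return Parser().parsestr(text, headersonly=True).items()


committed = datetime(2006, 3, 6, 11, 20, 10, tzinfo=timezone.utc)
fields = [
    ("Branch", "samba-main"),                      # hypothetical header names
    ("Committer", "Jane Doe <jane@example.com>"),
    ("Date", committed.isoformat()),               # ISO form of the date
    ("Merged", "patch-id-1"),
    ("Merged", "patch-id-2"),
]
text = format_header(fields)
print(text)
```

Very long values would be folded across lines by the serializer, so a
real format would need a rule for that; the sketch keeps values short.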

Unresolved questions and other ideas
------------------------------------

Pulling in inexact matches
''''''''''''''''''''''''''

If ``update`` pulls in patches noninteractively onto the history, then
there are some issues with patches that do not exactly match. Some
consequences:

* You may pull in a patch which causes your tree to semantically
  break. This might be avoided by having a test case which is checked
  before committing.

* The patch may fuzzily apply; this is OK.

If we pull in a patch from elsewhere then we will have a signature on
the patch but not a signature for the whole cacherev.

Have pristines/working directory by default?
''''''''''''''''''''''''''''''''''''''''''''

It seems a bit redundant to have two copies of the current version of
each file in every repository, even in ones you'll never edit.
Some fixes are possible:

* don't create working copy files

* hard link working copies into pristine directory (can detect
  corruption by having SHA-1 sums for all pristine files)

I think it's reasonable to have

Directory name
''''''''''''''

We have a single metadata directory at the top-level of the tree: ``.bzr``.
There is no value in having it non-hidden, because it can't be seen
from subdirectories anyhow. Apparently three-letter names after a dot
are fine on Windows -- it works for ``.svn``.
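
Since the control directory lives only at the top of the tree, a command
run from a subdirectory has to walk upwards to find it. A minimal
sketch of that lookup, assuming a helper named ``find_tree_root`` (a
name invented here, not an existing function):

```python
from pathlib import Path


def find_tree_root(start, control_dir=".bzr"):
    """Walk up from `start` until a directory containing `control_dir` is found.

    Returns the tree root as a Path, or None if no enclosing tree exists.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        # is_dir() follows symlinks, so a symlinked control directory also counts
        if (candidate / control_dir).is_dir():
            return candidate
    return None


print(find_tree_root("."))
```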

File encodings
''''''''''''''

Unicode, line endings, etc. Ignore this for now?

Case-insensitive file names? Maybe follow Darcs in forbidding files
that differ only in case.
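
If we did forbid names that differ only in case, the check itself is
cheap. A sketch, with ``case_collisions`` as a hypothetical helper name
and ``str.casefold`` standing in for whatever folding rule would really
be chosen:

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that are identical once case is ignored."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Only groups with more than one member are collisions
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


print(case_collisions(["README", "readme", "src/Main.py", "src/util.py"]))
```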

Always use 3-way merge
''''''''''''''''''''''

I think using .rej files and fuzzy patches is confusing/unhelpful.

I would like to use 3-way merges between appropriate coordinates as
the fundamental mechanism for all 'merge'-type operations.

Is there any case where .rej files are more useful? Why would you
ever want that? Some people seem to `prefer them`__ in Arch.

__ http://wiki.gnuarch.org/moin.cgi/Process_20_2a_2erej_20files

I guess when cherry-picking you might not be able to find an
appropriate ancestor for diff3? I think you can; anyhow wiggle can
transform rejects into diff3-style conflicts so why not do that?

Miles says there that he prefers .rej files to conflict markers
because they give better results for complex conflicts.

Perhaps we should just always produce both and let people use whatever
they want.

Another suggestion is the rej_ tool, which helps fix up simple
rejects:

  There are four basic rejects fixable via rej.

  1) missing context at the top or bottom of the hunk
  2) different context in the middle of the hunk
  3) slightly different lines removed by the hunk than exist in the file
  4) Large hunks that might apply if they were broken up into smaller ones.

.. _rej: ftp://ftp.suse.com/pub/people/mason/rej/

Mirroring
'''''''''

One reason people say they like archives is that all new work in that
archive will be automatically mirrored off your laptop, if it's
already set up to mirror that archive.

Control files out of tree
'''''''''''''''''''''''''

Some people would like to have absolutely no control files in their
tree. This is conceptually easy as long as we can find both the
control files and working directory when a command is run.

As a first step, the ``.bzr`` directory can be replaced by a symlink,
which will prevent recursive commands looking into it. Another
approach is to put all actual source in a subdirectory of the tree, so
that you never need to see the directory unless you look above the
ceiling.

If this is not enough, we might ask them to have an environment
variable point to the control files, or have a map somewhere
associating working directories with their control files.
Unfortunately both of those seem likely to come loose and whip around
dangerously.

Representation of changesets
''''''''''''''''''''''''''''

Using patches is nice for plain text files. In general we want the
old and new names to correspond, but these are only for decoration;
the file id determines where the patch really goes.

* Should they be reversible?

* How to represent binary diffs?

* How to represent adds/removes?

* How to zip up multiple changes into a single bundle?

Reversibility is very important. We do not need it for regular
merges, since we can always recover the previous state. We do need it
for application of isolated patches, since we may not be able to
recover the prior state. It might also help when building a previous
tree state.

Of course we can have an option to show deletes or to make the diff
reversible even if it normally is not.

It is very nice that plain diffs can be concatenated into a single
text file. This is not easily possible with binary files, xdeltas,
etc. Of course it is uncommon to display binary deltas directly or
mail them, but if mailing is really required we could use base64 or
MIME.

Perhaps it would be reasonable to just store xdeltas between versions.

Perhaps each changeset body should be a tar or zip holding the
patches, though in simpler form than Arch.

(Since these are free choices, perhaps stick closely to what Arch
does?)

Continuations
'''''''''''''

Do we need the generalized continuations currently present in Arch, or
will a more restricted type do?

One use case for arch continuation tags is to make a release branch
which contains only tags from the development branch.

Maybe we want darcs-style tags which just label the tree at various
points; more familiar to users perhaps?

  bzr fork http://samba.org/bzr/samba/main ./my-samba

  1. creates directory my-samba
  2. copies contents of samba main branch
  3. the parent becomes samba-main
  4. parent is the default place you'll pull from & push to

Is there a difference between "contains stuff from samba-main" and "is
branched from samba-main"?

File split/merge
''''''''''''''''

Is there any sense in having a command to copy a file, or to rejoin
several files with different IDs?

Joining might be useful when the same tree is imported several times,
or the same new-file operation is done in different trees.

Time skew
'''''''''

Local clocks can be wrong when they record a commit. This means that
changes may be irrevocably recorded with the wrong time, and that in
turn means that later changes may seem to come from before earlier
changes. We can give a warning at the later time, but short of
refusing the commit there is not much we can do about it.
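
The warning could be as simple as comparing the new commit's timestamp
with its parent's. A sketch; the function name, the ``tolerance``
parameter and the message wording are all invented for illustration:

```python
from datetime import datetime, timedelta, timezone


def skew_warning(parent_time, commit_time, tolerance=timedelta(0)):
    """Return a warning string if the new commit appears to predate its parent."""
    if commit_time + tolerance < parent_time:
        behind = parent_time - commit_time
        return f"warning: commit time is {behind} earlier than its parent; check the clock"
    return None  # timestamps are in a plausible order


parent = datetime(2006, 3, 6, 12, 0, tzinfo=timezone.utc)
print(skew_warning(parent, parent - timedelta(hours=1)))
```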

Annotate/blame/praise
---------------------

``cvs annotate`` is pretty useful for understanding the history of
development. At the same time it is not quite trivial to implement,
so I plan to make sure all the necessary data is easily accessible and
then defer actually writing it. Possibly the most complicated part is
something to read in a diff and find which lines came from where.

What we need is a way to easily follow back through the history of a
file; this is easily done by walking back along the branch. Since we
have revision numbers within a branch we have a short label which can
be put against each line; we can also put a key at the bottom with
some fields from each revision showing the committer and comment.

For the case of merge commits people might be interested to know which
merged patch brought in a change. We cannot do this completely
accurately since we don't know what the person did during the manual
resolution of the merge, but by looking for identical lines we can
probably get very close. We can at the very least tell people the
hash of all patches that were merged in so they can go and have a look
at them.
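
A rough sketch of the line-tracking part, using Python's ``difflib``
rather than parsing stored diffs. It walks forward through a file's
revisions and labels each line with the revision that introduced it.
The input layout (a list of ``(label, lines)`` pairs, oldest first) is
an assumption made for the example.

```python
from difflib import SequenceMatcher


def annotate(revisions):
    """Label every line of the newest text with the revision that added it.

    `revisions` is a list of (label, lines) pairs, oldest first.
    Returns a list of (label, line) pairs for the newest revision.
    """
    annotated = []
    for label, lines in revisions:
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(None, old_lines, lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the label they already had
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision;
                # for pure deletions lines[j1:j2] is empty
                updated.extend((label, line) for line in lines[j1:j2])
        annotated = updated
    return annotated


history = [
    ("1", ["a", "b", "c"]),
    ("2", ["a", "x", "c"]),
    ("3", ["a", "x", "c", "d"]),
]
for label, line in annotate(history):
    print(label, line)
```

This attributes lines by textual identity only, so it has the same
limitation noted above for manually resolved merges.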

Performance
-----------

I think nothing here requires loading the whole tree into memory, as
Darcs does. We can detect renames and then diff files one by one.

Because patches cannot change or be removed once they are committed or
merged, we do not need to diff the patch-log, which is a problem in
Arch.

We do need to hold the whole list of patches in memory at various
points but that should be at most perhaps 100,000 commits.

We do need to pull down all patches since forever but that's not too
unreasonable.

Most heavy lifting can be done by GNU diff, patch and diff3, which are
hopefully fast.

Patches should be reasonably proportionate to the actual size of
changes, not to the total size of the tree -- we should only list the
hash and id for files that were touched by the change. This implies
that generating the manifest for any given revision means walking from
the start of history to that revision. Of course we can cache that
manifest without necessarily caching the whole revision.
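
The manifest walk described above might look like this. It is a sketch
only: each patch is represented as a dict from file ID to
``(name, sha1)``, with ``None`` marking a deletion, which is a layout
invented for the example.

```python
def build_manifest(patches, revno, cache=None):
    """Return {file_id: (name, sha1)} for the tree after `revno` patches.

    `patches` is ordered from the start of history. Each entry lists only
    the files it touched, mapping file ID to (name, sha1), or to None for
    a deletion.
    """
    if cache is not None and revno in cache:
        return dict(cache[revno])
    manifest = {}
    for patch in patches[:revno]:
        for file_id, entry in patch.items():
            if entry is None:
                manifest.pop(file_id, None)   # file removed by this patch
            else:
                manifest[file_id] = entry     # file added, renamed or modified
    if cache is not None:
        cache[revno] = dict(manifest)         # cache the manifest, not the tree
    return manifest


history = [
    {"id-1": ("README", "aaa"), "id-2": ("main.c", "bbb")},
    {"id-2": ("src/main.c", "ccc")},          # rename plus edit
    {"id-1": None},                           # delete README
]
cache = {}
print(build_manifest(history, 2, cache))
print(build_manifest(history, 3, cache))
```

The cost is linear in the number of patches replayed, which is why
caching the manifest for recent revisions is attractive.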

* The dominant effect on performance in many cases will be network
  round-trips; as Tom says "every one is like punching your user in
  the face."

The network protocol can/should try to avoid them.

However, here's an even lazier idea: by making it possible to use
rsync for moving trees around, we get an insanely pipelined protocol
*for free*.

It's not always suitable (as when committing to a central tree), but
it will often work. Cool!

Safely using rsync probably requires user intervention to make sure
that the tree is idle at the time the command runs; otherwise the
ordering of files arriving makes it really hard to know that we have
a consistent state. I guess we can just ignore patches that are
missing...

Hashing
-------

It might be nice to present hashes in BubbleBabble or some similar
form to make it a bit easier on humans who have to see them. This can
of course be translated to and from binary. On the other hand there
is something in favour of regular strings that can be easily verified
with other tools.

We can have a Henson Mode in which it never trusts that files with the
same hash are identical but always checks it. Of course if SHA-1 is
broken then GPG will probably be broken too...

Comparison:

binary:
  20 bytes
bubblebabble:
  ``xizif-segim-vipyz-dyzak-gatas-sifet-dynir-gegon-borad-cetit-tixux``
  (65 bytes)
base64:
  ``qvTGHdzF6KLavt4PO0gs2a6pQ00=`` (28 bytes)
hex:
  ``aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d`` (40 bytes)

Hex is probably the most reasonable tradeoff.
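
The sizes in the comparison can be reproduced with the standard
library. The hex and base64 samples above are the SHA-1 of the
five-byte string ``hello``. BubbleBabble is not in the standard
library, so this sketch leaves it out; ``hash_encodings`` is just a
name chosen for the example.

```python
import base64
import hashlib


def hash_encodings(data: bytes) -> dict:
    """Return the SHA-1 of `data` in binary, base64 and hex form."""
    digest = hashlib.sha1(data).digest()
    return {
        "binary": digest,
        "base64": base64.b64encode(digest).decode("ascii"),
        "hex": digest.hex(),
    }


for name, value in hash_encodings(b"hello").items():
    print(f"{name}: {len(value)} bytes")
```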

File metadata
-------------

I don't want to get into general versioning of file metadata like
permissions, at least in the first version; it's hard to say what
should be propagated and what should not be. This is a source code
control system.

It may be useful to carry some very restricted bits, like *read only*
or *executable*; I think these are harmless.

The only case where people generally want to remember permissions and
ownership is when versioning ``/etc``, which is quite a special case.
Perhaps this should be deferred to a special script such as the
``cvs-conf`` package.

Faster comparisons
------------------

There are many cases where we need to compare trees; perhaps the most
common is just diffing the tree to see what changed. For small to
medium trees it is OK to just diff everything in the tree, and we can
do just this in the first version. This runs into trouble for
kernel-sized trees, where reading every

Fear of forking
---------------

There is some fear that distributed version control (many branches)
will encourage projects to fork. I don't think this is necessarily
true of Bazaar.

A fundamental principle of Bazaar is that it is not the tool's place to
make you run a project a particular way. The tool enables you to do
what you want. The documentation and community might suggest some
practices that have been useful for other projects, but the choice is
up to you. There are principles for running open source projects that
are useful regardless of tool, and Bazaar supports them. They include
encouraging new contributors, building community, managing a good
release schedule and so on, but I won't enumerate them all here (and I
don't claim to know them all).

Bazaar reduces some pressures that can lead to forking. There need
not be fights about who gets commit access: everyone can have a branch
and they can contribute their changes. Radical new development can
occur on one branch while stabilization occurs on another and a new
feature or port on a third. Both creating the branches and merging
between them should be easier in Bazaar than with existing
systems. (Though of course there may be technical difficulties that
no tool can totally remove.)

Sometimes there really is a time for a fork, for various reasons:
irreconcilable differences on technical direction or personality. If
that happens, Bazaar makes the break less total: the projects can
still merge patches, share bug fixes and features, and even eventually
reunite.

Why a new project?
------------------

A key goal is simplicity and user-friendliness; this is easier to
build into a new tool than to fix in an existing tool. Nevertheless
we want to provide a smooth upgrade path from Arch, CVS, and other
systems.

References
----------

* http://www.dwheeler.com/essays/scm.html

  Good analysis; should try to address everything there in a way he will like.

.. Local variables:
.. mode: indented-text
.. End:

.. Would like to use rst-mode, but it's too slow on a document of this
.. size.
