260
267
sooner you can stop evaluating the 99% that you don't care about, the
261
268
less work you do.
271
Avoiding work: avoiding reading parent data
272
-------------------------------------------
274
We would like to avoid the work of reading any data about the parent
275
revisions. We should at least try to avoid reading anything from the
276
repository; we can also consider whether it is possible or useful to hold
277
less parent information in the working tree.
279
When a commit of selected files is requested, the committed snapshot is a
280
composite of some directories from the parent revision and some from the
281
working tree. In this case it is logically necessary to have the parent
282
inventory information.
284
If file last-change information or per-file graph information is stored
285
then it must be available from the parent trees.
287
If the Branch's storage method does delta compression at commit time it
288
may need to retrieve file or inventory texts from the repository.
290
It is desirable to avoid roundtrips to the Repository during commit,
291
particularly because it may be remote. If the WorkingTree can determine
292
by itself that a text was in the parent and therefore should be in the
293
Repository that avoids one roundtrip per file.
295
There is a possibility here that the parent revision is not stored, or not
296
correctly stored, in the repository the tree is being committed into, and
297
so the committed tree would not be reconstructable. We could check that
298
the parent revision is present in the inventory and rely on the invariant
299
that if a revision is present, everything to reconstruct it will be
306
Caller starts a commit
308
>>> Branch.commit(from_tree, options)
310
This creates a CommitBuilder object matched to the Branch, Repository and
311
Tree. It can vary depending on model differences or by knowledge of what
312
is efficient with the Repository and Tree. Model differences might
313
include whether no-text-change merges need to be reported, and whether the
315
The basic CommitBuilder.commit structure can be
317
1. Ask the branch if it is ready to commit (up to date with master if
320
2. Ask the tree if it is ready to commit to the branch (up to date with
321
branch?), no conflicts, etc
323
3. Commit changed files; prototype implementation:
325
a. Ask the working tree for all committable files; for each it should
326
return the per-file parents, stat information, kind, etc.
328
b. Ask the repository to store the new file text; the repository should
329
return the stored sha1 and new revision id.
331
4. Commit changed inventory
333
5. Commit revision object
343
Complications of commit
344
-----------------------
346
Bazaar (as of 0.17) does not support selective-file commit of a merge;
347
this could be done if we decide how it should be recorded - is this to be
348
stored as an overall merge revision; as a preliminary non-merge revisions;
349
or will the per-file graph diverge from the revision graph.
351
There are several checks that may cause the commit to be refused, which
352
may be activated or deactivated by options.
354
* presence of conflicts in the tree
356
* presence of unknown files
358
* the working tree basis is up to date with the branch tip
360
* the local branch is up to date with the master branch, if there
361
is one and --local is not specified
363
* an empty commit message is given,
365
* a hook flags an error
367
* a "pointless" commit, with no inventory changes
369
Most of these require walking the tree and can be easily done while
370
recording the tree shape. This does require that it be possible to abort
371
the commit after the tree changes have been recorded. It could be ok to
372
either leave the unreachable partly-committed records in the repository,
377
* when automatically adding new files or deleting missing files during
378
commit, they must be noted during commit and written into the working
381
* refuse "pointless" commits with no file changes - should be easy by
382
just refusing to do the final step of storing a new overall inventory
385
* heuristic detection of renames between add and delete (out of scope for
388
* pushing changes to a master branch if any
390
* running hooks, pre and post commit
392
* prompting for a commit message if necessary, including a list of the
393
changes that have already been observed
395
* if there are tree references and recursing into them is enabled, then
398
Commit needs to protect against duplicated file ids
401
Updates that need to be made in the working tree, either on conclusion
402
of commit or during the scan, include
404
* Changes made to the tree shape, including automatic adds, renames or
407
* For trees (eg dirstate) that cache parent inventories, the old parent
408
information must be removed and the new one inserted
410
* The tree hashcache information should be updated to reflect the stat
411
value at which the file was the same as the committed version, and the
412
content hash it was observed to have. This needs to be done carefully to
413
prevent inconsistencies if the file is modified during or shortly after
414
the commit. Perhaps it would work to read the mtime of the file before we
415
read its text to commit.
421
The commit api is invoked by the command interface, and copies information
422
from the tree into the branch and its repository, possibly updating the
423
WorkingTree afterwards.
425
The command interface passes:
427
* a commit message (from an option, if any),
428
* or an indication that it should be read interactively from the ui object;
429
* a list of files to commit
430
* an option for a dry-run commit
431
* verbose option, or callback to indicate
432
* timestamp, timezone, committer, chosen revision id
434
* option for local-only commit on a bound branch
435
* option for strict commits (fail if there are unknown or missing files)
436
* option to allow "pointless" commits (with no tree changes)
438
(This is rather a lot of options to pass individually and just for code tidyness maybe some of them should be combine into objects.)
440
>>> Branch.commit(from_tree, message, files_to_commit, ...)
442
There will be different implementations of this for different Branch
443
classes, whether for foreign branches or Bazaar repositories using
444
different storage methods.
446
Most of the commit should occur during a single lockstep iteration across
447
the workingtree and parent trees. The WorkingTree interface needs to
448
provide methods that give commit all it needs. Some of these methods
449
(such as answering the file's last change revision) may be deprecated in
450
newer working trees and there we have a choice of either calculating the
451
value from the data that is present, or refusing to support commit to
454
For a dirstate tree the iteration of changes from the parent can easily be
455
done within its own iter_changes.
457
Dirstate inventories may be most easily updated in a single operation at
458
the end; however it may be best to accumulate data as we proceed through
459
the tree rather than revisiting it at the end.
461
Showing a progress bar for commit may not be necessary if we report files
462
as they are committed. Alternatively we could transiently show a progress
463
bar for each directory that's scanned, even if no changes are observed.
465
This needs to collect a list of added/changed/removed files, each of which
466
must have its text stored (if any) and containing directory updated. This
467
can be done by calling Tree._iter_changes on the source tree, asking for
470
In the 0.17 model the commit operation needs to know the per-file parents
471
and per-file last-changed revision.
473
(In this and other operations we must avoid having multiple layers walk
474
over the tree separately. For example, it is no good to have the Command
475
layer walk the tree to generate a list of all file ids to commit, because
476
the tree will also be walked later. The layers that do need to operate
477
per-file should probably be bound together in a per-dirblock iterator,
478
rather than each iterating independently.)
480
Branch->Tree interface
481
----------------------
483
The Branch commit code needs to ask the Tree what should be committed, in
484
terms of changes from the parent revisions. If the Tree holds all the
485
necessary parent tree information itself it can do it single handed;
486
otherwise it may need to ask the Repository for parent information.
488
This should be a streaming interface, probably like iter_changes returning
489
information per directory block.
491
The interface should not return a block for directories that are
492
recursively unchanged.
494
The tree's idea of what is possibly changed may be more conservative than
495
that of the branch. For example the tree may report on merges of files
496
where the text is identical to the parents: this must be recorded for
497
Bazaar branches that record per-file ancestry but is not necessary for all
498
branches. If the tree is responsible for determining when directories
499
have been recursively modified then it will report on all the parents of
500
such files. There are several implementation options:
502
1. Return all files and directories the branch might want to commit, even
503
if the branch ends up taking no action on them.
505
2. When starting the iteration, the branch can specify what type of change
506
is considered interesting.
508
Since these types of changes are probably (??) rare compared to files that
509
are either completely unmodified or substantially modified, the first may
510
be the best and simplest option.
512
The branch needs to build an inventory to commit, which must include
513
unchanged files within changed directories. This should be returned from
514
the working tree too. Repositories that store per-directory inventories
515
will want to build and store these from the lowest directories up.
516
For 0.17 format repositories with an all-in-one inventory it may be
517
easiest to accumulate inventory entries in arbitrary order into an
518
in-memory Inventory and then serialize it.
520
It ought to be possible to commit any Tree into a Branch, without
521
requiring a WorkingTree; the commit code should cope if the tree is not
522
interested in updating hashcache information or does not have a
526
Information from the tree to repository
527
---------------------------------------
529
The main things the tree needs to tell the Branch about are:
531
* A file is modified from its parent revision (in text, permissions,
532
other), and so its text may need to be stored.
534
Files should also be reported if they have more than one unique parent
535
revision, for repositories that store per-file graphs or last-change
536
revisions. Perhaps this behaviour should be optional.
538
**XXX:** are renames/deletions reported here too?
540
* The complete contents of a modified directory, so that its inventory
541
text may be stored. This should be done after all the contained files
542
and directories have been reported. If there are unmodified files,
543
or unselected files carried through from
545
XXX: Actually perhaps not grouped by directory, but rather grouped
546
appropriately for the shape of inventory storage in the repository.
548
In a zoomed-in checkout the workingtree may not have all the shape data
551
* A file is missing -- could cause either automatic removal or an aborted
554
* Any unknown files -- can cause automatic addition, abortion of a strict
555
commit, or just reporting.
558
Information from the repository to the tree
559
-------------------------------------------
561
After the commit the tree needs to be updated to the new revision. Some
562
information which was accumulated during the commit must be made available
563
to the workingtree. It's probably reasonable to hold it all in memory and
564
allow the workingtree to get it in whatever order it wants.
566
* A list of modified entries, and for each one:
568
* The stat values observed when the file was first read.
570
* The hash of the committed file text.
572
* The file's last-change revision, if appropriate.
574
This should include any entries automatically added or removed.
576
This might be construed as an enhanced version of ``set_parent_trees``.
577
We can avoid a stat on each file by using the value that was observed when
585
For a partial commit the directory contents may need to contain a mix of
586
entries from the working tree and parent trees. This code probably
587
shouldn't live in a specific tree implementation; maybe there should be a
588
general filter that selects paths from one tree into another?
590
However, the tree walking code does probably need to know about selected
591
paths to avoid examining unselected files or directories.
593
We never refuse selective file commits (except of merges).
600
What is common to all commit implementations, regardless of workingtree or
603
* Prompting for a commit message?
604
* Strictness/conflict checks?
607
How should this be separated?
614
For current and contemplated Bazaar storage formats, we can only finally
615
commit a directory after its contained files and directories have been
618
The dirstate workingtree format naturally iterates by directory in order
619
by path, yielding directories before their contents. This may also be the
620
most efficient order in which to stat and read the files.
622
One option would be to construe the interface as a visitor which reports
623
when files are detected to be changed, and also when directories are
627
Open question: per-file graphs
628
------------------------------
630
**XXX:** If we want to retain explicitly stored per-file graphs, it would
631
seem that we do need to record per-file parents. We have not yet finally
632
settled that we do want to remove them or treat them as a cache. This api
633
stack is still ok whether we do or not, but the internals of it may