2511.1.4
by Ian Clatworthy
updated NEWS and added commit performance notes to doc/developers |
1 |
Commit Performance Notes |
2 |
======================== |
|
3 |
||
2541.2.1
by Robert Collins
Document contributing to the performance drive somewhat. |
4 |
.. contents:: :local: |
2511.1.4
by Ian Clatworthy
updated NEWS and added commit performance notes to doc/developers |
5 |
|
2513.1.5
by Martin Pool
More analysis of commit |
6 |
Changes to commit |
7 |
----------------- |
|
8 |
||
9 |
We want to improve the commit code in two phases. |
|
10 |
||
11 |
Phase one is to have a better separation from the format-specific logic, |
|
12 |
the user interface, and the general process of committing. |
|
13 |
||
14 |
Phase two is to have better interfaces by which a good workingtree format |
|
15 |
can efficiently pass data to a good storage format. If we get phase one |
|
16 |
right, it will be relatively easy and non-disruptive to bring this in. |
|
17 |
||
18 |
||
2511.1.4
by Ian Clatworthy
updated NEWS and added commit performance notes to doc/developers |
19 |
Commit: The Minimum Work Required |
20 |
--------------------------------- |
|
21 |
||
22 |
Here is a description of the minimum work that commit must do. We |
|
23 |
want to make sure that our design doesn't cost too much more than this |
|
24 |
minimum. I am trying to do this without making too many assumptions |
|
25 |
about the underlying storage, but am assuming that the ui and basic |
|
26 |
architecture (wt, branch, repo) stays about the same. |
|
27 |
||
28 |
The basic purpose of commit is to: |
|
29 |
||
30 |
1. create and store a new revision based on the contents of the working tree |
|
31 |
2. make this the new basis revision for the working tree |
|
32 |
||
33 |
We can do a selected commit of only some files or subtrees. |
|
34 |
||
35 |
The best performance we could hope for is: |
|
36 |
- stat each versioned selected working file once |
|
37 |
- read from the workingtree and write into the repository any new file texts |
|
38 |
- in general, do work proportional to the size of the shape (eg |
|
39 |
inventory) of the old and new selected trees, and to the total size of |
|
40 |
the modified files |
|
41 |
||
42 |
In more detail: |
|
43 |
||
44 |
1.0 - Store new file texts: if a versioned file contains a new text |
|
45 |
there is no avoiding storing it. To determine which ones have changed |
|
46 |
we must go over the workingtree and at least stat each file. If the |
|
47 |
file is modified since it was last hashed, it must be read in. |
|
48 |
Ideally we would read it only once, and either notice that it has not |
|
49 |
changed, or store it at that point. |
|
50 |
||
51 |
On the other hand we want new code to be able to handle files that are |
|
52 |
larger than will fit in memory. We may then need to read each file up |
|
53 |
to two times: once to determine if there is a new text and calculate |
|
54 |
its hash, and again to store it. |
|
55 |
||
56 |
1.1 - Store a tree-shape description (ie inventory or similar.) This |
|
57 |
describes the non-file objects, and provides a reference from the |
|
58 |
Revision to the texts within it. |
|
59 |
||
60 |
1.2 - Generate and store a new revision object. |
|
61 |
||
62 |
1.3 - Do delta-compression on the stored objects. (git notably does |
|
63 |
not do this at commit time, deferring this entirely until later.) |
|
64 |
This requires finding the appropriate basis for each modified file: in |
|
65 |
the current scheme we get the file id, last-revision from the |
|
66 |
dirstate, look into the knit for that text, extract that text in |
|
67 |
total, generate a delta, then store that into the knit. Most delta |
|
2513.1.2
by Martin Pool
Remove duplicated commit use case documentation |
68 |
operations are O(n**2) to O(n**3) in the size of the modified files. |
2511.1.4
by Ian Clatworthy
updated NEWS and added commit performance notes to doc/developers |
69 |
|
70 |
1.4 - Cache annotation information for the changes: at the moment this |
|
71 |
is done as part of the delta storage. There are some flaws in that |
|
72 |
approach, such as that it is not updated when ghosts are filled, and |
|
73 |
the annotation can't be re-run with new diff parameters. |
|
74 |
||
75 |
2.1 - Make the new revision the basis for the tree, and clear the list |
|
76 |
of parents. Strictly this is all that's logically necessary, unless |
|
77 |
the working tree format requires more work. |
|
78 |
||
79 |
The dirstate format does require more work, because it caches the |
|
80 |
parent tree data for each file within the working tree data. In |
|
81 |
practice this means that every commit rewrites the entire dirstate |
|
82 |
file - we could try to avoid rewriting the whole file but this may be |
|
83 |
difficult because variable-length data (the last-changed revision id) |
|
84 |
is inserted into many rows. |
|
85 |
||
86 |
The current dirstate design then seems to mean that any commit of a |
|
87 |
single file imposes a cost proportional to the size of the current |
|
88 |
workingtree. Maybe there are other benefits that outweigh this. |
|
89 |
Alternatively if it was fast enough for operations to always look at |
|
90 |
the original storage of the parent trees we could do without the |
|
91 |
cache. |
|
92 |
||
93 |
2.2 - Record the observed file hashes into the workingtree control |
|
94 |
files. For the files that we just committed, we have the information |
|
95 |
to store a valid hash cache entry: we know their stat information and |
|
96 |
the sha1 of the file contents. This is not strictly necessary to the |
|
97 |
speed of commit, but it will be useful later in avoiding reading those |
|
98 |
files, and the only cost of doing it now is writing it out. |
|
99 |
||
100 |
In fact there are some user interface niceties that complicate this: |
|
101 |
||
102 |
3 - Before starting the commit proper, we prompt for a commit message |
|
103 |
and in that commit message editor we show a list of the files that |
|
104 |
will be committed: basically the output of bzr status. This is |
|
105 |
basically the same as the list of changes we detect while storing the |
|
106 |
commit, but because the user will sometimes change the tree after |
|
107 |
opening the commit editor and expect the final state to be committed I |
|
108 |
think we do have to look for changes twice. Since it takes the user a |
|
109 |
while to enter a message this is not a big problem as long as both the |
|
110 |
status summary and the commit are individually fast. |
|
111 |
||
112 |
4 - As the commit proceeds (or after?) we show another status-like |
|
113 |
summary. Just printing the names of modified files as they're stored |
|
114 |
would be easy. Recording deleted and renamed files or directories is |
|
115 |
more work: this can only be done by reference to the primary parent |
|
116 |
tree and requires it be read in. Worse, reporting renames requires |
|
117 |
searching by id across the entire parent tree. Possibly full |
|
118 |
reporting should be a default-off verbose option because it does |
|
119 |
require more work beyond the commit itself. |
|
120 |
||
121 |
5 - Bazaar currently allows for missing files to be automatically |
|
122 |
marked as removed at the time of commit. Leaving aside the ui |
|
123 |
consequences, this means that we have to update the working inventory |
|
124 |
to mark these files as removed. Since as discussed above we always |
|
125 |
have to rewrite the dirstate on commit this is not substantial, though |
|
126 |
we should make sure we do this in one pass, not two. I have |
|
127 |
previously proposed to make this behaviour a non-default option. |
|
128 |
||
129 |
We may need to run hooks or generate signatures during commit, but |
|
130 |
they don't seem to have substantial performance consequences. |
|
131 |
||
132 |
If one wanted to optimize solely for the speed of commit I think |
|
133 |
hash-addressed file-per-text storage like in git (or bzr 0.1) is very |
|
134 |
good. Remarkably, it does not need to read the inventory for the |
|
135 |
previous revision. For each versioned file, we just need to get its |
|
136 |
hash, either by reading the file or validating its stat data. If that |
|
137 |
hash is not already in the repository, the file is just copied in and |
|
138 |
compressed. As directories are traversed, they're turned into texts |
|
139 |
and stored as well, and then finally the revision is too. This does |
|
140 |
depend on later doing some delta compression of these texts. |
|
141 |
||
142 |
Variations on this are possible. Rather than writing a single file |
|
143 |
into the repository for each text, we could fold them into a single |
|
144 |
collation or pack file. That would create a smaller number of files |
|
145 |
in the repository, but looking up a single text would require looking |
|
146 |
into their indexes rather than just asking the filesystem. |
|
147 |
||
148 |
Rather than using hashes we can use file-id/rev-id pairs as at |
|
149 |
present, which has several consequences pro and con. |
|
150 |
||
151 |
||
152 |
Commit vs Status |
|
153 |
---------------- |
|
154 |
||
155 |
At first glance, commit simply stores the changes status reports. In fact, |
|
156 |
this isn't technically correct: commit considers some files modified that |
|
157 |
status does not. The notes below were put together by John Arbash Meinel |
|
158 |
and Aaron Bentley in May 2007 to explain the finer details of commit to |
|
159 |
Ian Clatworthy. They are recorded here as they are likely to be useful to |
|
160 |
others new to Bazaar ... |
|
161 |
||
162 |
1) **Unknown files have a different effect.** With --no-strict (the default) |
|
163 |
they have no effect and can be completely ignored. With --strict they |
|
164 |
should cause the commit to abort (so you don't forget to add the two new |
|
165 |
test files that you just created). |
|
166 |
||
167 |
2) **Multiple parents.** 'status' always compares 2 trees, typically the |
|
168 |
last-committed tree and the current working tree. 'commit' will compare |
|
169 |
more trees if there has been a merge. |
|
170 |
||
171 |
a) The "last modified" property for files. |
|
172 |
A file may be marked as changed since the last commit, but that |
|
173 |
change may have come in from the merge, and the change could have |
|
174 |
happened several commits back. There are several edge cases to be |
|
175 |
handled here, like if both branches modified the same file, or if |
|
176 |
just one branch modified it. |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
177 |
|
2511.1.4
by Ian Clatworthy
updated NEWS and added commit performance notes to doc/developers |
178 |
b) The trickier case is when a file appears unmodified since last |
179 |
commit, but it was modified versus one of the merged branches. I |
|
180 |
believe there are a few ways this can happen, like if a merged |
|
181 |
branch changes a file and then reverts it back (you still update |
|
182 |
the 'last modified' field). |
|
183 |
In general, if both sides disagree on the 'last-modified' flag, |
|
184 |
then you need to generate a new entry pointing 'last-modified' at |
|
185 |
this revision (because you are resolving the differences between |
|
186 |
the 2 parents). |
|
187 |
||
188 |
3) **Automatic deletion of 'missing' files.** This is a point that we go |
|
189 |
back and forth on. I think the basic idea is that 'bzr commit' by |
|
190 |
default should abort if it finds a 'missing' file (in case that file was |
|
191 |
renamed rather than deleted), but 'bzr commit --auto' can add unknown |
|
192 |
files and remove missing files automatically. |
|
193 |
||
194 |
4) **sha1 for newly added files.** status doesn't really need this: it should |
|
195 |
only care that the file is not present in base, but is present now. In |
|
196 |
some ways commit doesn't care either, since it needs to read and sha the |
|
197 |
file itself anyway. |
|
198 |
||
199 |
5) **Nested trees.** status doesn't recurse into nested trees, but commit does. |
|
200 |
This is just because not all of the nested-trees work has been merged yet. |
|
201 |
||
202 |
A tree-reference is considered modified if the subtree has been |
|
203 |
committed since the last containing-tree commit. But commit needs to |
|
204 |
recurse into every subtree, to ensure that a commit is done if the |
|
205 |
subtree has changed since its last commit. _iter_changes only reports |
|
206 |
on tree-references that are modified, so it can't be used for doing |
|
207 |
subtree commits. |
|
208 |
||
209 |
||
210 |
Avoiding Work: Smarter Change Detection |
|
211 |
--------------------------------------- |
|
212 |
||
213 |
Commit currently walks through every file building an inventory. Here is |
|
214 |
Aaron's brain dump on a better way ... |
|
215 |
||
216 |
_iter_changes won't tell us about tree references that haven't changed, |
|
217 |
even if those subtrees have changed. (Unless we ask for unchanged |
|
218 |
files, which we don't want to do, of course.) |
|
219 |
||
220 |
There is an iter_references method, but using it looks just as expensive |
|
221 |
as calling kind(). |
|
222 |
||
223 |
I did some work on updating commit to use iter_changes, but found for |
|
224 |
multi-parent trees, I had to fall back to the slow inventory comparison |
|
225 |
approach. |
|
226 |
||
227 |
Really, I think we need a call akin to iter_changes that handles |
|
228 |
multiple parents, and knows to emit entries when InventoryEntry.revision |
|
229 |
is all that's changed. |
|
230 |
||
231 |
||
232 |
Avoiding Work: Better Layering |
|
233 |
------------------------------ |
|
234 |
||
235 |
For each file, commit is currently doing more work than it should. Here is |
|
236 |
John's take on a better way ... |
|
237 |
||
238 |
Note that "_iter_changes" *does* have to touch every path on disk, but |
|
239 |
it just can do it in a more efficient manner. (It doesn't have to create |
|
240 |
an InventoryEntry for all the ones that haven't changed). |
|
241 |
||
242 |
I agree with Aaron that we need something a little different than |
|
243 |
_iter_changes. Both because of handling multiple parents, as well as we |
|
244 |
don't want it to actually read the files if we have a stat-cache miss. |
|
245 |
||
246 |
Specifically, the commit code *has* to read the files because it is |
|
247 |
going to add the text to the repository, and we want it to compute the |
|
248 |
sha1 at *that* time, so we are guaranteed to have the valid sha (rather |
|
249 |
than just whatever the last cached one was). So we want the code to |
|
250 |
return 'None' if it doesn't have an up-to-date sha1, rather than reading |
|
251 |
the file and computing it, just before it returns it to the parent. |
|
252 |
||
253 |
The commit code (0.16) should really be restructured. It's layering is |
|
254 |
pretty wrong. |
|
255 |
||
256 |
Specifically, calling "kind()" requires a stat of the file. But we have |
|
257 |
to do a stat to get the size/whether the record is up-to-date, etc. So |
|
258 |
we really need to have a "create_an_up_to_date_inventory()" function. |
|
259 |
But because we are accessing every object on disk, we want to be working |
|
260 |
in tuples rather than Inventory objects. And because DirState already |
|
261 |
has the parent records next to the current working inventory, it can do |
|
262 |
all the work to do really fast comparison and throw-away of unimportant |
|
263 |
records. |
|
264 |
||
265 |
The way I made "bzr status" fast is by moving the 'ignore this record' |
|
266 |
ability as deep into the stack as I could get. Status has the property |
|
267 |
that you don't care about most of the records, just like commit. So the |
|
268 |
sooner you can stop evaluating the 99% that you don't care about, the |
|
269 |
less work you do. |
|
270 |
||
2513.1.2
by Martin Pool
Remove duplicated commit use case documentation |
271 |
|
2513.1.3
by Martin Pool
More commit specs |
272 |
Avoiding work: avoiding reading parent data |
273 |
------------------------------------------- |
|
274 |
||
275 |
We would like to avoid the work of reading any data about the parent |
|
276 |
revisions. We should at least try to avoid reading anything from the |
|
277 |
repository; we can also consider whether it is possible or useful to hold |
|
278 |
less parent information in the working tree. |
|
279 |
||
280 |
When a commit of selected files is requested, the committed snapshot is a |
|
281 |
composite of some directories from the parent revision and some from the |
|
282 |
working tree. In this case it is logically necessary to have the parent |
|
283 |
inventory information. |
|
284 |
||
285 |
If file last-change information or per-file graph information is stored |
|
286 |
then it must be available from the parent trees. |
|
287 |
||
288 |
If the Branch's storage method does delta compression at commit time it |
|
289 |
may need to retrieve file or inventory texts from the repository. |
|
290 |
||
291 |
It is desirable to avoid roundtrips to the Repository during commit, |
|
292 |
particularly because it may be remote. If the WorkingTree can determine |
|
293 |
by itself that a text was in the parent and therefore should be in the |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
294 |
Repository that avoids one roundtrip per file. |
2513.1.3
by Martin Pool
More commit specs |
295 |
|
296 |
There is a possibility here that the parent revision is not stored, or not |
|
297 |
correctly stored, in the repository the tree is being committed into, and |
|
298 |
so the committed tree would not be reconstructable. We could check that |
|
299 |
the parent revision is present in the inventory and rely on the invariant |
|
300 |
that if a revision is present, everything to reconstruct it will be |
|
301 |
present too. |
|
302 |
||
2513.1.5
by Martin Pool
More analysis of commit |
303 |
|
304 |
Code structure |
|
305 |
-------------- |
|
306 |
||
307 |
Caller starts a commit |
|
308 |
||
309 |
>>> Branch.commit(from_tree, options) |
|
310 |
||
311 |
This creates a CommitBuilder object matched to the Branch, Repository and |
|
312 |
Tree. It can vary depending on model differences or by knowledge of what |
|
313 |
is efficient with the Repository and Tree. Model differences might |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
314 |
include whether no-text-change merges need to be reported, and whether the |
2513.1.5
by Martin Pool
More analysis of commit |
315 |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
316 |
The basic CommitBuilder.commit structure can be |
2513.1.5
by Martin Pool
More analysis of commit |
317 |
|
318 |
1. Ask the branch if it is ready to commit (up to date with master if |
|
319 |
any.) |
|
320 |
||
321 |
2. Ask the tree if it is ready to commit to the branch (up to date with |
|
322 |
branch?), no conflicts, etc |
|
323 |
||
324 |
3. Commit changed files; prototype implementation: |
|
325 |
||
326 |
a. Ask the working tree for all committable files; for each it should |
|
327 |
return the per-file parents, stat information, kind, etc. |
|
328 |
||
329 |
b. Ask the repository to store the new file text; the repository should |
|
330 |
return the stored sha1 and new revision id. |
|
331 |
||
332 |
4. Commit changed inventory |
|
333 |
||
334 |
5. Commit revision object |
|
335 |
||
336 |
||
337 |
||
338 |
||
339 |
||
340 |
||
341 |
||
342 |
||
343 |
||
2513.1.3
by Martin Pool
More commit specs |
344 |
Complications of commit |
345 |
----------------------- |
|
346 |
||
347 |
Bazaar (as of 0.17) does not support selective-file commit of a merge; |
|
348 |
this could be done if we decide how it should be recorded - is this to be |
|
349 |
stored as an overall merge revision; as a preliminary non-merge revisions; |
|
350 |
or will the per-file graph diverge from the revision graph. |
|
351 |
||
352 |
There are several checks that may cause the commit to be refused, which |
|
353 |
may be activated or deactivated by options. |
|
354 |
||
2513.1.4
by Martin Pool
More commit documentation |
355 |
* presence of conflicts in the tree |
356 |
||
357 |
* presence of unknown files |
|
358 |
||
359 |
* the working tree basis is up to date with the branch tip |
|
360 |
||
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
361 |
* the local branch is up to date with the master branch, if there |
2513.1.4
by Martin Pool
More commit documentation |
362 |
is one and --local is not specified |
363 |
||
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
364 |
* an empty commit message is given, |
2513.1.4
by Martin Pool
More commit documentation |
365 |
|
366 |
* a hook flags an error |
|
367 |
||
368 |
* a "pointless" commit, with no inventory changes |
|
2513.1.3
by Martin Pool
More commit specs |
369 |
|
370 |
Most of these require walking the tree and can be easily done while |
|
371 |
recording the tree shape. This does require that it be possible to abort |
|
372 |
the commit after the tree changes have been recorded. It could be ok to |
|
373 |
either leave the unreachable partly-committed records in the repository, |
|
374 |
or to roll back. |
|
375 |
||
376 |
Other complications: |
|
377 |
||
2513.1.4
by Martin Pool
More commit documentation |
378 |
* when automatically adding new files or deleting missing files during |
379 |
commit, they must be noted during commit and written into the working |
|
380 |
tree at some point |
|
381 |
||
382 |
* refuse "pointless" commits with no file changes - should be easy by |
|
383 |
just refusing to do the final step of storing a new overall inventory |
|
384 |
and revision object |
|
385 |
||
386 |
* heuristic detection of renames between add and delete (out of scope for |
|
387 |
this change) |
|
388 |
||
389 |
* pushing changes to a master branch if any |
|
390 |
||
391 |
* running hooks, pre and post commit |
|
392 |
||
393 |
* prompting for a commit message if necessary, including a list of the |
|
394 |
changes that have already been observed |
|
395 |
||
396 |
* if there are tree references and recursing into them is enabled, then |
|
397 |
do so |
|
2513.1.3
by Martin Pool
More commit specs |
398 |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
399 |
Commit needs to protect against duplicated file ids |
2513.1.5
by Martin Pool
More analysis of commit |
400 |
|
401 |
||
2513.1.3
by Martin Pool
More commit specs |
402 |
Updates that need to be made in the working tree, either on conclusion |
403 |
of commit or during the scan, include |
|
404 |
||
2513.1.4
by Martin Pool
More commit documentation |
405 |
* Changes made to the tree shape, including automatic adds, renames or |
406 |
deletes |
|
407 |
||
408 |
* For trees (eg dirstate) that cache parent inventories, the old parent |
|
409 |
information must be removed and the new one inserted |
|
410 |
||
411 |
* The tree hashcache information should be updated to reflect the stat |
|
412 |
value at which the file was the same as the committed version, and the |
|
413 |
content hash it was observed to have. This needs to be done carefully to |
|
414 |
prevent inconsistencies if the file is modified during or shortly after |
|
415 |
the commit. Perhaps it would work to read the mtime of the file before we |
|
416 |
read its text to commit. |
|
2513.1.3
by Martin Pool
More commit specs |
417 |
|
418 |
||
2513.1.2
by Martin Pool
Remove duplicated commit use case documentation |
419 |
Interface stack |
420 |
--------------- |
|
421 |
||
422 |
The commit api is invoked by the command interface, and copies information |
|
423 |
from the tree into the branch and its repository, possibly updating the |
|
424 |
WorkingTree afterwards. |
|
425 |
||
426 |
The command interface passes: |
|
2513.1.4
by Martin Pool
More commit documentation |
427 |
|
428 |
* a commit message (from an option, if any), |
|
429 |
* or an indication that it should be read interactively from the ui object; |
|
430 |
* a list of files to commit |
|
431 |
* an option for a dry-run commit |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
432 |
* verbose option, or callback to indicate |
2513.1.4
by Martin Pool
More commit documentation |
433 |
* timestamp, timezone, committer, chosen revision id |
434 |
* config (for what?) |
|
435 |
* option for local-only commit on a bound branch |
|
436 |
* option for strict commits (fail if there are unknown or missing files) |
|
437 |
* option to allow "pointless" commits (with no tree changes) |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
438 |
|
2513.1.3
by Martin Pool
More commit specs |
439 |
(This is rather a lot of options to pass individually and just for code tidyness maybe some of them should be combine into objects.) |
2513.1.2
by Martin Pool
Remove duplicated commit use case documentation |
440 |
|
2513.1.3
by Martin Pool
More commit specs |
441 |
>>> Branch.commit(from_tree, message, files_to_commit, ...) |
2513.1.2
by Martin Pool
Remove duplicated commit use case documentation |
442 |
|
443 |
There will be different implementations of this for different Branch |
|
444 |
classes, whether for foreign branches or Bazaar repositories using |
|
445 |
different storage methods. |
|
446 |
||
447 |
Most of the commit should occur during a single lockstep iteration across |
|
448 |
the workingtree and parent trees. The WorkingTree interface needs to |
|
449 |
provide methods that give commit all it needs. Some of these methods |
|
450 |
(such as answering the file's last change revision) may be deprecated in |
|
451 |
newer working trees and there we have a choice of either calculating the |
|
452 |
value from the data that is present, or refusing to support commit to |
|
453 |
newer repositories. |
|
454 |
||
455 |
For a dirstate tree the iteration of changes from the parent can easily be |
|
456 |
done within its own iter_changes. |
|
457 |
||
458 |
Dirstate inventories may be most easily updated in a single operation at |
|
459 |
the end; however it may be best to accumulate data as we proceed through |
|
460 |
the tree rather than revisiting it at the end. |
|
461 |
||
462 |
Showing a progress bar for commit may not be necessary if we report files |
|
463 |
as they are committed. Alternatively we could transiently show a progress |
|
464 |
bar for each directory that's scanned, even if no changes are observed. |
|
465 |
||
466 |
This needs to collect a list of added/changed/removed files, each of which |
|
467 |
must have its text stored (if any) and containing directory updated. This |
|
468 |
can be done by calling Tree._iter_changes on the source tree, asking for |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
469 |
changes |
2513.1.2
by Martin Pool
Remove duplicated commit use case documentation |
470 |
|
471 |
In the 0.17 model the commit operation needs to know the per-file parents |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
472 |
and per-file last-changed revision. |
2513.1.2
by Martin Pool
Remove duplicated commit use case documentation |
473 |
|
474 |
(In this and other operations we must avoid having multiple layers walk |
|
475 |
over the tree separately. For example, it is no good to have the Command |
|
476 |
layer walk the tree to generate a list of all file ids to commit, because |
|
477 |
the tree will also be walked later. The layers that do need to operate |
|
478 |
per-file should probably be bound together in a per-dirblock iterator, |
|
479 |
rather than each iterating independently.) |
|
2513.1.3
by Martin Pool
More commit specs |
480 |
|
481 |
Branch->Tree interface |
|
482 |
---------------------- |
|
483 |
||
484 |
The Branch commit code needs to ask the Tree what should be committed, in |
|
485 |
terms of changes from the parent revisions. If the Tree holds all the |
|
486 |
necessary parent tree information itself it can do it single handed; |
|
487 |
otherwise it may need to ask the Repository for parent information. |
|
488 |
||
489 |
This should be a streaming interface, probably like iter_changes returning |
|
490 |
information per directory block. |
|
491 |
||
492 |
The interface should not return a block for directories that are |
|
493 |
recursively unchanged. |
|
494 |
||
2513.1.4
by Martin Pool
More commit documentation |
495 |
The tree's idea of what is possibly changed may be more conservative than |
496 |
that of the branch. For example the tree may report on merges of files |
|
497 |
where the text is identical to the parents: this must be recorded for |
|
498 |
Bazaar branches that record per-file ancestry but is not necessary for all |
|
499 |
branches. If the tree is responsible for determining when directories |
|
500 |
have been recursively modified then it will report on all the parents of |
|
501 |
such files. There are several implementation options: |
|
502 |
||
503 |
1. Return all files and directories the branch might want to commit, even |
|
504 |
if the branch ends up taking no action on them. |
|
505 |
||
506 |
2. When starting the iteration, the branch can specify what type of change |
|
507 |
is considered interesting. |
|
508 |
||
509 |
Since these types of changes are probably (??) rare compared to files that |
|
510 |
are either completely unmodified or substantially modified, the first may |
|
511 |
be the best and simplest option. |
|
512 |
||
513 |
The branch needs to build an inventory to commit, which must include |
|
514 |
unchanged files within changed directories. This should be returned from |
|
515 |
the working tree too. Repositories that store per-directory inventories |
|
516 |
will want to build and store these from the lowest directories up. |
|
517 |
For 0.17 format repositories with an all-in-one inventory it may be |
|
518 |
easiest to accumulate inventory entries in arbitrary order into an |
|
519 |
in-memory Inventory and then serialize it. |
|
520 |
||
521 |
It ought to be possible to commit any Tree into a Branch, without |
|
522 |
requiring a WorkingTree; the commit code should cope if the tree is not |
|
523 |
interested in updating hashcache information or does not have a |
|
524 |
``last_revision``. |
|
525 |
||
526 |
||
527 |
Information from the tree to repository |
|
528 |
--------------------------------------- |
|
529 |
||
530 |
The main things the tree needs to tell the Branch about are: |
|
531 |
||
532 |
* A file is modified from its parent revision (in text, permissions, |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
533 |
other), and so its text may need to be stored. |
534 |
||
2513.1.4
by Martin Pool
More commit documentation |
535 |
Files should also be reported if they have more than one unique parent |
536 |
revision, for repositories that store per-file graphs or last-change |
|
537 |
revisions. Perhaps this behaviour should be optional. |
|
538 |
||
2513.1.5
by Martin Pool
More analysis of commit |
539 |
**XXX:** are renames/deletions reported here too? |
540 |
||
2513.1.4
by Martin Pool
More commit documentation |
541 |
* The complete contents of a modified directory, so that its inventory |
542 |
text may be stored. This should be done after all the contained files |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
543 |
and directories have been reported. If there are unmodified files, |
544 |
or unselected files carried through from |
|
2513.1.4
by Martin Pool
More commit documentation |
545 |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
546 |
XXX: Actually perhaps not grouped by directory, but rather grouped |
547 |
appropriately for the shape of inventory storage in the repository. |
|
2513.1.5
by Martin Pool
More analysis of commit |
548 |
|
549 |
In a zoomed-in checkout the workingtree may not have all the shape data |
|
550 |
for the entire tree. |
|
551 |
||
2513.1.4
by Martin Pool
More commit documentation |
552 |
* A file is missing -- could cause either automatic removal or an aborted |
553 |
commit. |
|
554 |
||
555 |
* Any unknown files -- can cause automatic addition, abortion of a strict |
|
556 |
commit, or just reporting. |
|
557 |
||
558 |
||
559 |
Information from the repository to the tree |
|
560 |
------------------------------------------- |
|
561 |
||
562 |
After the commit the tree needs to be updated to the new revision. Some |
|
563 |
information which was accumulated during the commit must be made available |
|
564 |
to the workingtree. It's probably reasonable to hold it all in memory and |
|
565 |
allow the workingtree to get it in whatever order it wants. |
|
566 |
||
567 |
* A list of modified entries, and for each one: |
|
568 |
||
569 |
* The stat values observed when the file was first read. |
|
570 |
||
571 |
* The hash of the committed file text. |
|
572 |
||
573 |
* The file's last-change revision, if appropriate. |
|
574 |
||
575 |
This should include any entries automatically added or removed. |
|
576 |
||
577 |
This might be construed as an enhanced version of ``set_parent_trees``. |
|
578 |
We can avoid a stat on each file by using the value that was observed when |
|
579 |
it was first read. |
|
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
580 |
|
2513.1.4
by Martin Pool
More commit documentation |
581 |
|
582 |
||
583 |
Selective commit |
|
584 |
---------------- |
|
585 |
||
586 |
For a partial commit the directory contents may need to contain a mix of |
|
587 |
entries from the working tree and parent trees. This code probably |
|
588 |
shouldn't live in a specific tree implementation; maybe there should be a |
|
589 |
general filter that selects paths from one tree into another? |
|
590 |
||
591 |
However, the tree walking code does probably need to know about selected |
|
592 |
paths to avoid examining unselected files or directories. |
|
593 |
||
4853.1.1
by Patrick Regan
Removed trailing whitespace from files in doc directory |
594 |
We never refuse selective file commits (except of merges). |
2513.1.5
by Martin Pool
More analysis of commit |
595 |
|
2513.1.4
by Martin Pool
More commit documentation |
596 |
|
597 |
||
598 |
Common commit code |
|
599 |
------------------ |
|
600 |
||
601 |
What is common to all commit implementations, regardless of workingtree or |
|
602 |
repository format? |
|
603 |
||
604 |
* Prompting for a commit message? |
|
605 |
* Strictness/conflict checks? |
|
606 |
* Auto add/remove? |
|
607 |
||
608 |
How should this be separated? |
|
609 |
||
610 |
||
611 |
||
612 |
Order of traversal |
|
613 |
------------------ |
|
614 |
||
615 |
For current and contemplated Bazaar storage formats, we can only finally |
|
616 |
commit a directory after its contained files and directories have been |
|
617 |
committed. |
|
618 |
||
619 |
The dirstate workingtree format naturally iterates by directory in order |
|
620 |
by path, yielding directories before their contents. This may also be the |
|
621 |
most efficient order in which to stat and read the files. |
|
622 |
||
623 |
One option would be to construe the interface as a visitor which reports |
|
624 |
when files are detected to be changed, and also when directories are |
|
625 |
finished. |
|
2513.1.3
by Martin Pool
More commit specs |
626 |
|
627 |
||
628 |
Open question: per-file graphs |
|
629 |
------------------------------ |
|
630 |
||
631 |
**XXX:** If we want to retain explicitly stored per-file graphs, it would |
|
632 |
seem that we do need to record per-file parents. We have not yet finally |
|
633 |
settled that we do want to remove them or treat them as a cache. This api |
|
634 |
stack is still ok whether we do or not, but the internals of it may |
|
635 |
change. |