.. by mbp at sourcefrog, revision 6: "import all docs from arch"

Detecting unchanged files
*************************

Many operations need to know which working files have changed compared
to the basis revision. (We also sometimes want to know which files
have changed between two revisions, but that is much easier because
we already know their text-ids and hashes.)
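
As a rough illustration of why the revision-to-revision case is easy, here is a minimal sketch. It assumes each revision's inventory can be reduced to a dict mapping a file id to the SHA-1 of its text; that dict shape and the ``changed_between`` name are hypothetical, not bzr's inventory API. No file contents are read at all.

.. code-block:: python

    from __future__ import annotations


    def changed_between(old: dict[str, str], new: dict[str, str]) -> set[str]:
        """Return ids of files added, removed or modified between two revisions.

        Hypothetical input: each dict maps a file id to the SHA-1 of its
        text, as recorded in that revision's inventory.
        """
        # Ids present in only one revision were added or removed.
        changed = set(old) ^ set(new)
        # Ids present in both changed if their recorded hashes differ.
        changed |= {fid for fid in old.keys() & new.keys() if old[fid] != new[fid]}
        return changed

For example, ``changed_between({"a": "1", "b": "2", "c": "3"}, {"a": "1", "b": "X", "d": "4"})`` gives ``{"b", "c", "d"}``.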

The simplest way is to compare the files directly. This is simple and
reliable, but has the disadvantage that we need to read in both
files. For a large tree such as the Linux kernel, or even Samba, this
can use a lot of cache memory and/or be slow. Some people have
machines that do not have enough memory to hold even one copy of the
tree at a time, and this would use two copies.
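
A minimal sketch of the direct approach follows; ``files_identical`` and the 64 KiB chunk size are illustrative choices, not taken from bzr. Reading in fixed-size chunks avoids holding either file in memory at once, but every byte of both files must still be read, which is the cost described above.

.. code-block:: python

    import os

    CHUNK_SIZE = 64 * 1024  # illustrative block size


    def files_identical(path_a: str, path_b: str) -> bool:
        """Return True if the two files have byte-for-byte identical contents."""
        # Different sizes prove the files differ without reading them.
        if os.path.getsize(path_a) != os.path.getsize(path_b):
            return False
        with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
            while True:
                chunk_a = fa.read(CHUNK_SIZE)
                chunk_b = fb.read(CHUNK_SIZE)
                if chunk_a != chunk_b:
                    return False
                if not chunk_a:
                    # Both files reached end-of-file at the same point.
                    return True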

So it is useful to know which files have not changed without actually
reading them. Possibilities:

* Make the working files hardlinks to the file store. It is then easy
  to see whether they are still the same file simply by stat'ing them.
  For extra protection, make the stored files read-only. This has the
  additional advantage of reducing the disk usage of the working copy.

  The disadvantage is that some editors do not handle this safely: an
  editor that rewrites the file in place, rather than writing a new
  file and renaming it over the old one, also changes the stored copy.
  In that case the changes will go undetected, and they could corrupt
  history. Pretty bad.

  We can provide a ``bzr edit`` command that breaks the link and makes
  the working copy writable.

* As above, but link to a temporary pristine directory, not to the
  real store. This can still give a wrong answer, but at least it
  cannot corrupt the store.

* Check the mtime and size of the file, and compare them to those
  recorded for the previous stored version.

  The recorded mtime doesn't need to be the time the previous
  revision was committed; it only needs to be the mtime the working
  file had when we last recorded it.

  There is a possible race here: if the file is modified without
  changing size within the same second as the checkout, its mtime
  (at one-second resolution) still looks unchanged. Many filesystems
  don't report sub-second modification times, but Linux now allows
  for it and it may be supported in future.

* Read in all the working files, but first compare them to the size
  and text-hash of the previous file revision; only do the diff if
  they have actually changed. This means reading only one tree, not
  two, but we still have to scan all the source.

* Copy the files, but use an explicit edit command to record which
  ones have been changed. Files that are not being edited should be
  read-only to prevent inadvertent changes.
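
The hardlink option can be sketched in Python as below. This assumes a POSIX filesystem, and ``link_checkout``, ``still_linked`` and ``break_link`` are hypothetical helpers; ``break_link`` shows only one plausible way an edit command could break the link, not how bzr does it. Two paths name the same file exactly when their ``st_dev`` and ``st_ino`` match.

.. code-block:: python

    import os
    import shutil
    import stat

    WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


    def link_checkout(stored_path: str, work_path: str) -> None:
        """Check out a file by hardlinking it to the stored copy, read-only."""
        os.link(stored_path, work_path)
        # Permissions belong to the inode, so this affects both names.
        os.chmod(stored_path, os.stat(stored_path).st_mode & ~WRITE_BITS)


    def still_linked(stored_path: str, work_path: str) -> bool:
        """True if the working path is still the same inode as the stored copy."""
        a, b = os.stat(stored_path), os.stat(work_path)
        return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


    def break_link(work_path: str) -> None:
        """Replace the working hardlink with a private, writable copy."""
        tmp_path = work_path + ".tmp"  # hypothetical temporary name
        shutil.copyfile(work_path, tmp_path)  # copies data, not permissions
        os.replace(tmp_path, work_path)  # swap in the new inode atomically
        os.chmod(work_path, os.stat(work_path).st_mode | stat.S_IWUSR)

Note that ``still_linked`` keeps returning ``True`` if an editor writes through the link in place, which is exactly the undetected-change danger described above.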

The problem with almost all of these models is that it is possible for
people to change a file without tripping the check. The only way to
make this perfectly safe is to actually compare the contents. So
perhaps there should be a paranoia option that always does.

It is crucial that no failure can lose history. Does that mean
hardlinks directly into the file store are just too risky?

It is most important that ``diff``, ``status`` and similar commands be
fast, because they are run often. ``commit`` may be able to tolerate
being somewhat slower, but it would be confusing if ``commit`` saw
something different from what ``diff`` shows, so they should use the
same check.

For the mooted `edit command`__, we will know whether a file is
checked out read-only or read-write; if a file is known to be
read-only, it can be assumed to be unmodified.

__ optional-edit.html

The ``check`` command can verify that any files that would be assumed
to be unmodified really are unmodified.
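
One way ``check`` could do this is sketched below, assuming the tree records, for each file the fast checks treat as unmodified, the SHA-1 of the text it should contain. ``verify_assumed_unmodified``, ``sha1_of_file`` and the ``recorded`` mapping are hypothetical names, not bzr's API.

.. code-block:: python

    from __future__ import annotations

    import hashlib


    def sha1_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
        """Hash a file in chunks so large files are not read into memory at once."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()


    def verify_assumed_unmodified(recorded: dict[str, str]) -> list[str]:
        """Return the paths whose contents no longer match their recorded SHA-1.

        ``recorded`` maps each path assumed unmodified to the hex SHA-1
        of the text it should contain (hypothetical format).
        """
        return sorted(path for path, expected in recorded.items()
                      if sha1_of_file(path) != expected)

Any path this returns is a case where the cheap checks would have given the wrong answer.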

File times can be skewed, particularly on network filesystems, where
the server's clock may differ from the client's. We should not assume
that mtimes match the system clock at the time the file was created
(and that would probably be racy anyhow).

Proposal
--------

::

  if file is not editable:
      unmodified

  if not paranoid:
      if same inum and device as basis:
          unmodified
      elif present as read-only:
          unmodified
      elif same mtime and size:
          unmodified

  read working file, calculate sha1
  if same size and sha-1 as previous inventory-entry:
      unmodified

  possibly-modified
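
For concreteness, here is one possible runnable reading of the proposal in Python. The data model is assumed: ``BasisEntry`` is a hypothetical stand-in for the inventory entry plus whatever stat information the tree recorded at checkout, and ``is_unmodified``, ``editable`` and ``paranoid`` are illustrative names. A ``False`` result means only "possibly modified"; the caller would still run a real diff.

.. code-block:: python

    from __future__ import annotations

    import hashlib
    import os
    import stat
    from dataclasses import dataclass


    @dataclass
    class BasisEntry:
        """Hypothetical record of what we know about the basis version of a file."""
        size: int
        sha1: str  # hex SHA-1 of the basis text
        st_dev: int | None = None  # set only if the working file was hardlinked
        st_ino: int | None = None
        mtime_ns: int | None = None  # working file's mtime when last recorded


    def is_unmodified(path: str, basis: BasisEntry, editable: bool = True,
                      paranoid: bool = False) -> bool:
        """Return True if known unmodified, False if possibly modified."""
        if not editable:
            # Under the edit-command model, files not opened for editing are trusted.
            return True

        st = os.stat(path)
        if not paranoid:
            if (st.st_dev, st.st_ino) == (basis.st_dev, basis.st_ino):
                return True  # same inode and device as the basis copy
            if not st.st_mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH):
                return True  # present as read-only
            if st.st_mtime_ns == basis.mtime_ns and st.st_size == basis.size:
                return True  # same mtime and size as last recorded

        # Read the file; skip hashing when the size already differs.
        if st.st_size == basis.size:
            h = hashlib.sha1()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(64 * 1024), b""):
                    h.update(chunk)
            if h.hexdigest() == basis.sha1:
                return True

        return False  # possibly modified

Comparing ``st_mtime_ns`` only gains precision on filesystems that actually store sub-second times; elsewhere it degrades to the one-second race discussed above.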