Interrupted operations
**********************

Problem: interrupted operations
===============================

Many version control systems tend to have trouble when operations are
interrupted. This can happen in various ways:

* user hits Ctrl-C

* program hits a bug and aborts

* machine crashes

* network goes down

* tree is naively copied (e.g. by cp/tar) while an operation is in
  progress

We can reduce the window during which operations can be interrupted:
most importantly, by receiving everything off the network into a
staging area, so that network interruptions won't leave a job half
complete. But it is not possible to totally avoid this, because the
power can always fail.
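
As a minimal sketch of that staging idea (the names here are
hypothetical, not bzr's real API, and ``fetch`` stands in for whatever
reads an object off the network), a pull could receive every object
into a temporary directory, flushing each one, and only move things
into the real store once all the network work is finished::

    import os
    import shutil
    import tempfile

    def pull_objects(fetch, names, store_dir):
        """Receive everything into a staging area first, so a dropped
        connection leaves nothing half-installed in the store itself."""
        staging = tempfile.mkdtemp(prefix='staging-', dir=store_dir)
        try:
            # Phase 1: all the network work.  Nothing outside `staging`
            # is touched, so an interruption here only leaves some
            # temporary files behind.
            for name in names:
                with open(os.path.join(staging, name), 'wb') as f:
                    f.write(fetch(name))
                    f.flush()
                    os.fsync(f.fileno())
            # Phase 2: purely local installation into the real store;
            # renaming within one filesystem is atomic on POSIX.
            for name in names:
                os.rename(os.path.join(staging, name),
                          os.path.join(store_dir, name))
        finally:
            shutil.rmtree(staging, ignore_errors=True)

If the process dies during phase 1 the store is untouched; combined
with the bottom-up ordering described below, an interruption during
phase 2 can leave orphaned objects but nothing broken.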

I think we can reasonably rely on flushing to stable storage at
various points, and trust that such files will be accessible when we
come back up.

I think by using this and building from the bottom up there are never
any broken pointers in the branch metadata: first we add the file
versions, then the inventory, then the revision and signature, then
link them into the revision history. The worst that can happen is
that there will be some orphaned files if this is interrupted at any
point.
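
A commit written in that order might look roughly like this (the
helper and directory names are made up for illustration and the
directories are assumed to exist); each layer is flushed before
anything that points to it is written, so an interruption leaves
orphans rather than dangling references::

    import os

    def sync_write(path, data):
        """Write bytes to path and flush them to stable storage."""
        with open(path, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    def commit(store, revision_id, file_texts, inventory, revision,
               old_history):
        """Bottom-up commit: file_texts is {name: bytes}, inventory and
        revision are bytes, old_history is a list of revision ids as
        bytes."""
        # 1. file versions first: nothing refers to them yet
        for name, text in file_texts.items():
            sync_write(os.path.join(store, 'texts', name), text)
        # 2. the inventory, which refers to those file versions
        sync_write(os.path.join(store, 'inventories', revision_id),
                   inventory)
        # 3. the revision (and its signature), which refers to the
        #    inventory
        sync_write(os.path.join(store, 'revisions', revision_id), revision)
        # 4. only now link it into the revision history; an interruption
        #    anywhere before this point leaves only orphaned files
        history = b'\n'.join(old_history + [revision_id.encode()])
        sync_write(os.path.join(store, 'revision-history'), history)

(Step 4 rewrites the history file in place only for brevity; a real
version would want to write it beside the old one and rename it into
place.)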

rsync is just impossible in the general case: it reads the files in a
fairly unpredictable order, so what it copies may not be a tree that
existed at any particular point in time. If people want to make
backups or replicate using rsync they need to treat it like any other
database and either

* make a copy which will not be updated, and rsync from that

* lock the database while rsyncing

The operating system facilities are not sufficient to protect against
all of these. We cannot satisfactorily commit a whole atomic
transaction in one step.

Operations might be updating either the metadata or the working copy.

The working copy is in some ways more difficult:

* Other processes are allowed to modify it from time to time in
  arbitrary ways.

  If they modify it while bazaar is working then they will lose, but
  we should at least try to make sure there is no corruption.

* We can't atomically replace the whole working copy. We can
  (semi) atomically update particular files (see the sketch after
  this list).

* If the working copy files are in a weird state it is hard to know
  whether that occurred because bzr's work was interrupted or because
  the user changed them.

  (A reasonable user might run ``bzr revert`` if they notice
  something like this has happened, but it would be nice to avoid
  it.)
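
For the second point, the usual idiom is to write the new contents
beside the target and rename them into place; a sketch (with a
made-up temporary-name convention)::

    import os

    def replace_file_contents(path, new_bytes):
        """Semi-atomically replace one working-copy file: a reader sees
        either the old contents or the new, never a half-written file."""
        tmp_path = path + '.bzr-new'      # hypothetical naming convention
        with open(tmp_path, 'wb') as f:
            f.write(new_bytes)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp_path, path)         # atomic on POSIX filesystems

This only protects individual files; it does nothing about the working
copy as a whole being caught between two revisions.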

We don't want to leave things in a broken state.


Solution: write-ahead journaling?
=================================

One possible solution might be write-ahead journaling:

Before beginning a change, write and flush to disk a description of
what change will be made.

Every bzr operation checks this journal; if there are any pending
operations waiting then they are completed first, before proceeding
with whatever the user wanted. (Perhaps this should be in a
separate ``bzr recover``, but I think it's better to just do it,
perhaps with a warning.)

The descriptions written into the journal need to be simple enough
that they can safely be re-run in a totally different context. They
must not depend on any external resources which might have gone
away.
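
A very small sketch of the journal mechanics (the file name, format
and helpers are invented here for illustration): the description is
written and flushed before any real work starts, every command
completes a leftover description before doing anything else, and the
entry is removed once the change is fully applied::

    import json
    import os

    JOURNAL = 'pending-operation'   # hypothetical file in the control dir

    def begin(control_dir, description):
        """Record what is about to be done, before doing any of it."""
        path = os.path.join(control_dir, JOURNAL)
        # (A careful version would write this via a temporary file and
        # rename, so a crash cannot leave a half-written journal entry.)
        with open(path, 'w') as f:
            json.dump(description, f)
            f.flush()
            os.fsync(f.fileno())

    def finish(control_dir):
        """The change has been fully applied; drop the journal entry."""
        os.unlink(os.path.join(control_dir, JOURNAL))

    def recover_if_needed(control_dir, apply_description):
        """Run at the start of every command: finish any interrupted
        operation (perhaps with a warning) before new work begins."""
        path = os.path.join(control_dir, JOURNAL)
        if os.path.exists(path):
            with open(path) as f:
                description = json.load(f)
            apply_description(description)   # must be safe to re-run
            os.unlink(path)

The important property is the one stated above: ``description`` has to
be self-contained and safe to re-apply, with no references to
resources that may have disappeared.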

If we can do anything without depending on journalling we should.

It may be that the only case where we cannot get by with just
ordering is in updating the working copy; the user might get into a
difficult situation where they have pulled in a change and only half
the working copy has been updated. One solution would be to remove
the working copy files, or mark them readonly, while this is in
progress. We don't want people accidentally writing to a file that
needs to be overwritten.
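
Marking the files readonly for the duration is cheap to sketch (again,
only an illustration of the idea, not an existing bzr facility)::

    import os
    import stat

    def protect(paths):
        """Drop write permission on files about to be rewritten, so the
        user cannot accidentally edit one mid-update; returns the
        original modes so they can be restored afterwards."""
        saved = {}
        for path in paths:
            saved[path] = os.stat(path).st_mode
            os.chmod(path, saved[path] &
                     ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
        return saved

    def restore(saved):
        """Put the original permissions back once the update is done."""
        for path, mode in saved.items():
            os.chmod(path, mode)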

Or perhaps, in this particular case, it is OK to leave them
pointing to an old state, and let people revert if they're sure they
want the new one? Sounds dangerous.

Aaron points out that this basically sounds like changesets. So
before updating the history, we first calculate the changeset and
write it out to stable storage as a single file. We then apply the
changeset, possibly updating several files. Each command should check
whether such an application was in progress.