~bzr-pqm/bzr/bzr.dev

4763.2.4 by John Arbash Meinel
merge bzr.2.1 in preparation for NEWS entry.
# Copyright (C) 2006-2010 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

"""Reconcilers are able to fix some potential data errors in a branch."""


__all__ = [
    'KnitReconciler',
    'PackReconciler',
    'reconcile',
    'Reconciler',
    'RepoReconciler',
    ]


from bzrlib import (
    cleanup,
    errors,
    revision as _mod_revision,
    ui,
    )
from bzrlib.trace import mutter
from bzrlib.tsort import topo_sort
from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory
from bzrlib.i18n import gettext


def reconcile(dir, canonicalize_chks=False):
    """Reconcile the data in dir.

    Currently this is limited to an inventory 'reweave'.

    This is a convenience function that wraps a Reconciler object.

    Directly using Reconciler is recommended for library users that
    desire fine grained control or analysis of the found issues.

    :param canonicalize_chks: Make sure CHKs are in canonical form.
    """
    reconciler = Reconciler(dir, canonicalize_chks=canonicalize_chks)
    reconciler.reconcile()


class Reconciler(object):
    """Reconcilers are used to reconcile existing data."""

    def __init__(self, dir, other=None, canonicalize_chks=False):
        """Create a Reconciler."""
        self.bzrdir = dir
        self.canonicalize_chks = canonicalize_chks

    def reconcile(self):
        """Perform reconciliation.

        After reconciliation the following attributes document found issues:

        * `inconsistent_parents`: The number of revisions in the repository
          whose ancestry was being reported incorrectly.
        * `garbage_inventories`: The number of inventory objects without
          revisions that were garbage collected.
        * `fixed_branch_history`: None if there was no branch, False if the
          branch history was correct, True if the branch history needed to be
          re-normalized.
        """
        self.pb = ui.ui_factory.nested_progress_bar()
        try:
            self._reconcile()
        finally:
            self.pb.finished()

    def _reconcile(self):
        """Helper function for performing reconciliation."""
        self._reconcile_branch()
        self._reconcile_repository()

    def _reconcile_branch(self):
        """Reconcile the branch in self.bzrdir, if there is one."""
        try:
            self.branch = self.bzrdir.open_branch()
        except errors.NotBranchError:
            # Nothing to check here
            self.fixed_branch_history = None
            return
        ui.ui_factory.note(gettext('Reconciling branch %s') % self.branch.base)
        branch_reconciler = self.branch.reconcile(thorough=True)
        self.fixed_branch_history = branch_reconciler.fixed_history

    def _reconcile_repository(self):
        """Reconcile the repository that backs self.bzrdir."""
        self.repo = self.bzrdir.find_repository()
        ui.ui_factory.note(gettext('Reconciling repository %s') %
            self.repo.user_url)
        self.pb.update(gettext("Reconciling repository"), 0, 1)
        if self.canonicalize_chks:
            try:
                # Probe for support: not every repository format provides
                # reconcile_canonicalize_chks.
                self.repo.reconcile_canonicalize_chks
            except AttributeError:
                raise errors.BzrError(
                    gettext("%s cannot canonicalize CHKs.") % (self.repo,))
            repo_reconciler = self.repo.reconcile_canonicalize_chks()
        else:
            repo_reconciler = self.repo.reconcile(thorough=True)
        self.inconsistent_parents = repo_reconciler.inconsistent_parents
        self.garbage_inventories = repo_reconciler.garbage_inventories
        if repo_reconciler.aborted:
            ui.ui_factory.note(gettext(
                'Reconcile aborted: revision index has inconsistent parents.'))
            ui.ui_factory.note(gettext(
                'Run "bzr check" for more details.'))
        else:
            ui.ui_factory.note(gettext('Reconciliation complete.'))
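

# Illustrative sketch only, not part of bzrlib's API.
#
# BranchReconciler and RepoReconciler in this module pass their worker
# method to cleanup.OperationWithCleanups and register teardown steps
# (unlock, progress bar finish) through add_cleanup. The hypothetical
# _ExampleCleanupStack class below shows the general idea. It assumes that
# cleanups run in reverse registration order and that they run even when
# the worker raises; the real class may differ, for example in how it
# reports errors raised by the cleanups themselves.
class _ExampleCleanupStack(object):
    """Hypothetical stand-in for a run-with-cleanups helper."""

    def __init__(self, func):
        self.func = func
        self.cleanups = []

    def add_cleanup(self, cleanup_func):
        # Remember a zero-argument callable to run after the worker.
        self.cleanups.append(cleanup_func)

    def run(self):
        try:
            return self.func()
        finally:
            # Last registered, first run: release the progress bar before
            # the lock that was taken earlier.
            while self.cleanups:
                self.cleanups.pop()()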


class BranchReconciler(object):
    """Reconciler that works on a branch."""

    def __init__(self, a_branch, thorough=False):
        self.fixed_history = None
        self.thorough = thorough
        self.branch = a_branch

    def reconcile(self):
        operation = cleanup.OperationWithCleanups(self._reconcile)
        self.add_cleanup = operation.add_cleanup
        operation.run_simple()

    def _reconcile(self):
        self.branch.lock_write()
        self.add_cleanup(self.branch.unlock)
        self.pb = ui.ui_factory.nested_progress_bar()
        self.add_cleanup(self.pb.finished)
        self._reconcile_steps()

    def _reconcile_steps(self):
        self._reconcile_revision_history()

    def _reconcile_revision_history(self):
        last_revno, last_revision_id = self.branch.last_revision_info()
        real_history = []
        graph = self.branch.repository.get_graph()
        try:
            for revid in graph.iter_lefthand_ancestry(
                    last_revision_id, (_mod_revision.NULL_REVISION,)):
                real_history.append(revid)
        except errors.RevisionNotPresent:
            pass # Hit a ghost left hand parent
        real_history.reverse()
        if last_revno != len(real_history):
            self.fixed_history = True
            # Technically for Branch5 formats, it is more efficient to use
            # set_revision_history, as this will regenerate it again.
            # Not really worth a whole BranchReconciler class just for this,
            # though.
            ui.ui_factory.note(gettext('Fixing last revision info {0} '
                                       ' => {1}').format(
                                       last_revno, len(real_history)))
            self.branch.set_last_revision_info(len(real_history),
                                               last_revision_id)
        else:
            self.fixed_history = False
            ui.ui_factory.note(gettext('revision_history ok.'))
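

# Illustrative sketch only, not part of bzrlib's API.
#
# BranchReconciler._reconcile_revision_history walks the left-hand (first
# parent) ancestry from the branch tip and compares its length with the
# stored revno. The hypothetical helpers below replay that check on a plain
# dict mapping each revision id to a tuple of parent ids. Assumptions: a
# revision id missing from the dict stands for a ghost, and the string
# 'null:' stands in for the null revision.
def _example_lefthand_history(parent_map, tip, null_revision='null:'):
    """Return the first-parent history ending at tip, oldest first."""
    history = []
    revid = tip
    while revid != null_revision:
        if revid not in parent_map:
            # Ghost left-hand parent: keep what was collected so far.
            break
        history.append(revid)
        parents = parent_map[revid]
        if not parents:
            break
        revid = parents[0]
    history.reverse()
    return history


def _example_check_revno(parent_map, last_revno, tip):
    """Return (needs_fix, real_revno) for a stored revno and tip."""
    real_revno = len(_example_lefthand_history(parent_map, tip))
    return last_revno != real_revno, real_revno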


class RepoReconciler(object):
    """Reconciler that reconciles a repository.

    The goal of repository reconciliation is to make any derived data
    consistent with the core data committed by a user. This can involve
    reindexing, or removing unreferenced data if that can interfere with
    queries in a given repository.

    Currently this consists of an inventory reweave with revision cross-checks.
    """

    def __init__(self, repo, other=None, thorough=False):
        """Construct a RepoReconciler.

        :param thorough: perform a thorough check which may take longer but
                         will correct non-data loss issues such as incorrect
                         cached data.
        """
        self.garbage_inventories = 0
        self.inconsistent_parents = 0
        self.aborted = False
        self.repo = repo
        self.thorough = thorough

    def reconcile(self):
        """Perform reconciliation.

        After reconciliation the following attributes document found issues:

        * `inconsistent_parents`: The number of revisions in the repository
          whose ancestry was being reported incorrectly.
        * `garbage_inventories`: The number of inventory objects without
          revisions that were garbage collected.
        """
        operation = cleanup.OperationWithCleanups(self._reconcile)
        self.add_cleanup = operation.add_cleanup
        operation.run_simple()

    def _reconcile(self):
        self.repo.lock_write()
        self.add_cleanup(self.repo.unlock)
        self.pb = ui.ui_factory.nested_progress_bar()
        self.add_cleanup(self.pb.finished)
        self._reconcile_steps()

    def _reconcile_steps(self):
        """Perform the steps to reconcile this repository."""
        self._reweave_inventory()

    def _reweave_inventory(self):
        """Regenerate the inventory weave for the repository from scratch.

        This is a smart function: it will only do the reweave if doing it
        will correct data issues. The self.thorough flag controls whether
        only data-loss causing issues (not self.thorough) or all issues
        (self.thorough) are treated as requiring the reweave.
        """
        transaction = self.repo.get_transaction()
        self.pb.update(gettext('Reading inventory data'))
        self.inventory = self.repo.inventories
        self.revisions = self.repo.revisions
        # the total set of revisions to process
        self.pending = set([key[-1] for key in self.revisions.keys()])

        # mapping from revision_id to parents
        self._rev_graph = {}
        # errors that we detect
        self.inconsistent_parents = 0
        # we need the revision id of each revision and its available parents list
        self._setup_steps(len(self.pending))
        for rev_id in self.pending:
            # put a revision into the graph.
            self._graph_revision(rev_id)
        self._check_garbage_inventories()
        # if there are no inconsistent_parents and
        # (no garbage inventories or we are not doing a thorough check)
        if (not self.inconsistent_parents and
            (not self.garbage_inventories or not self.thorough)):
            ui.ui_factory.note(gettext('Inventory ok.'))
            return
        self.pb.update(gettext('Backing up inventory'), 0, 0)
        self.repo._backup_inventory()
        ui.ui_factory.note(gettext('Backup inventory created.'))
        new_inventories = self.repo._temp_inventories()

        # we have topological order of revisions and non ghost parents ready.
        self._setup_steps(len(self._rev_graph))
        revision_keys = [(rev_id,) for rev_id in topo_sort(self._rev_graph)]
        stream = self._change_inv_parents(
            self.inventory.get_record_stream(revision_keys, 'unordered', True),
            self._new_inv_parents,
            set(revision_keys))
        new_inventories.insert_record_stream(stream)
        # if this worked, the set of new_inventories.keys should equal
        # self.pending
        if not (set(new_inventories.keys()) ==
            set([(revid,) for revid in self.pending])):
            raise AssertionError()
        self.pb.update(gettext('Writing weave'))
        self.repo._activate_new_inventory()
        self.inventory = None
        ui.ui_factory.note(gettext('Inventory regenerated.'))

    def _new_inv_parents(self, revision_key):
        """Look up ghost-filtered parents for revision_key."""
        # Use the filtered ghostless parents list:
        return tuple([(revid,) for revid in self._rev_graph[revision_key[-1]]])

    def _change_inv_parents(self, stream, get_parents, all_revision_keys):
        """Adapt a record stream to reconcile the parents."""
        for record in stream:
            wanted_parents = get_parents(record.key)
            if wanted_parents and wanted_parents[0] not in all_revision_keys:
                # The check for the left most parent only handles knit
                # compressors, but this code only applies to knit and weave
                # repositories anyway.
                fulltext = record.get_bytes_as('fulltext')
                yield FulltextContentFactory(record.key, wanted_parents,
                    record.sha1, fulltext)
            else:
                adapted_record = AdapterFactory(record.key, wanted_parents, record)
                yield adapted_record
            self._reweave_step('adding inventories')

    def _setup_steps(self, new_total):
        """Set up the markers we need to control the progress bar."""
        self.total = new_total
        self.count = 0

    def _graph_revision(self, rev_id):
        """Load a revision into the revision graph."""
        # Analyse revision rev_id and record its available (non-ghost)
        # parents in the graph.
        self._reweave_step('loading revisions')
        rev = self.repo.get_revision_reconcile(rev_id)
        parents = []
        for parent in rev.parent_ids:
            if self._parent_is_available(parent):
                parents.append(parent)
            else:
                mutter('found ghost %s', parent)
        self._rev_graph[rev_id] = parents

    def _check_garbage_inventories(self):
        """Check for garbage inventories which we cannot trust.

        We can't trust them because their prerequisite file data may not
        be present - all we know is that their revision was not installed.
        """
        if not self.thorough:
            return
        inventories = set(self.inventory.keys())
        revisions = set(self.revisions.keys())
        garbage = inventories.difference(revisions)
        self.garbage_inventories = len(garbage)
        for revision_key in garbage:
            mutter('Garbage inventory {%s} found.', revision_key[-1])

    def _parent_is_available(self, parent):
        """True if parent is a fully available revision.

        A fully available revision has an inventory and a revision object in
        the repository.
        """
        if parent in self._rev_graph:
            return True
        inv_present = (1 == len(self.inventory.get_parent_map([(parent,)])))
        return (inv_present and self.repo.has_revision(parent))

    def _reweave_step(self, message):
        """Mark a single step of regeneration complete."""
        self.pb.update(message, self.count, self.total)
        self.count += 1
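

# Illustrative sketch only, not part of bzrlib's API.
#
# RepoReconciler builds a revision graph with ghost parents filtered out,
# counts inventories that have no revision (only in thorough mode), and
# then processes revisions in topological order. The hypothetical helpers
# below replay those steps on plain Python containers. Assumptions: a
# parent counts as available only if it has both a revision and an
# inventory, and the graph is acyclic. The ordering helper is a simple
# depth-first sort, not bzrlib.tsort.topo_sort.
def _example_plan_reweave(revision_parents, inventory_ids, thorough=False):
    """Return (ghost-free graph, garbage inventory ids)."""
    present = set(revision_parents)
    inventories = set(inventory_ids)
    graph = {}
    for rev_id, parents in revision_parents.items():
        # Drop parents that are ghosts (no revision or no inventory).
        graph[rev_id] = [p for p in parents
                         if p in present and p in inventories]
    # Garbage inventories are only looked for in thorough mode.
    if thorough:
        garbage = inventories - present
    else:
        garbage = set()
    return graph, garbage


def _example_topo_order(graph):
    """Return the graph's nodes with every parent before its children."""
    order = []
    visited = set()
    for start in sorted(graph):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, parents = stack[-1]
            for parent in parents:
                if parent not in visited:
                    visited.add(parent)
                    stack.append((parent, iter(graph[parent])))
                    break
            else:
                # All parents emitted already, so this node can follow.
                order.append(node)
                stack.pop()
    return order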


class KnitReconciler(RepoReconciler):
    """Reconciler that reconciles a knit format repository.

    This will detect garbage inventories and remove them in thorough mode.
    """

    def _reconcile_steps(self):
        """Perform the steps to reconcile this repository."""
        if self.thorough:
            try:
                self._load_indexes()
            except errors.BzrCheckError:
                self.aborted = True
                return
            # knits never suffer this
            self._gc_inventory()
            self._fix_text_parents()
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
366
367
    def _load_indexes(self):
368
        """Load indexes for the reconciliation."""
369
        self.transaction = self.repo.get_transaction()
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
370
        self.pb.update(gettext('Reading indexes'), 0, 2)
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
371
        self.inventory = self.repo.inventories
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
372
        self.pb.update(gettext('Reading indexes'), 1, 2)
2819.2.5 by Andrew Bennetts
Make reconcile abort gracefully if the revision index has bad parents.
373
        self.repo._check_for_inconsistent_revision_parents()
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
374
        self.revisions = self.repo.revisions
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
375
        self.pb.update(gettext('Reading indexes'), 2, 2)
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
376
377
    def _gc_inventory(self):
378
        """Remove inventories that are not referenced from the revision store."""
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
379
        self.pb.update(gettext('Checking unused inventories'), 0, 1)
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
380
        self._check_garbage_inventories()
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
381
        self.pb.update(gettext('Checking unused inventories'), 1, 3)
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
382
        if not self.garbage_inventories:
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
383
            ui.ui_factory.note(gettext('Inventory ok.'))
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
384
            return
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
385
        self.pb.update(gettext('Backing up inventory'), 0, 0)
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
386
        self.repo._backup_inventory()
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
387
        ui.ui_factory.note(gettext('Backup Inventory created'))
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
388
        # asking for '' should never return a non-empty weave
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
389
        new_inventories = self.repo._temp_inventories()
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
390
        # we have topological order of revisions and non ghost parents ready.
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
391
        graph = self.revisions.get_parent_map(self.revisions.keys())
4577.2.4 by Maarten Bosmans
Make shure the faster topo_sort function is used where appropriate
392
        revision_keys = topo_sort(graph)
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
393
        revision_ids = [key[-1] for key in revision_keys]
394
        self._setup_steps(len(revision_keys))
395
        stream = self._change_inv_parents(
3606.7.7 by John Arbash Meinel
Add tests for the fetching behavior.
396
            self.inventory.get_record_stream(revision_keys, 'unordered', True),
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
397
            graph.__getitem__,
398
            set(revision_keys))
399
        new_inventories.insert_record_stream(stream)
1616.1.1 by Martin Pool
[merge] robertc
400
        # if this worked, the set of new_inventories.keys() should equal
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
401
        # the set of revision keys
402
        if set(new_inventories.keys()) != set(revision_keys):
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
403
            raise AssertionError()
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
404
        self.pb.update(gettext('Writing weave'))
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
405
        self.repo._activate_new_inventory()
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
406
        self.inventory = None
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
407
        ui.ui_factory.note(gettext('Inventory regenerated.'))
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
408
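# Illustrative aside (not part of bzrlib): a minimal sketch of the ordering
# step in _gc_inventory, assuming a parent map keyed by 1-tuples. The helper
# topo_sort_sketch is hypothetical and much simpler than bzrlib's topo_sort;
# it assumes parents missing from the map (ghosts) are simply ignored.

```python
def topo_sort_sketch(parent_map):
    """Return keys ordered so every present parent precedes its children."""
    # Drop ghost parents: only parents that are keys of the map constrain order.
    pending = dict((key, [p for p in parents if p in parent_map])
                   for key, parents in parent_map.items())
    ordered = []
    done = set()
    while pending:
        # A key is ready once all of its present parents have been emitted.
        ready = sorted(key for key, parents in pending.items()
                       if all(p in done for p in parents))
        if not ready:
            raise ValueError('cycle in parent map')
        for key in ready:
            ordered.append(key)
            done.add(key)
            del pending[key]
    return ordered


graph = {
    ('r3',): (('r2',), ('ghost',)),
    ('r1',): (),
    ('r2',): (('r1',),),
}
revision_keys = topo_sort_sketch(graph)
revision_ids = [key[-1] for key in revision_keys]
```

# The final sanity check in _gc_inventory then amounts to comparing the set
# of keys written to the new store against set(revision_keys).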
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
409
    def _fix_text_parents(self):
2745.6.13 by Aaron Bentley
Misc cleanup
410
        """Fix bad versionedfile parent entries.
411
2745.6.16 by Aaron Bentley
Update from review
412
        It is possible for the parents entry in a versionedfile entry to be
2745.6.13 by Aaron Bentley
Misc cleanup
413
        inconsistent with the values in the revision and inventory.
414
415
        This method finds entries with such inconsistencies, corrects their
416
        parent lists, and replaces the versionedfile with a corrected version.
417
        """
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
418
        transaction = self.repo.get_transaction()
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
419
        versions = [key[-1] for key in self.revisions.keys()]
2927.2.2 by Andrew Bennetts
Only try to check versions that actually exist in the versioned file, and do a little more muttering.
420
        mutter('Prepopulating revision text cache with %d revisions',
421
                len(versions))
3036.1.3 by Robert Collins
Privatise VersionedFileChecker.
422
        vf_checker = self.repo._get_versioned_file_checker()
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
423
        bad_parents, unused_versions = vf_checker.check_file_version_parents(
424
            self.repo.texts, self.pb)
425
        text_index = vf_checker.text_index
426
        per_id_bad_parents = {}
427
        for key in unused_versions:
428
            # Ensure that every file with unused versions gets rewritten.
429
            # NB: This is really not needed, reconcile != pack.
430
            per_id_bad_parents[key[0]] = {}
431
        # Generate per-knit/weave data.
432
        for key, details in bad_parents.iteritems():
433
            file_id = key[0]
434
            rev_id = key[1]
435
            knit_parents = tuple([parent[-1] for parent in details[0]])
436
            correct_parents = tuple([parent[-1] for parent in details[1]])
437
            file_details = per_id_bad_parents.setdefault(file_id, {})
438
            file_details[rev_id] = (knit_parents, correct_parents)
439
        file_id_versions = {}
440
        for text_key in text_index:
441
            versions_list = file_id_versions.setdefault(text_key[0], [])
442
            versions_list.append(text_key[1])
443
        # Do the reconcile of individual weaves.
444
        for num, file_id in enumerate(per_id_bad_parents):
6138.3.4 by Jonathan Riddell
add gettext() to uses of trace.note()
445
            self.pb.update(gettext('Fixing text parents'), num,
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
446
                           len(per_id_bad_parents))
447
            versions_with_bad_parents = per_id_bad_parents[file_id]
448
            id_unused_versions = set(key[-1] for key in unused_versions
449
                if key[0] == file_id)
450
            if file_id in file_id_versions:
451
                file_versions = file_id_versions[file_id]
452
            else:
453
                # This id was present in the disk store but is not referenced
454
                # by any revision at all.
455
                file_versions = []
456
            self._fix_text_parent(file_id, versions_with_bad_parents,
457
                 id_unused_versions, file_versions)
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
458
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
459
    def _fix_text_parent(self, file_id, versions_with_bad_parents,
460
            unused_versions, all_versions):
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
461
        """Fix bad versionedfile entries in a single versioned file."""
2927.2.2 by Andrew Bennetts
Only try to check versions that actually exist in the versioned file, and do a little more muttering.
462
        mutter('fixing text parent: %r (%d versions)', file_id,
463
                len(versions_with_bad_parents))
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
464
        mutter('(%d are unused)', len(unused_versions))
465
        new_file_id = 'temp:%s' % file_id
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
466
        new_parents = {}
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
467
        needed_keys = set()
468
        for version in all_versions:
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
469
            if version in unused_versions:
470
                continue
471
            elif version in versions_with_bad_parents:
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
472
                parents = versions_with_bad_parents[version][1]
473
            else:
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
474
                pmap = self.repo.texts.get_parent_map([(file_id, version)])
475
                parents = [key[-1] for key in pmap[(file_id, version)]]
476
            new_parents[(new_file_id, version)] = [
477
                (new_file_id, parent) for parent in parents]
478
            needed_keys.add((file_id, version))
479
        def fix_parents(stream):
480
            for record in stream:
481
                bytes = record.get_bytes_as('fulltext')
482
                new_key = (new_file_id, record.key[-1])
483
                parents = new_parents[new_key]
484
                yield FulltextContentFactory(new_key, parents, record.sha1, bytes)
485
        stream = self.repo.texts.get_record_stream(needed_keys, 'topological', True)
486
        self.repo._remove_file_id(new_file_id)
487
        self.repo.texts.insert_record_stream(fix_parents(stream))
488
        self.repo._remove_file_id(file_id)
489
        if len(new_parents):
490
            self.repo._move_file_id(new_file_id, file_id)
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
491
2592.3.80 by Robert Collins
Make reconcile work, and pass tests.
492
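# Illustrative aside (not part of bzrlib): a minimal sketch of how
# _fix_text_parent plans its rewrite, assuming plain dicts stand in for the
# texts store. The function plan_rewritten_parents and the current_parents
# argument are hypothetical; the real method reads parents from
# self.repo.texts.get_parent_map.

```python
def plan_rewritten_parents(file_id, all_versions, unused_versions,
                           bad_parents, current_parents):
    """Return ({new_key: [new_parent_keys]}, set_of_old_keys_to_copy)."""
    new_file_id = 'temp:%s' % file_id
    new_parents = {}
    needed_keys = set()
    for version in all_versions:
        if version in unused_versions:
            # Unused versions are dropped from the rewritten file.
            continue
        if version in bad_parents:
            # Each entry is (stored_parents, correct_parents); use the latter.
            parents = bad_parents[version][1]
        else:
            parents = current_parents[version]
        # Re-key the version and its parents under the temporary file id.
        new_parents[(new_file_id, version)] = [
            (new_file_id, parent) for parent in parents]
        needed_keys.add((file_id, version))
    return new_parents, needed_keys


new_parents, needed_keys = plan_rewritten_parents(
    'file-a',
    all_versions=['r1', 'r2', 'r3'],
    unused_versions=set(['r3']),
    bad_parents={'r2': ((), ('r1',))},
    current_parents={'r1': (), 'r2': (), 'r3': ('r2',)})
```

# In this example 'r3' is skipped as unused, and 'r2' gets the corrected
# parent 'r1' instead of the empty parent list it had stored.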
493
class PackReconciler(RepoReconciler):
494
    """Reconciler that reconciles a pack based repository.
495
496
    Garbage inventories do not affect ancestry queries, and removal is
497
    considerably more expensive as there is no separate versioned file for
498
    them, so they are not cleaned. In short, it is a no-op unless thorough mode is requested.
499
500
    In future this may be a good place to hook in annotation cache checking,
501
    index recreation etc.
502
    """
503
2592.3.239 by Martin Pool
doc
504
    # XXX: The index correction that _fix_text_parents performs is needed for
505
    # packs, but not yet implemented. The basic approach is to:
506
    #  - lock the names list
507
    #  - perform a customised pack() that regenerates data as needed
508
    #  - unlock the names list
5243.1.2 by Martin
Point launchpad links in comments at production server rather than edge
509
    # https://bugs.launchpad.net/bzr/+bug/154173
2592.3.239 by Martin Pool
doc
510
5375.1.3 by Andrew Bennetts
Add hidden --canonicalize-chks option to reconcile to trigger GCCHKCanonicalizingPacker, improve progress reporting a little.
511
    def __init__(self, repo, other=None, thorough=False,
512
            canonicalize_chks=False):
513
        super(PackReconciler, self).__init__(repo, other=other,
514
            thorough=thorough)
515
        self.canonicalize_chks = canonicalize_chks
516
2592.3.80 by Robert Collins
Make reconcile work, and pass tests.
517
    def _reconcile_steps(self):
518
        """Perform the steps to reconcile this repository."""
2951.1.2 by Robert Collins
Partial refactoring of pack_repo to create a Packer object for packing.
519
        if not self.thorough:
520
            return
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
521
        collection = self.repo._pack_collection
522
        collection.ensure_loaded()
523
        collection.lock_names()
4936.1.1 by Andrew Bennetts
Replace some fragile try/finally cleanups in bzrlib.reconcile with OperationWithCleanups (borrowing run_simple from command-cleanup branch).
524
        self.add_cleanup(collection._unlock_names)
525
        packs = collection.all_packs()
526
        all_revisions = self.repo.all_revision_ids()
527
        total_inventories = len(list(
528
            collection.inventory_index.combined_index.iter_all_entries()))
529
        if len(all_revisions):
5375.1.3 by Andrew Bennetts
Add hidden --canonicalize-chks option to reconcile to trigger GCCHKCanonicalizingPacker, improve progress reporting a little.
530
            if self.canonicalize_chks:
531
                reconcile_meth = self.repo._canonicalize_chks_pack
532
            else:
533
                reconcile_meth = self.repo._reconcile_pack
534
            new_pack = reconcile_meth(collection, packs, ".reconcile",
535
                all_revisions, self.pb)
4936.1.1 by Andrew Bennetts
Replace some fragile try/finally cleanups in bzrlib.reconcile with OperationWithCleanups (borrowing run_simple from command-cleanup branch).
536
            if new_pack is not None:
2951.1.10 by Robert Collins
Peer review feedback with Ian.
537
                self._discard_and_save(packs)
4936.1.1 by Andrew Bennetts
Replace some fragile try/finally cleanups in bzrlib.reconcile with OperationWithCleanups (borrowing run_simple from command-cleanup branch).
538
        else:
539
            # only make a new pack when there is data to copy.
540
            self._discard_and_save(packs)
541
        self.garbage_inventories = total_inventories - len(list(
542
            collection.inventory_index.combined_index.iter_all_entries()))
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
543
2951.1.10 by Robert Collins
Peer review feedback with Ian.
544
    def _discard_and_save(self, packs):
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
545
        """Discard some packs from the repository.
546
2951.1.10 by Robert Collins
Peer review feedback with Ian.
547
        This removes them from the memory index, saves the in-memory index
548
        which makes the newly reconciled pack visible and hides the packs to be
549
        discarded, and finally renames the packs being discarded into the
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
550
        obsolete packs directory.
2951.1.10 by Robert Collins
Peer review feedback with Ian.
551
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
552
        :param packs: The packs to discard.
553
        """
554
        for pack in packs:
555
            self.repo._pack_collection._remove_pack_from_memory(pack)
556
        self.repo._pack_collection._save_pack_names()
557
        self.repo._pack_collection._obsolete_packs(packs)
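
# Illustrative aside (not part of bzrlib): a minimal sketch of the ordering
# in _discard_and_save, assuming a toy collection that tracks pack names in
# lists. ToyPackCollection and its methods are hypothetical stand-ins for
# the private pack collection calls above.

```python
class ToyPackCollection(object):
    """Tracks in-memory names, saved names and obsoleted packs."""

    def __init__(self, names):
        self.in_memory = list(names)
        self.saved = list(names)
        self.obsolete = []

    def remove_pack_from_memory(self, name):
        self.in_memory.remove(name)

    def save_pack_names(self):
        # Saving publishes whatever the in-memory list currently holds.
        self.saved = list(self.in_memory)

    def obsolete_packs(self, names):
        self.obsolete.extend(names)


def discard_and_save(collection, packs):
    # 1. Forget the old packs in memory.
    for pack in packs:
        collection.remove_pack_from_memory(pack)
    # 2. Save the names, which hides the old packs from readers.
    collection.save_pack_names()
    # 3. Only then move the old packs aside.
    collection.obsolete_packs(packs)


collection = ToyPackCollection(['old-1', 'old-2', 'reconciled'])
discard_and_save(collection, ['old-1', 'old-2'])
```

# Saving the names before obsoleting the packs means the saved list never
# refers to a pack that has already been moved away.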