# Copyright (C) 2006-2011 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

"""DirState objects record the state of a directory and its bzr metadata.

Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.
MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";

WHOLE_NUMBER = {digit}, digit;
REVISION_ID = a non-empty utf8 string;

dirstate format = header line, full checksum, row count, parent details,
    ghost_details, entries;
header line = "#bazaar dirstate flat format 3", NL;
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
row count = "num_entries: ", WHOLE_NUMBER, NL;
parent_details = WHOLE_NUMBER, {REVISION_ID}*, NL;
ghost_details = WHOLE_NUMBER, {REVISION_ID}*, NL;

entry = entry_key, current_entry_details, {parent_entry_details};
entry_key = dirname, basename, fileid;
current_entry_details = common_entry_details, working_entry_details;
parent_entry_details = common_entry_details, history_entry_details;
common_entry_details = MINIKIND, fingerprint, size, executable;
working_entry_details = packed_stat;
history_entry_details = REVISION_ID;

fingerprint = a nonempty utf8 sequence with meaning defined by minikind.

Given this definition, the following is useful to know::

    entry (aka row) - all the data for a given key.
    entry[0]: The key (dirname, basename, fileid)
    entry[1]: The tree(s) data for this path and id combination.
    entry[1][0]: The current tree
    entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate::

    entry[1][0][0]: minikind
    entry[1][0][1]: fingerprint
    entry[1][0][2]: size
    entry[1][0][3]: executable
    entry[1][0][4]: packed_stat
    OR (for non tree-0):
    entry[1][1][4]: revision_id

There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now::

    self._dirblocks[0] -> ('', [entry ...]),

and the entries in there are::

    entries[0][0]: ''
    entries[0][1]: ''
    entries[0][2]: file_id
    entries[1][0]: The tree data for the current tree for this fileid at /
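
To make that indexing concrete, here is a hand-built sketch of one entry. The key, sha1 and packed_stat values below are invented for illustration; they are not produced by bzrlib:

```python
# A hand-built sketch of one in-memory dirstate entry for a file
# 'dir/name' tracked in the current tree plus one parent tree.
# All values here are invented for illustration.
entry = (
    ('dir', 'name', 'file-id-1234'),  # entry[0]: the key
    [
        # entry[1][0]: details in the current (working) tree
        ('f', 'fake-sha1', 42, False, 'fake-packed-stat'),
        # entry[1][1]: details in the first parent tree
        ('f', 'fake-sha1', 42, False, 'parent-revision-id'),
    ],
)

dirname, basename, file_id = entry[0]
minikind = entry[1][0][0]     # 'f' means a file entry
executable = entry[1][0][3]
revision_id = entry[1][1][4]  # parent rows carry a revision id here
```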

'r' is a relocated entry: This path is not present in this tree with this
    id, but the id can be found at another location. The fingerprint is
    used to point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
    current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is
    the sha1 value of the file's canonical form, i.e. after any read
    filters have been applied to the convenience form stored in the working
    tree.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is
    the link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
    revision.

The entries on disk and in memory are ordered according to the following keys::

    directory, as a list of components
    filename
    file-id
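
The component-wise directory key matters because plain string comparison would order these paths differently. A small sketch (a hypothetical helper, not bzrlib's actual implementation) of such a key:

```python
def entry_sort_key(dirname, basename, file_id=''):
    # Split the directory into path components so that entries inside
    # 'a' (e.g. dirname 'a/b') sort before the sibling directory 'a-b',
    # unlike plain string comparison where '-' sorts before '/'.
    return (dirname.split('/'), basename, file_id)
```

With this key the root directory (`''`) always sorts first, since `''.split('/')` is `['']`.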

--- Format 1 had the following different definition: ---

rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
    WHOLE_NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
    {PARENT ROW}
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
    basename, NULL, WHOLE_NUMBER (* size *), NULL, "y" | "n", NULL,
    SHA1

PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.
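
The foo/bar/baz rule can be expressed as a tiny sketch (a hypothetical helper for illustration only):

```python
def parents_with_rows(parents, ghosts):
    # PARENT ROWs are only written for parents that are not ghosts,
    # since there is no tree data to record for a ghost parent.
    ghost_set = set(ghosts)
    return [p for p in parents if p not in ghost_set]
```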

In any tree, a kind of 'moved' indicates that the fingerprint field
(which we treat as opaque data specific to the 'kind' anyway) has the
details for the id of this row in that tree.

I'm strongly tempted to add an id->path index as well, but I think that
where we need id->path mapping we also usually read the whole file, so
I'm going to skip that for the moment, as we have the ability to locate
via bisect any path in any tree, and if we lookup things by path, we can
accumulate an id->path mapping as we go, which will tend to match what we
looked for.

I plan to implement this asap, so please speak up now to alter/tweak the
design - and once we stabilise on this, I'll update the wiki page for it.

The rationale for all this is that we want fast operations for the
common case (diff/status/commit/merge on all files) and extremely fast
operations for the less common but still frequent case (status/diff/commit
on specific files). Operations on specific files involve a scan for all
the children of a path, *in every involved tree*, which the current
format did not accommodate.

1. Fast end to end use for bzr's top 5 use cases. (commit/diff/status/merge/???)
2. fall back current object model as needed.
3. scale usably to the largest trees known today - say 50K entries. (mozilla
   is an example of this)

Eventually reuse dirstate objects across locks IFF the dirstate file has not
been modified, but will require that we flush/ignore cached stat-hit data
because we won't want to restat all files on disk just because a lock was
acquired, yet we cannot trust the data after the previous lock was released.

Memory representation::

    vector of all directories, and vector of the children ?
    root_entry = (direntry for root, [parent_direntries_for_root]),
    dirblocks = [
        ('', ['data for achild', 'data for bchild', 'data for cchild'])
        ('dir', ['achild', 'cchild', 'echild'])
        ]
    - single bisect to find N subtrees from a path spec
    - in-order for serialisation - this is 'dirblock' grouping.
    - insertion of a file '/a' affects only the '/' child-vector, that is, to
      insert 10K elements from scratch does not generate O(N^2) memmoves of a
      single vector, rather each individual child-vector, which tends to be
      limited to a manageable number. Will scale badly on trees with 10K
      entries in a single directory. compare with Inventory.InventoryDirectory
      which has a dictionary for the children. No bisect capability, can only
      probe for exact matches, or grab all elements and sort.
    - What's the risk of error here? Once we have the base format being
      processed we should have a net win regardless of optimality. So we are
      going to go with what seems reasonable.
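
The "single bisect" point can be sketched standalone: blocks sorted by directory name, each holding a sorted child vector, so a directory lookup is a bisect rather than a dictionary probe. This is an illustrative sketch, not bzrlib's actual lookup code:

```python
import bisect

# A sketch of the dirblock layout: a vector of (dirname, children)
# pairs kept sorted by dirname, so the children of any directory can
# be found with a single bisect over the block keys.
dirblocks = [
    ('', ['data for achild', 'data for bchild', 'data for cchild']),
    ('dir', ['achild', 'cchild', 'echild']),
]

def find_block(dirblocks, dirname):
    """Bisect for the child vector of `dirname`, or None if absent."""
    keys = [d for d, _ in dirblocks]
    idx = bisect.bisect_left(keys, dirname)
    if idx < len(keys) and keys[idx] == dirname:
        return dirblocks[idx][1]
    return None
```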

Maybe we should do a test profile of the core structure - 10K simulated
searches/lookups/etc?

Objects for each row?
The lifetime of Dirstate objects is currently per lock, but see above for
possible extensions. The lifetime of a row from a dirstate is expected to be
very short in the optimistic case: which we are optimising for. For instance,
subtree status will determine from analysis of the disk data what rows need to
be examined at all, and will be able to determine from a single row whether
that file has altered or not, so we are aiming to process tens of thousands of
entries each second within the dirstate context, before exposing anything to
the larger codebase. This suggests we want the time for a single file
comparison to be < 0.1 milliseconds. That would give us 10000 paths per second
processed, and to scale to 100 thousand we'll need another order of magnitude
to do that. Now, as the lifetime for all unchanged entries is the time to
parse, stat the file on disk, and then immediately discard, the overhead of
object creation becomes a significant cost.

Figures: Creating a tuple from 3 elements was profiled at 0.0625
microseconds, whereas creating an object which is subclassed from tuple was
0.500 microseconds, and creating an object with 3 elements and slots was 3
microseconds. 0.1 milliseconds is 100 microseconds, and ideally we'll get
down to 10 microseconds for the total processing - having 33% of that be object
creation is a huge overhead. There is a potential cost in using tuples within
each row which is that the conditional code to do comparisons may be slower
than method invocation, but method invocation is known to be slow due to stack
frame creation, so avoiding methods in these tight inner loops is unfortunately
desirable. We can consider a pyrex version of this with objects in future if
needed.

from stat import S_IEXEC

# This is the Windows equivalent of ENOTDIR
# It is defined in pywin32.winerror, but we don't want a strong dependency for
# just an error code.
ERROR_PATH_NOT_FOUND = 3
ERROR_DIRECTORY = 267


class SHA1Provider(object):
    """An interface for getting sha1s of a file."""

    def sha1(self, abspath):
        """Return the sha1 of a file given its absolute path.

        :param abspath: May be a filesystem encoded absolute path.
        """
        raise NotImplementedError(self.sha1)

    def stat_and_sha1(self, abspath):
        """Return the stat and sha1 of a file given its absolute path.

        :param abspath: May be a filesystem encoded absolute path.

        Note: the stat should be the stat of the physical file
        while the sha may be the sha of its canonical content.
        """
        raise NotImplementedError(self.stat_and_sha1)


class DefaultSHA1Provider(SHA1Provider):
    """A SHA1Provider that reads directly from the filesystem."""

    def sha1(self, abspath):
        """Return the sha1 of a file given its absolute path."""
        return osutils.sha_file_by_name(abspath)

    def stat_and_sha1(self, abspath):
        """Return the stat and sha1 of a file given its absolute path."""
        file_obj = file(abspath, 'rb')
        try:
            statvalue = os.fstat(file_obj.fileno())
            sha1 = osutils.sha_file(file_obj)
        finally:
            file_obj.close()
        return statvalue, sha1
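
A provider only needs those two methods. As a minimal standalone sketch, here is a provider built on `hashlib` directly instead of bzrlib's `osutils` (the class name `SimpleSHA1Provider` is hypothetical, and no filters are applied, so the "canonical" sha is just the raw content sha):

```python
import hashlib
import os

class SimpleSHA1Provider(object):
    """A standalone sketch of the SHA1Provider interface using hashlib."""

    def sha1(self, abspath):
        # Read in fixed-size chunks so large files are never fully in memory.
        digest = hashlib.sha1()
        with open(abspath, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                digest.update(chunk)
        return digest.hexdigest()

    def stat_and_sha1(self, abspath):
        # Stat the open file descriptor so the stat and the content are
        # taken from the same file, even if the path is replaced meanwhile.
        digest = hashlib.sha1()
        with open(abspath, 'rb') as f:
            statvalue = os.fstat(f.fileno())
            for chunk in iter(lambda: f.read(65536), b''):
                digest.update(chunk)
        return statvalue, digest.hexdigest()
```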


class DirState(object):
    """Record directory and metadata state for fast access.

    A dirstate is a specialised data structure for managing local working
    tree state information. It's not yet well defined whether it is platform
    specific, and if it is how we detect/parameterize that.

    Dirstates use the usual lock_write, lock_read and unlock mechanisms.
    Unlike most bzr disk formats, DirStates must be locked for reading, using
    lock_read. (This is an os file lock internally.) This is necessary
    because the file can be rewritten in place.

    DirStates must be explicitly written with save() to commit changes; just
    unlocking them does not write the changes to disk.
    """

    _kind_to_minikind = {
            'absent': 'a',
            'file': 'f',
            'directory': 'd',
            'relocated': 'r',
            'symlink': 'l',
            'tree-reference': 't',
        }

    _minikind_to_kind = {
            'a': 'absent',
            'f': 'file',
            'd': 'directory',
            'r': 'relocated',
            'l': 'symlink',
            't': 'tree-reference',
        }

    _stat_to_minikind = {
        stat.S_IFDIR: 'd',
        stat.S_IFREG: 'f',
        stat.S_IFLNK: 'l',
    }

    _to_yesno = {True: 'y', False: 'n'}  # TODO profile the performance gain
    # of using int conversion rather than a dict here. AND BLAME ANDREW IF

    # TODO: jam 20070221 Figure out what to do if we have a record that exceeds
    #       the BISECT_PAGE_SIZE. For now, we just have to make it large enough
    #       that we are sure a single record will always fit.
    BISECT_PAGE_SIZE = 4096

    NOT_IN_MEMORY = 0
    IN_MEMORY_UNMODIFIED = 1
    IN_MEMORY_MODIFIED = 2
    IN_MEMORY_HASH_MODIFIED = 3  # Only hash-cache updates

    # A pack_stat (the x's) that is just noise and will never match the output
    # of base64 encode.
    NULLSTAT = 'x' * 32
    NULL_PARENT_DETAILS = static_tuple.StaticTuple('a', '', 0, False, '')

    HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
    HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'

    def __init__(self, path, sha1_provider, worth_saving_limit=0):
        """Create a DirState object.

        :param path: The path at which the dirstate file on disk should live.
        :param sha1_provider: an object meeting the SHA1Provider interface.
        :param worth_saving_limit: when the exact number of hash changed
            entries is known, only bother saving the dirstate if more than
            this count of entries have changed.
            -1 means never save hash changes, 0 means always save hash changes.
        """
        # _header_state and _dirblock_state represent the current state
        # of the dirstate metadata and the per-row data respectively.
        # NOT_IN_MEMORY indicates that no data is in memory
        # IN_MEMORY_UNMODIFIED indicates that what we have in memory
        #   is the same as is on disk
        # IN_MEMORY_MODIFIED indicates that we have a modified version
        #   of what is on disk.
        # In future we will add more granularity, for instance _dirblock_state
        # will probably support partially-in-memory as a separate variable,
        # allowing for partially-in-memory unmodified and partially-in-memory
        # modified states.
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        # If true, an error has been detected while updating the dirstate, and
        # for safety we're not going to commit to disk.
        self._changes_aborted = False
        self._state_file = None
        self._filename = path
        self._lock_token = None
        self._lock_state = None
        self._id_index = None
        # a map from packed_stat to sha's.
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
        self._bisect_page_size = DirState.BISECT_PAGE_SIZE
        self._sha1_provider = sha1_provider
        if 'hashcache' in debug.debug_flags:
            self._sha1_file = self._sha1_file_and_mutter
        else:
            self._sha1_file = self._sha1_provider.sha1
        # These two attributes provide a simple cache for lookups into the
        # dirstate in-memory vectors. By probing respectively for the last
        # block, and for the next entry, we save nearly 2 bisections per path
        # during lookup.
        self._last_block_index = None
        self._last_entry_index = None
        # The set of known hash changes
        self._known_hash_changes = set()
        # How many hash changed entries can we have without saving
        self._worth_saving_limit = worth_saving_limit
        self._config_stack = config.LocationStack(urlutils.local_path_to_url(
            path))

    def __repr__(self):
        return "%s(%r)" % \
            (self.__class__.__name__, self._filename)

    def _mark_modified(self, hash_changed_entries=None, header_modified=False):
        """Mark this dirstate as modified.

        :param hash_changed_entries: if non-None, mark just these entries as
            having their hash modified.
        :param header_modified: mark the header modified as well, not just the
            dirblocks.
        """
        #trace.mutter_callsite(3, "modified hash entries: %s", hash_changed_entries)
        if hash_changed_entries:
            self._known_hash_changes.update([e[0] for e in hash_changed_entries])
            if self._dirblock_state in (DirState.NOT_IN_MEMORY,
                                        DirState.IN_MEMORY_UNMODIFIED):
                # If the dirstate is already marked as IN_MEMORY_MODIFIED, then
                # that takes precedence.
                self._dirblock_state = DirState.IN_MEMORY_HASH_MODIFIED
        else:
            # TODO: Since we now have a IN_MEMORY_HASH_MODIFIED state, we
            #       should fail noisily if someone tries to set
            #       IN_MEMORY_MODIFIED but we don't have a write-lock!
            # We don't know exactly what changed so disable smart saving
            self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if header_modified:
            self._header_state = DirState.IN_MEMORY_MODIFIED

    def _mark_unmodified(self):
        """Mark this dirstate as unmodified."""
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
        self._known_hash_changes = set()
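
The dirblock-state transitions implemented by these two methods can be summarised as a standalone sketch (hypothetical names; this models the logic rather than reproducing bzrlib's code):

```python
# The four in-memory states, mirroring the DirState class constants.
NOT_IN_MEMORY = 0
IN_MEMORY_UNMODIFIED = 1
IN_MEMORY_MODIFIED = 2
IN_MEMORY_HASH_MODIFIED = 3

def next_dirblock_state(current, hash_only):
    """Return the new dirblock state after a modification is recorded."""
    if hash_only:
        # A pure hash-cache update only upgrades "clean" states; a full
        # modification that was already recorded takes precedence.
        if current in (NOT_IN_MEMORY, IN_MEMORY_UNMODIFIED):
            return IN_MEMORY_HASH_MODIFIED
        return current
    # An arbitrary change disables hash-only "smart saving".
    return IN_MEMORY_MODIFIED
```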

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.

        :param path: The path within the dirstate - '' is the root, 'foo' is the
            path foo within the root, 'foo/bar' is the path bar within foo
            within the root.
        :param file_id: The file id of the path being added.
        :param kind: The kind of the path, as a string like 'file',
            'directory', etc.
        :param stat: The output of os.lstat for the path.
        :param fingerprint: The sha value of the file's canonical form (i.e.
            after any read filters have been applied),
            or the target of a symlink,
            or the referenced revision id for tree-references,
            or '' for directories.
        """
        # find the block it's in.
        # find the location in the block.
        # check it's not there
        #------- copied from inventory.ensure_normalized_name - keep synced.
        # --- normalized_filename wants a unicode basename only, so get one.
        dirname, basename = osutils.split(path)
        # we don't import normalized_filename directly because we want to be
        # able to change the implementation at runtime for tests.
        norm_name, can_access = osutils.normalized_filename(basename)
        if norm_name != basename:
            if can_access:
                basename = norm_name
            else:
                raise errors.InvalidNormalization(path)
        # you should never have files called . or ..; just add the directory
        # in the parent, or according to the special treatment for the root
        if basename == '.' or basename == '..':
            raise errors.InvalidEntryName(path)
        # now that we've normalised, we need the correct utf8 path and
        # dirname and basename elements. This single encode and split should be
        # faster than three separate encodes.
        utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
        dirname, basename = osutils.split(utf8path)
        # uses __class__ for speed; the check is needed for safety
        if file_id.__class__ is not str:
            raise AssertionError(
                "must be a utf8 file_id not %s" % (type(file_id), ))
        # Make sure the file_id does not exist in this tree
        rename_from = None
        file_id_entry = self._get_entry(0, fileid_utf8=file_id, include_deleted=True)
        if file_id_entry != (None, None):
            if file_id_entry[1][0][0] == 'a':
                if file_id_entry[0] != (dirname, basename, file_id):
                    # set the old name's current operation to rename
                    self.update_minimal(file_id_entry[0],
                        'r',
                        path_utf8='',
                        packed_stat='',
                        fingerprint=utf8path
                    )
                    rename_from = file_id_entry[0][0:2]
            else:
                path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
                kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
                info = '%s:%s' % (kind, path)
                raise errors.DuplicateFileId(file_id, info)
        first_key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(first_key)
        if present:
            # check the path is not in the tree
            block = self._dirblocks[block_index][1]
            entry_index, _ = self._find_entry_index(first_key, block)
            while (entry_index < len(block) and
                block[entry_index][0][0:2] == first_key[0:2]):
                if block[entry_index][1][0][0] not in 'ar':
                    # this path is in the dirstate in the current tree.
                    raise Exception("adding already added path!")
                entry_index += 1
        else:
            # The block where we want to put the file is not present. But it
            # might be because the directory was empty, or not loaded yet. Look
            # for a parent entry, if not found, raise NotVersionedError
            parent_dir, parent_base = osutils.split(dirname)
            parent_block_idx, parent_entry_idx, _, parent_present = \
                self._get_block_entry_index(parent_dir, parent_base, 0)
            if not parent_present:
                raise errors.NotVersionedError(path, str(self))
            self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
        block = self._dirblocks[block_index][1]
        entry_key = (dirname, basename, file_id)
        if stat is None:
            size = 0
            packed_stat = DirState.NULLSTAT
        else:
            size = stat.st_size
            packed_stat = pack_stat(stat)
        parent_info = self._empty_parent_info()
        minikind = DirState._kind_to_minikind[kind]
        if rename_from is not None:
            if rename_from[0]:
                old_path_utf8 = '%s/%s' % rename_from
            else:
                old_path_utf8 = rename_from[1]
            parent_info[0] = ('r', old_path_utf8, 0, False, '')
        if kind == 'file':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'directory':
            entry_data = entry_key, [
                (minikind, '', 0, False, packed_stat),
                ] + parent_info
        elif kind == 'symlink':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'tree-reference':
            entry_data = entry_key, [
                (minikind, fingerprint, 0, False, packed_stat),
                ] + parent_info
        else:
            raise errors.BzrError('unknown kind %r' % kind)
        entry_index, present = self._find_entry_index(entry_key, block)
        if not present:
            block.insert(entry_index, entry_data)
        else:
            if block[entry_index][1][0][0] != 'a':
                raise AssertionError(" %r(%r) already added" % (basename, file_id))
            block[entry_index][1][0] = entry_data[1][0]

        if kind == 'directory':
            # insert a new dirblock
            self._ensure_block(block_index, entry_index, utf8path)
        self._mark_modified()
        if self._id_index:
            self._add_to_id_index(self._id_index, entry_key)
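
The per-kind shaping of the current-tree details tuple above can be condensed into a standalone sketch (a hypothetical helper following the grammar `(minikind, fingerprint, size, executable, packed_stat)`; values are illustrative):

```python
def make_current_details(kind, fingerprint, size, packed_stat):
    """Sketch of how add() shapes the current-tree details per kind."""
    minikind = {'file': 'f', 'directory': 'd', 'symlink': 'l',
                'tree-reference': 't'}[kind]
    if kind == 'file':
        return (minikind, fingerprint, size, False, packed_stat)
    elif kind == 'directory':
        # Directories carry no fingerprint and no size.
        return (minikind, '', 0, False, packed_stat)
    elif kind == 'symlink':
        return (minikind, fingerprint, size, False, packed_stat)
    else:  # tree-reference: fingerprint is the referenced revision, size 0
        return (minikind, fingerprint, 0, False, packed_stat)
```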

    def _bisect(self, paths):
        """Bisect through the disk structure for specific rows.

        :param paths: A list of paths to find
        :return: A dict mapping path => entries for found entries. Missing
            entries will not be in the map.
            The list is not sorted, and entries will be populated
            based on when they were read.
        """
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # part of the disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1  # Ignore the final '\0'
        # Map from (dir, name) => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(paths)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dir_names -> The list of (dir, name) pairs that should be found in
        #                the [low, high] range
        pending = [(low, high, paths)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_files = pending.pop()

            if not cur_files or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we
            # have already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 2:
                # We didn't even get a filename here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Find what entries we are looking for, which occur before and
            # after this first record.
            after = start
            if first_fields[1]:
                first_path = first_fields[1] + '/' + first_fields[2]
            else:
                first_path = first_fields[2]
            first_loc = _bisect_path_left(cur_files, first_path)

            # These exist before the current location
            pre = cur_files[:first_loc]
            # These occur after the current location, which may be in the
            # data we read, or might be after the last entry
            post = cur_files[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have files after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                if last_fields[1]:
                    last_path = last_fields[1] + '/' + last_fields[2]
                else:
                    last_path = last_fields[2]
                last_loc = _bisect_path_right(post, last_path)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_path:
                        # We might need to go before this location
                        pre.append(first_path)
                    if middle_files[-1] == last_path:
                        post.insert(0, last_path)

                    # Find out what paths we have
                    paths = {first_path:[first_fields]}
                    # last_path might == first_path so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_path, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        if fields[1]:
                            path = fields[1] + '/' + fields[2]
                        else:
                            path = fields[2]
                        paths.setdefault(path, []).append(fields)

                    for path in middle_files:
                        for fields in paths.get(path, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(path, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        # Consider that we may want to return the directory entries in sorted
        # order. For now, we just return them in whatever order we found them,
        # and leave it up to the caller if they care if it is ordered or not.
        return found
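
The pending-ranges strategy above can be shown in miniature. This sketch (a hypothetical helper, operating on an in-memory sorted list rather than the dirstate file format) captures the core idea: keep a stack of (low, high, wanted) ranges and satisfy several sought paths per probe, instead of running one full bisection per path:

```python
import bisect

def bisect_many(records, paths):
    """Find several paths in a sorted list of (path, data) records.

    Each probe splits the remaining wanted paths into the ones before
    and after the probed record, mirroring the pre/post split above.
    """
    found = {}
    pending = [(0, len(records), sorted(paths))]
    while pending:
        low, high, wanted = pending.pop()
        if not wanted or low >= high:
            # Nothing left to find in this range.
            continue
        mid = (low + high) // 2
        mid_path, mid_data = records[mid]
        # Split the wanted paths around the probed record.
        split = bisect.bisect_left(wanted, mid_path)
        if split < len(wanted) and wanted[split] == mid_path:
            found[mid_path] = mid_data
            pre, post = wanted[:split], wanted[split+1:]
        else:
            pre, post = wanted[:split], wanted[split:]
        pending.append((low, mid, pre))
        pending.append((mid + 1, high, post))
    return found
```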

    def _bisect_dirblocks(self, dir_list):
        """Bisect through the disk structure to find entries in given dirs.

        _bisect_dirblocks is meant to find the contents of directories, which
        differs from _bisect, which only finds individual entries.

        :param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
        :return: A map from dir => entries_for_dir
        """
        # TODO: jam 20070223 A lot of the bisecting logic could be shared
        #       between this and _bisect. It would require parameterizing the
        #       inner loop with a function, though. We should evaluate the
        #       performance difference.
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # part of the disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)
        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1  # Ignore the final '\0'
        # Map from dir => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dirs -> The list of directories that should be found in
        #           the [low, high] range
        pending = [(low, high, dir_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_dirs = pending.pop()

            if not cur_dirs or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we
            # have already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 1:
                # We didn't even get a dirname here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Find what entries we are looking for, which occur before and
            # after this first record.
            after = start
            first_dir = first_fields[1]
            first_loc = bisect.bisect_left(cur_dirs, first_dir)

            # These exist before the current location
            pre = cur_dirs[:first_loc]
            # These occur after the current location, which may be in the
            # data we read, or might be after the last entry
            post = cur_dirs[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have records to look at after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir = last_fields[1]
                last_loc = bisect.bisect_right(post, last_dir)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir:
                        # We might need to go before this location
                        pre.append(first_dir)
                    if middle_files[-1] == last_dir:
                        post.insert(0, last_dir)

                    # Find out what paths we have
                    paths = {first_dir:[first_fields]}
                    # last_dir might == first_dir so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        paths.setdefault(fields[1], []).append(fields)

                    for cur_dir in middle_files:
                        for fields in paths.get(cur_dir, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(cur_dir, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        return found
948
    def _bisect_recursive(self, paths):
        """Bisect for entries for all paths and their children.

        This will use bisect to find all records for the supplied paths. It
        will then continue to bisect for any records which are marked as
        directories. (and renames?)

        :param paths: A sorted list of (dir, name) pairs
            eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
        :return: A dictionary mapping (dir, name, file_id) => [tree_info]
        """
        # Map from (dir, name, file_id) => [tree_info]
        found = {}

        found_dir_names = set()

        # Directories that have been read
        processed_dirs = set()
        # Get the ball rolling with the first bisect for all entries.
        newly_found = self._bisect(paths)

        while newly_found:
            # Directories that need to be read
            pending_dirs = set()
            paths_to_search = set()
            for entry_list in newly_found.itervalues():
                for dir_name_id, trees_info in entry_list:
                    found[dir_name_id] = trees_info
                    found_dir_names.add(dir_name_id[:2])
                    is_dir = False
                    for tree_info in trees_info:
                        minikind = tree_info[0]
                        if minikind == 'd':
                            if is_dir:
                                # We already processed this one as a directory,
                                # we don't need to do the extra work again.
                                continue
                            subdir, name, file_id = dir_name_id
                            path = osutils.pathjoin(subdir, name)
                            is_dir = True
                            if path not in processed_dirs:
                                pending_dirs.add(path)
                        elif minikind == 'r':
                            # Rename, we need to directly search the target
                            # which is contained in the fingerprint column
                            dir_name = osutils.split(tree_info[1])
                            if dir_name[0] in pending_dirs:
                                # This entry will be found in the dir search
                                continue
                            if dir_name not in found_dir_names:
                                paths_to_search.add(tree_info[1])
            # Now we have a list of paths to look for directly, and
            # directory blocks that need to be read.
            # newly_found is mixing the keys between (dir, name) and path
            # entries, but that is okay, because we only really care about the
            # newly_found directories.
            newly_found = self._bisect(sorted(paths_to_search))
            newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
            processed_dirs.update(pending_dirs)
        return found

    def _discard_merge_parents(self):
        """Discard any parents trees beyond the first.

        Note that if this fails the dirstate is corrupted.

        After this function returns the dirstate contains 2 trees, neither of
        which are ghosted.
        """
        self._read_header_if_needed()
        parents = self.get_parent_ids()
        if len(parents) < 1:
            return
        # only require all dirblocks if we are doing a full-pass removal.
        self._read_dirblocks_if_needed()
        dead_patterns = set([('a', 'r'), ('a', 'a'), ('r', 'r'), ('r', 'a')])
        def iter_entries_removable():
            for block in self._dirblocks:
                deleted_positions = []
                for pos, entry in enumerate(block[1]):
                    yield entry
                    if (entry[1][0][0], entry[1][1][0]) in dead_patterns:
                        deleted_positions.append(pos)
                if deleted_positions:
                    if len(deleted_positions) == len(block[1]):
                        del block[1][:]
                    else:
                        for pos in reversed(deleted_positions):
                            del block[1][pos]
        # if the first parent is a ghost:
        if parents[0] in self.get_ghosts():
            empty_parent = [DirState.NULL_PARENT_DETAILS]
            for entry in iter_entries_removable():
                entry[1][1:] = empty_parent
        else:
            for entry in iter_entries_removable():
                del entry[1][2:]

        self._ghosts = []
        self._parents = [parents[0]]
        self._mark_modified(header_modified=True)

    def _empty_parent_info(self):
        return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
                                                 len(self._ghosts))

    def _ensure_block(self, parent_block_index, parent_row_index, dirname):
        """Ensure a block for dirname exists.

        This function exists to let callers which know that there is a
        directory dirname ensure that the block for it exists. This block can
        fail to exist because of demand loading, or because a directory had no
        children. In either case it is not an error. It is however an error to
        call this if there is no parent entry for the directory, and thus the
        function requires the coordinates of such an entry to be provided.

        The root row is special cased and can be indicated with a parent block
        and row index of -1.

        :param parent_block_index: The index of the block in which dirname's row
            exists.
        :param parent_row_index: The index in the parent block where the row
            exists.
        :param dirname: The utf8 dirname to ensure there is a block for.
        :return: The index for the block.
        """
        if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
            # This is the signature of the root row, and the
            # contents-of-root row is always index 1
            return 1
        # the basename of the directory must be the end of its full name.
        if not (parent_block_index == -1 and
                parent_row_index == -1 and dirname == ''):
            if not dirname.endswith(
                    self._dirblocks[parent_block_index][1][parent_row_index][0][1]):
                raise AssertionError("bad dirname %r" % dirname)
        block_index, present = self._find_block_index_from_key((dirname, '', ''))
        if not present:
            ## In future, when doing partial parsing, this should load and
            # populate the entire block.
            self._dirblocks.insert(block_index, (dirname, []))
        return block_index

    def _entries_to_current_state(self, new_entries):
        """Load new_entries into self.dirblocks.

        Process new_entries into the current state object, making them the active
        state. The entries are grouped together by directory to form dirblocks.

        :param new_entries: A sorted list of entries. This function does not sort
            to prevent unneeded overhead when callers have a sorted list already.
        :return: Nothing.
        """
        if new_entries[0][0][0:2] != ('', ''):
            raise AssertionError(
                "Missing root row %r" % (new_entries[0][0],))
        # The two blocks here are deliberate: the root block and the
        # contents-of-root block.
        self._dirblocks = [('', []), ('', [])]
        current_block = self._dirblocks[0][1]
        current_dirname = ''
        append_entry = current_block.append
        for entry in new_entries:
            if entry[0][0] != current_dirname:
                # new block - different dirname
                current_block = []
                current_dirname = entry[0][0]
                self._dirblocks.append((current_dirname, current_block))
                append_entry = current_block.append
            # append the entry to the current block
            append_entry(entry)
        self._split_root_dirblock_into_contents()

    def _split_root_dirblock_into_contents(self):
        """Split the root dirblocks into root and contents-of-root.

        After parsing by path, we end up with root entries and contents-of-root
        entries in the same block. This loop splits them out again.
        """
        # The above loop leaves the "root block" entries mixed with the
        # "contents-of-root block". But we don't want an if check on
        # all entries, so instead we just fix it up here.
        if self._dirblocks[1] != ('', []):
            raise ValueError("bad dirblock start %r" % (self._dirblocks[1],))
        root_block = []
        contents_of_root_block = []
        for entry in self._dirblocks[0][1]:
            if not entry[0][1]: # This is a root entry
                root_block.append(entry)
            else:
                contents_of_root_block.append(entry)
        self._dirblocks[0] = ('', root_block)
        self._dirblocks[1] = ('', contents_of_root_block)

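The split above keys on the basename column of each entry key: root rows have an empty basename, everything else belongs to contents-of-root. A standalone sketch of the same partition (hypothetical helper, not part of DirState):

```python
def split_root(entries):
    """Partition entries into a root block and a contents-of-root block,
    mirroring the loop in _split_root_dirblock_into_contents."""
    root_block = []
    contents_of_root_block = []
    for entry in entries:
        # entry[0] is the (dirname, basename, file_id) key
        if not entry[0][1]:  # empty basename => a root entry
            root_block.append(entry)
        else:
            contents_of_root_block.append(entry)
    return ('', root_block), ('', contents_of_root_block)
```

With the usual dirstate sort order this keeps both blocks sorted, since root rows compare before any non-empty basename.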
    def _entries_for_path(self, path):
        """Return a list with all the entries that match path for all ids."""
        dirname, basename = os.path.split(path)
        key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # the block which should contain path is absent.
            return []
        result = []
        block = self._dirblocks[block_index][1]
        entry_index, _ = self._find_entry_index(key, block)
        # we may need to look at multiple entries at this path: walk while the
        # specific_files match.
        while (entry_index < len(block) and
               block[entry_index][0][0:2] == key[0:2]):
            result.append(block[entry_index])
            entry_index += 1
        return result

    def _entry_to_line(self, entry):
        """Serialize entry to a NULL delimited line ready for _get_output_lines.

        :param entry: An entry_tuple as defined in the module docstring.
        """
        entire_entry = list(entry[0])
        for tree_number, tree_data in enumerate(entry[1]):
            # (minikind, fingerprint, size, executable, tree_specific_string)
            entire_entry.extend(tree_data)
            # 3 for the key, 5 for the fields per tree.
            tree_offset = 3 + tree_number * 5
            # minikind
            entire_entry[tree_offset + 0] = tree_data[0]
            # size
            entire_entry[tree_offset + 2] = str(tree_data[2])
            # executable
            entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
        return '\0'.join(entire_entry)

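A minimal sketch of the serialization above, assuming a standalone helper with an inline yes/no mapping in place of DirState._to_yesno (names here are hypothetical):

```python
def entry_to_line(entry):
    """Join an entry's key and per-tree details into one NUL-delimited line.

    entry is (key, [tree_details, ...]); each tree_details is
    (minikind, fingerprint, size, executable, tree_specific_string).
    """
    parts = list(entry[0])  # dirname, basename, file_id
    for minikind, fingerprint, size, executable, tree_string in entry[1]:
        # size is stored as a decimal string; executable as 'y'/'n'
        parts.extend([minikind, fingerprint, str(size),
                      'y' if executable else 'n', tree_string])
    return '\0'.join(parts)
```

This matches the grammar in the module docstring: fields separated by NULs, one entry per line.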
    def _fields_per_entry(self):
        """How many null separated fields should be in each entry row.

        Each line now has an extra '\\n' field which is not used
        so we just skip over it

        entry size:
            3 fields for the key
            + number of fields per tree_data (5) * tree count
            + newline
        """
        tree_count = 1 + self._num_present_parents()
        return 3 + 5 * tree_count + 1

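Worked through for concrete tree counts, the arithmetic above gives 9 fields with no parents and 14 with one parent (a standalone restatement of the same formula; the function name here is hypothetical):

```python
def fields_per_entry(num_present_parents):
    # 3 key fields: dirname, basename, file_id
    # + 5 fields per tree: minikind, fingerprint, size, executable,
    #   tree_specific_string
    # + 1 trailing field for the unused '\n'
    tree_count = 1 + num_present_parents  # working tree plus parents
    return 3 + 5 * tree_count + 1
```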
    def _find_block(self, key, add_if_missing=False):
        """Return the block that key should be present in.

        :param key: A dirstate entry key.
        :return: The block tuple.
        """
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            if not add_if_missing:
                # check to see if key is versioned itself - we might want to
                # add it anyway, because dirs with no entries dont get a
                # dirblock at parse time.
                # This is an uncommon branch to take: most dirs have children,
                # and most code works with versioned paths.
                parent_base, parent_name = osutils.split(key[0])
                if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
                    # some parent path has not been added - its an error to add
                    # this child
                    raise errors.NotVersionedError(key[0:2], str(self))
            self._dirblocks.insert(block_index, (key[0], []))
        return self._dirblocks[block_index]

    def _find_block_index_from_key(self, key):
        """Find the dirblock index for a key.

        :return: The block index, True if the block for the key is present.
        """
        if key[0:2] == ('', ''):
            return 0, True
        try:
            if (self._last_block_index is not None and
                self._dirblocks[self._last_block_index][0] == key[0]):
                return self._last_block_index, True
        except IndexError:
            pass
        block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                      cache=self._split_path_cache)
        # _right returns one-past-where-key is so we have to subtract
        # one to use it. we use _right here because there are two
        # '' blocks - the root, and the contents of root
        # we always have a minimum of 2 in self._dirblocks: root and
        # root-contents, and for '', we get 2 back, so this is
        # simple and correct:
        present = (block_index < len(self._dirblocks) and
                   self._dirblocks[block_index][0] == key[0])
        self._last_block_index = block_index
        # Reset the entry index cache to the beginning of the block.
        self._last_entry_index = -1
        return block_index, present

    def _find_entry_index(self, key, block):
        """Find the entry index for a key in a block.

        :return: The entry index, True if the entry for the key is present.
        """
        len_block = len(block)
        try:
            if self._last_entry_index is not None:
                # mini-bisect here.
                entry_index = self._last_entry_index + 1
                # A hit is when the key is after the last slot, and before or
                # equal to the next slot.
                if ((entry_index > 0 and block[entry_index - 1][0] < key) and
                    key <= block[entry_index][0]):
                    self._last_entry_index = entry_index
                    present = (block[entry_index][0] == key)
                    return entry_index, present
        except IndexError:
            pass
        entry_index = bisect.bisect_left(block, (key, []))
        present = (entry_index < len_block and
                   block[entry_index][0] == key)
        self._last_entry_index = entry_index
        return entry_index, present

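Dropping the last-index cache, the core lookup above reduces to one bisect over (key, tree_details) pairs; pairing the probe key with an empty list makes it sort at or before any real entry with the same key. A standalone sketch (hypothetical helper, not the DirState method):

```python
import bisect

def find_entry_index(key, block):
    """Locate key in a block sorted by entry key.

    block is a sorted list of (key, tree_details) entries, where key is
    the (dirname, basename, file_id) tuple.
    """
    entry_index = bisect.bisect_left(block, (key, []))
    present = (entry_index < len(block) and block[entry_index][0] == key)
    return entry_index, present
```

When the key is absent, the returned index is where the entry would be inserted, which is exactly what callers like _update_basis_apply_adds rely on.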
    @staticmethod
    def from_tree(tree, dir_state_filename, sha1_provider=None):
        """Create a dirstate from a bzr Tree.

        :param tree: The tree which should provide parent information and
            inventory ids.
        :param sha1_provider: an object meeting the SHA1Provider interface.
            If None, a DefaultSHA1Provider is used.
        :return: a DirState object which is currently locked for writing.
            (it was locked by DirState.initialize)
        """
        result = DirState.initialize(dir_state_filename,
            sha1_provider=sha1_provider)
        try:
            tree.lock_read()
            try:
                parent_ids = tree.get_parent_ids()
                num_parents = len(parent_ids)
                parent_trees = []
                for parent_id in parent_ids:
                    parent_tree = tree.branch.repository.revision_tree(parent_id)
                    parent_trees.append((parent_id, parent_tree))
                    parent_tree.lock_read()
                result.set_parent_trees(parent_trees, [])
                result.set_state_from_inventory(tree.inventory)
            finally:
                for revid, parent_tree in parent_trees:
                    parent_tree.unlock()
                tree.unlock()
        except:
            # The caller won't have a chance to unlock this, so make sure we
            # clean up ourselves.
            result.unlock()
            raise
        return result

    def _check_delta_is_valid(self, delta):
        return list(inventory._check_delta_unique_ids(
                    inventory._check_delta_unique_old_paths(
                    inventory._check_delta_unique_new_paths(
                    inventory._check_delta_ids_match_entry(
                    inventory._check_delta_ids_are_valid(
                    inventory._check_delta_new_path_entry_both_or_None(delta)))))))

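The nested calls above chain lazy validators over the same delta iterable; materializing the result with list() forces every check to run over each item exactly once. A standalone sketch of that composition pattern, with hypothetical check functions standing in for the inventory module's:

```python
def check_unique_ids(delta):
    # Each validator is a generator that re-yields items, raising on a
    # violation; chaining them costs one pass over the delta in total.
    seen = set()
    for item in delta:
        file_id = item[2]
        if file_id in seen:
            raise ValueError("repeated file_id %r" % (file_id,))
        seen.add(file_id)
        yield item

def check_paths_present(delta):
    for item in delta:
        if item[0] is None and item[1] is None:
            raise ValueError("delta item with no paths")
        yield item

def check_delta(delta):
    # composition: each generator lazily wraps the previous one;
    # list() drives the whole pipeline.
    return list(check_paths_present(check_unique_ids(delta)))
```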
    def update_by_delta(self, delta):
        """Apply an inventory delta to the dirstate for tree 0

        This is the workhorse for apply_inventory_delta in dirstate based
        trees.

        :param delta: An inventory delta. See Inventory.apply_delta for
            details.
        """
        self._read_dirblocks_if_needed()
        encode = cache_utf8.encode
        insertions = {}
        removals = {}
        # Accumulate parent references (path_utf8, id), to check for parentless
        # items or items placed under files/links/tree-references. We get
        # references from every item in the delta that is not a deletion and
        # is not itself the root.
        parents = set()
        # Added ids must not be in the dirstate already. This set holds those
        # ids.
        new_ids = set()
        # This loop transforms the delta to single atomic operations that can
        # be executed and validated.
        delta = sorted(self._check_delta_is_valid(delta), reverse=True)
        for old_path, new_path, file_id, inv_entry in delta:
            if (file_id in insertions) or (file_id in removals):
                self._raise_invalid(old_path or new_path, file_id,
                    "repeated file_id")
            if old_path is not None:
                old_path = old_path.encode('utf-8')
                removals[file_id] = old_path
            else:
                new_ids.add(file_id)
            if new_path is not None:
                if inv_entry is None:
                    self._raise_invalid(new_path, file_id,
                        "new_path with no entry")
                new_path = new_path.encode('utf-8')
                dirname_utf8, basename = osutils.split(new_path)
                if basename:
                    parents.add((dirname_utf8, inv_entry.parent_id))
                key = (dirname_utf8, basename, file_id)
                minikind = DirState._kind_to_minikind[inv_entry.kind]
                if minikind == 't':
                    fingerprint = inv_entry.reference_revision or ''
                else:
                    fingerprint = ''
                insertions[file_id] = (key, minikind, inv_entry.executable,
                                       fingerprint, new_path)
            # Transform moves into delete+add pairs
            if None not in (old_path, new_path):
                for child in self._iter_child_entries(0, old_path):
                    if child[0][2] in insertions or child[0][2] in removals:
                        continue
                    child_dirname = child[0][0]
                    child_basename = child[0][1]
                    minikind = child[1][0][0]
                    fingerprint = child[1][0][4]
                    executable = child[1][0][3]
                    old_child_path = osutils.pathjoin(child_dirname,
                                                      child_basename)
                    removals[child[0][2]] = old_child_path
                    child_suffix = child_dirname[len(old_path):]
                    new_child_dirname = (new_path + child_suffix)
                    key = (new_child_dirname, child_basename, child[0][2])
                    new_child_path = osutils.pathjoin(new_child_dirname,
                                                      child_basename)
                    insertions[child[0][2]] = (key, minikind, executable,
                                               fingerprint, new_child_path)
        self._check_delta_ids_absent(new_ids, delta, 0)
        try:
            self._apply_removals(removals.iteritems())
            self._apply_insertions(insertions.values())
            # Validate parents
            self._after_delta_check_parents(parents, 0)
        except errors.BzrError, e:
            self._changes_aborted = True
            if 'integrity error' not in str(e):
                raise
            # _get_entry raises BzrError when a request is inconsistent; we
            # want such errors to be shown as InconsistentDelta - and that
            # fits the behaviour we trigger.
            raise errors.InconsistentDeltaDelta(delta,
                "error from _get_entry. %s" % (e,))

    def _apply_removals(self, removals):
        for file_id, path in sorted(removals, reverse=True,
            key=operator.itemgetter(1)):
            dirname, basename = osutils.split(path)
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(dirname, basename, 0)
            try:
                entry = self._dirblocks[block_i][1][entry_i]
            except IndexError:
                self._raise_invalid(path, file_id,
                    "Wrong path for old path.")
            if not f_present or entry[1][0][0] in 'ar':
                self._raise_invalid(path, file_id,
                    "Wrong path for old path.")
            if file_id != entry[0][2]:
                self._raise_invalid(path, file_id,
                    "Attempt to remove path has wrong id - found %r."
                    % entry[0][2])
            self._make_absent(entry)
            # See if we have a malformed delta: deleting a directory must not
            # leave crud behind. This increases the number of bisects needed
            # substantially, but deletion or renames of large numbers of paths
            # is rare enough it shouldn't be an issue (famous last words?) RBC
            # 20080730.
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(path, '', 0)
            if d_present:
                # The dir block is still present in the dirstate; this could
                # be due to it being in a parent tree, or a corrupt delta.
                for child_entry in self._dirblocks[block_i][1]:
                    if child_entry[1][0][0] not in ('r', 'a'):
                        self._raise_invalid(path, entry[0][2],
                            "The file id was deleted but its children were "
                            "not deleted.")

    def _apply_insertions(self, adds):
        try:
            for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
                self.update_minimal(key, minikind, executable, fingerprint,
                                    path_utf8=path_utf8)
        except errors.NotVersionedError:
            self._raise_invalid(path_utf8.decode('utf8'), key[2],
                "Missing parent")

    def update_basis_by_delta(self, delta, new_revid):
        """Update the parents of this tree after a commit.

        This gives the tree one parent, with revision id new_revid. The
        inventory delta is applied to the current basis tree to generate the
        inventory for the parent new_revid, and all other parent trees are
        discarded.

        Note that an exception during the operation of this method will leave
        the dirstate in a corrupt state where it should not be saved.

        :param new_revid: The new revision id for the trees parent.
        :param delta: An inventory delta (see apply_inventory_delta) describing
            the changes from the current left most parent revision to new_revid.
        """
        self._read_dirblocks_if_needed()
        self._discard_merge_parents()
        if self._ghosts != []:
            raise NotImplementedError(self.update_basis_by_delta)
        if len(self._parents) == 0:
            # setup a blank tree, the most simple way.
            empty_parent = DirState.NULL_PARENT_DETAILS
            for entry in self._iter_entries():
                entry[1].append(empty_parent)
            self._parents.append(new_revid)
        else:
            self._parents[0] = new_revid

        delta = sorted(self._check_delta_is_valid(delta), reverse=True)
        adds = []
        changes = []
        deletes = []
        # The paths this function accepts are unicode and must be encoded as we
        # go.
        encode = cache_utf8.encode
        inv_to_entry = self._inv_entry_to_details
        # delta is now (deletes, changes), (adds) in reverse lexographical
        # order.
        # deletes in reverse lexographic order are safe to process in situ.
        # renames are not, as a rename from any path could go to a path
        # lexographically lower, so we transform renames into delete, add pairs,
        # expanding them recursively as needed.
        # At the same time, to reduce interface friction we convert the input
        # inventory entries to dirstate.
        root_only = ('', '')
        # Accumulate parent references (path_utf8, id), to check for parentless
        # items or items placed under files/links/tree-references. We get
        # references from every item in the delta that is not a deletion and
        # is not itself the root.
        parents = set()
        # Added ids must not be in the dirstate already. This set holds those
        # ids.
        new_ids = set()
        for old_path, new_path, file_id, inv_entry in delta:
            if inv_entry is not None and file_id != inv_entry.file_id:
                self._raise_invalid(new_path, file_id,
                    "mismatched entry file_id %r" % inv_entry)
            if new_path is None:
                new_path_utf8 = None
            else:
                if inv_entry is None:
                    self._raise_invalid(new_path, file_id,
                        "new_path with no entry")
                new_path_utf8 = encode(new_path)
                # note the parent for validation
                dirname_utf8, basename_utf8 = osutils.split(new_path_utf8)
                if basename_utf8:
                    parents.add((dirname_utf8, inv_entry.parent_id))
            if old_path is None:
                old_path_utf8 = None
            else:
                old_path_utf8 = encode(old_path)
            if old_path is None:
                adds.append((None, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), True))
                new_ids.add(file_id)
            elif new_path is None:
                deletes.append((old_path_utf8, None, file_id, None, True))
            elif (old_path, new_path) == root_only:
                # change things in-place
                # Note: the case of a parent directory changing its file_id
                #       tends to break optimizations here, because officially
                #       the file has actually been moved, it just happens to
                #       end up at the same path. If we can figure out how to
                #       handle that case, we can avoid a lot of add+delete
                #       pairs for objects that stay put.
                # elif old_path == new_path:
                changes.append((old_path_utf8, new_path_utf8, file_id,
                    inv_to_entry(inv_entry)))
            else:
                # Renames:
                # Because renames must preserve their children we must have
                # processed all relocations and removes before hand. The sort
                # order ensures we've examined the child paths, but we also
                # have to execute the removals, or the split to an add/delete
                # pair will result in the deleted item being reinserted, or
                # renamed items being reinserted twice - and possibly at the
                # wrong place. Splitting into a delete/add pair also simplifies
                # the handling of entries with ('f', ...), ('r' ...) because
                # the target of the 'r' is old_path here, and we add that to
                # deletes, meaning that the add handler does not need to check
                # for 'r' items on every pass.
                self._update_basis_apply_deletes(deletes)
                deletes = []
                # Split into an add/delete pair recursively.
                adds.append((old_path_utf8, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), False))
                # Expunge deletes that we've seen so that deleted/renamed
                # children of a rename directory are handled correctly.
                new_deletes = reversed(list(
                    self._iter_child_entries(1, old_path_utf8)))
                # Remove the current contents of the tree at orig_path, and
                # reinsert at the correct new path.
                for entry in new_deletes:
                    child_dirname, child_basename, child_file_id = entry[0]
                    if child_dirname:
                        source_path = child_dirname + '/' + child_basename
                    else:
                        source_path = child_basename
                    if new_path_utf8:
                        target_path = new_path_utf8 + source_path[len(old_path_utf8):]
                    else:
                        if old_path_utf8 == '':
                            raise AssertionError("cannot rename directory to"
                                                 " itself")
                        target_path = source_path[len(old_path_utf8) + 1:]
                    adds.append((None, target_path, entry[0][2], entry[1][1], False))
                    deletes.append(
                        (source_path, target_path, entry[0][2], None, False))
                deletes.append((old_path_utf8, new_path_utf8, file_id, None, False))
        self._check_delta_ids_absent(new_ids, delta, 1)
        try:
            # Finish expunging deletes/first half of renames.
            self._update_basis_apply_deletes(deletes)
            # Reinstate second half of renames and new paths.
            self._update_basis_apply_adds(adds)
            # Apply in-situ changes.
            self._update_basis_apply_changes(changes)
            # Validate parents
            self._after_delta_check_parents(parents, 1)
        except errors.BzrError, e:
            self._changes_aborted = True
            if 'integrity error' not in str(e):
                raise
            # _get_entry raises BzrError when a request is inconsistent; we
            # want such errors to be shown as InconsistentDelta - and that
            # fits the behaviour we trigger.
            raise errors.InconsistentDeltaDelta(delta,
                "error from _get_entry. %s" % (e,))

        self._mark_modified(header_modified=True)
        self._id_index = None
        return

    def _check_delta_ids_absent(self, new_ids, delta, tree_index):
        """Check that none of the file_ids in new_ids are present in a tree."""
        if not new_ids:
            return
        id_index = self._get_id_index()
        for file_id in new_ids:
            for key in id_index.get(file_id, ()):
                block_i, entry_i, d_present, f_present = \
                    self._get_block_entry_index(key[0], key[1], tree_index)
                if not f_present:
                    # In a different tree
                    continue
                entry = self._dirblocks[block_i][1][entry_i]
                if entry[0][2] != file_id:
                    # Different file_id, so not what we want.
                    continue
                self._raise_invalid(("%s/%s" % key[0:2]).decode('utf8'), file_id,
                    "This file_id is new in the delta but already present in "
                    "the target")

    def _raise_invalid(self, path, file_id, reason):
        self._changes_aborted = True
        raise errors.InconsistentDelta(path, file_id, reason)

    def _update_basis_apply_adds(self, adds):
        """Apply a sequence of adds to tree 1 during update_basis_by_delta.

        They may be adds, or renames that have been split into add/delete
        pairs.

        :param adds: A sequence of adds. Each add is a tuple:
            (None, new_path_utf8, file_id, (entry_details), real_add). real_add
            is False when the add is the second half of a remove-and-reinsert
            pair created to handle renames and deletes.
        """
        # Adds are accumulated partly from renames, so can be in any input
        # order - sort it.
        # TODO: we may want to sort in dirblocks order. That way each entry
        #       will end up in the same directory, allowing the _get_entry
        #       fast-path for looking up 2 items in the same dir work.
        adds.sort(key=lambda x: x[1])
        # adds is now in lexographic order, which places all parents before
        # their children, so we can process it linearly.
        st = static_tuple.StaticTuple
        for old_path, new_path, file_id, new_details, real_add in adds:
            dirname, basename = osutils.split(new_path)
            entry_key = st(dirname, basename, file_id)
            block_index, present = self._find_block_index_from_key(entry_key)
            if not present:
                self._raise_invalid(new_path, file_id,
                    "Unable to find block for this record."
                    " Was the parent added?")
            block = self._dirblocks[block_index][1]
            entry_index, present = self._find_entry_index(entry_key, block)
            if real_add:
                if old_path is not None:
                    self._raise_invalid(new_path, file_id,
                        'considered a real add but still had old_path at %s'
                        % (old_path,))
            if present:
                entry = block[entry_index]
                basis_kind = entry[1][1][0]
                if basis_kind == 'a':
                    entry[1][1] = new_details
                elif basis_kind == 'r':
                    raise NotImplementedError()
                else:
                    self._raise_invalid(new_path, file_id,
                        "An entry was marked as a new add"
                        " but the basis target already existed")
            else:
                # The exact key was not found in the block. However, we need to
                # check if there is a key next to us that would have matched.
                # We only need to check 2 locations, because there are only 2
                # places that an entry could be shifted to.
                for maybe_index in range(entry_index-1, entry_index+1):
                    if maybe_index < 0 or maybe_index >= len(block):
                        continue
                    maybe_entry = block[maybe_index]
                    if maybe_entry[0][:2] != (dirname, basename):
                        # Just a random neighbor
                        continue
                    if maybe_entry[0][2] == file_id:
                        raise AssertionError(
                            '_find_entry_index didnt find a key match'
                            ' but walking the data did, for %s'
                            % (entry_key,))
                    basis_kind = maybe_entry[1][1][0]
                    if basis_kind not in 'ar':
                        self._raise_invalid(new_path, file_id,
                            "we have an add record for path, but the path"
                            " is already present with another file_id %s"
                            % (maybe_entry[0][2],))

                entry = (entry_key, [DirState.NULL_PARENT_DETAILS,
                                     new_details])
                block.insert(entry_index, entry)

            active_kind = entry[1][0][0]
            if active_kind == 'a':
                # The active record shows up as absent, this could be genuine,
                # or it could be present at some other location. We need to
                # verify.
                id_index = self._get_id_index()
                # The id_index may not be perfectly accurate for tree1, because
                # we haven't been keeping it updated. However, it should be
                # fine for tree0, and that gives us enough info for what we
                # need.
                keys = id_index.get(file_id, ())
                for key in keys:
                    block_i, entry_i, d_present, f_present = \
                        self._get_block_entry_index(key[0], key[1], 0)
                    if not f_present:
                        continue
                    active_entry = self._dirblocks[block_i][1][entry_i]
                    if (active_entry[0][2] != file_id):
                        # Some other file is at this path, we don't need to
                        # link it.
                        continue
                    real_active_kind = active_entry[1][0][0]
                    if real_active_kind in 'ar':
                        # We found a record, which was not *this* record,
                        # which matches the file_id, but is not actually
                        # present. Something seems *really* wrong.
                        self._raise_invalid(new_path, file_id,
                            "We found a tree0 entry that doesnt make sense")
                    # Now, we've found a tree0 entry which matches the file_id
                    # but is at a different location. So update them to be
                    # rename records.
                    active_dir, active_name = active_entry[0][:2]
                    if active_dir:
                        active_path = active_dir + '/' + active_name
                    else:
                        active_path = active_name
                    active_entry[1][1] = st('r', new_path, 0, False, '')
                    entry[1][0] = st('r', active_path, 0, False, '')
            elif active_kind == 'r':
                raise NotImplementedError()

            new_kind = new_details[0]
            if new_kind == 'd':
                self._ensure_block(block_index, entry_index, new_path)

    def _update_basis_apply_changes(self, changes):
        """Apply a sequence of changes to tree 1 during update_basis_by_delta.

        :param changes: A sequence of changes. Each change is a tuple:
            (path_utf8, path_utf8, file_id, (entry_details))
        """
        for old_path, new_path, file_id, new_details in changes:
            # the entry for this file_id must be in tree 0.
            entry = self._get_entry(1, file_id, new_path)
            if entry[0] is None or entry[1][1][0] in 'ar':
                self._raise_invalid(new_path, file_id,
                    'changed entry considered not present')
            entry[1][1] = new_details

    def _update_basis_apply_deletes(self, deletes):
        """Apply a sequence of deletes to tree 1 during update_basis_by_delta.

        They may be deletes, or renames that have been split into add/delete
        pairs.

        :param deletes: A sequence of deletes. Each delete is a tuple:
            (old_path_utf8, new_path_utf8, file_id, None, real_delete).
            real_delete is True when the desired outcome is an actual deletion
            rather than the rename handling logic temporarily deleting a path
            during the replacement of a parent.
        """
        null = DirState.NULL_PARENT_DETAILS
        for old_path, new_path, file_id, _, real_delete in deletes:
            if real_delete != (new_path is None):
                self._raise_invalid(old_path, file_id, "bad delete delta")
            # the entry for this file_id must be in tree 1.
            dirname, basename = osutils.split(old_path)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, 1)
            if not file_present:
                self._raise_invalid(old_path, file_id,
                    'basis tree does not contain removed entry')
            entry = self._dirblocks[block_index][1][entry_index]
            # The state of the entry in the 'active' WT
            active_kind = entry[1][0][0]
            if entry[0][2] != file_id:
                self._raise_invalid(old_path, file_id,
                    'mismatched file_id in tree 1')
            dir_block = ()
            old_kind = entry[1][1][0]
            if active_kind in 'ar':
                # The active tree doesn't have this file_id.
                # The basis tree is changing this record. If this is a
                # rename, then we don't want the record here at all
                # anymore. If it is just an in-place change, we want the
                # record here, but we'll add it if we need to. So we just
                # delete it.
                if active_kind == 'r':
                    active_path = entry[1][0][1]
                    active_entry = self._get_entry(0, file_id, active_path)
                    if active_entry[1][1][0] != 'r':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate did not have matching rename entries")
                    elif active_entry[1][0][0] in 'ar':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate had a rename pointing at an inactive"
                            " tree0")
                    active_entry[1][1] = null
                del self._dirblocks[block_index][1][entry_index]
                if old_kind == 'd':
                    # This was a directory, and the active tree says it
                    # doesn't exist, and now the basis tree says it doesn't
                    # exist. Remove its dirblock if present
                    (dir_block_index,
                     present) = self._find_block_index_from_key(
                         (old_path, '', ''))
                    if present:
                        dir_block = self._dirblocks[dir_block_index][1]
                        if not dir_block:
                            # This entry is empty, go ahead and just remove it
                            del self._dirblocks[dir_block_index]
            else:
                # There is still an active record, so just mark this
                # removed.
                entry[1][1] = null
                block_i, entry_i, d_present, f_present = \
                    self._get_block_entry_index(old_path, '', 1)
                if d_present:
                    dir_block = self._dirblocks[block_i][1]
            for child_entry in dir_block:
                child_basis_kind = child_entry[1][1][0]
                if child_basis_kind not in 'ar':
                    self._raise_invalid(old_path, file_id,
                        "The file id was deleted but its children were "
                        "not deleted.")

def _after_delta_check_parents(self, parents, index):
1833
"""Check that parents required by the delta are all intact.
1835
:param parents: An iterable of (path_utf8, file_id) tuples which are
1836
required to be present in tree 'index' at path_utf8 with id file_id
1838
:param index: The column in the dirstate to check for parents in.
1840
for dirname_utf8, file_id in parents:
1841
# Get the entry - the ensures that file_id, dirname_utf8 exists and
1842
# has the right file id.
1843
entry = self._get_entry(index, file_id, dirname_utf8)
1844
if entry[1] is None:
1845
self._raise_invalid(dirname_utf8.decode('utf8'),
1846
file_id, "This parent is not present.")
1847
# Parents of things must be directories
1848
if entry[1][index][0] != 'd':
1849
self._raise_invalid(dirname_utf8.decode('utf8'),
1850
file_id, "This parent is not a directory.")
1852
def _observed_sha1(self, entry, sha1, stat_value,
1853
_stat_to_minikind=_stat_to_minikind):
1854
"""Note the sha1 of a file.
1856
:param entry: The entry the sha1 is for.
1857
:param sha1: The observed sha1.
1858
:param stat_value: The os.lstat for the file.
1861
minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
1866
if self._cutoff_time is None:
1867
self._sha_cutoff_time()
1868
if (stat_value.st_mtime < self._cutoff_time
1869
and stat_value.st_ctime < self._cutoff_time):
1870
entry[1][0] = ('f', sha1, stat_value.st_size, entry[1][0][3],
1871
pack_stat(stat_value))
1872
self._mark_modified([entry])
1874
def _sha_cutoff_time(self):
1875
"""Return cutoff time.
1877
Files modified more recently than this time are at risk of being
1878
undetectably modified and so can't be cached.
1880
# Cache the cutoff time as long as we hold a lock.
1881
# time.time() isn't super expensive (approx 3.38us), but
1882
# when you call it 50,000 times it adds up.
1883
# For comparison, os.lstat() costs 7.2us if it is hot.
1884
self._cutoff_time = int(time.time()) - 3
1885
return self._cutoff_time
1887
def _lstat(self, abspath, entry):
1888
"""Return the os.lstat value for this path."""
1889
return os.lstat(abspath)
1891
def _sha1_file_and_mutter(self, abspath):
1892
# when -Dhashcache is turned on, this is monkey-patched in to log
1894
trace.mutter("dirstate sha1 " + abspath)
1895
return self._sha1_provider.sha1(abspath)
1897
def _is_executable(self, mode, old_executable):
1898
"""Is this file executable?"""
1899
return bool(S_IEXEC & mode)
1901
def _is_executable_win32(self, mode, old_executable):
1902
"""On win32 the executable bit is stored in the dirstate."""
1903
return old_executable
1905
if sys.platform == 'win32':
1906
_is_executable = _is_executable_win32
1908
def _read_link(self, abspath, old_link):
1909
"""Read the target of a symlink"""
1910
# TODO: jam 200700301 On Win32, this could just return the value
1911
# already in memory. However, this really needs to be done at a
1912
# higher level, because there either won't be anything on disk,
1913
# or the thing on disk will be a file.
1914
fs_encoding = osutils._fs_enc
1915
if isinstance(abspath, unicode):
1916
# abspath is defined as the path to pass to lstat. readlink is
1917
# buggy in python < 2.6 (it doesn't encode unicode path into FS
1918
# encoding), so we need to encode ourselves knowing that unicode
1919
# paths are produced by UnicodeDirReader on purpose.
1920
abspath = abspath.encode(fs_encoding)
1921
target = os.readlink(abspath)
1922
if fs_encoding not in ('UTF-8', 'US-ASCII', 'ANSI_X3.4-1968'):
1923
# Change encoding if needed
1924
target = target.decode(fs_encoding).encode('UTF-8')
1927
def get_ghosts(self):
1928
"""Return a list of the parent tree revision ids that are ghosts."""
1929
self._read_header_if_needed()
1932
def get_lines(self):
1933
"""Serialise the entire dirstate to a sequence of lines."""
1934
if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
1935
self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
1936
# read what's on disk.
1937
self._state_file.seek(0)
1938
return self._state_file.readlines()
1940
lines.append(self._get_parents_line(self.get_parent_ids()))
1941
lines.append(self._get_ghosts_line(self._ghosts))
1942
lines.extend(self._get_entry_lines())
1943
return self._get_output_lines(lines)
1945
def _get_ghosts_line(self, ghost_ids):
1946
"""Create a line for the state file for ghost information."""
1947
return '\0'.join([str(len(ghost_ids))] + ghost_ids)
1949
def _get_parents_line(self, parent_ids):
1950
"""Create a line for the state file for parents information."""
1951
return '\0'.join([str(len(parent_ids))] + parent_ids)
1953
def _get_entry_lines(self):
1954
"""Create lines for entries."""
1955
return map(self._entry_to_line, self._iter_entries())
1957
def _get_fields_to_entry(self):
1958
"""Get a function which converts entry fields into a entry record.
1960
This handles size and executable, as well as parent records.
1962
:return: A function which takes a list of fields, and returns an
1963
appropriate record for storing in memory.
1965
# This is intentionally unrolled for performance
1966
num_present_parents = self._num_present_parents()
1967
if num_present_parents == 0:
1968
def fields_to_entry_0_parents(fields, _int=int):
1969
path_name_file_id_key = (fields[0], fields[1], fields[2])
1970
return (path_name_file_id_key, [
1972
fields[3], # minikind
1973
fields[4], # fingerprint
1974
_int(fields[5]), # size
1975
fields[6] == 'y', # executable
1976
fields[7], # packed_stat or revision_id
1978
return fields_to_entry_0_parents
1979
elif num_present_parents == 1:
1980
def fields_to_entry_1_parent(fields, _int=int):
1981
path_name_file_id_key = (fields[0], fields[1], fields[2])
1982
return (path_name_file_id_key, [
1984
fields[3], # minikind
1985
fields[4], # fingerprint
1986
_int(fields[5]), # size
1987
fields[6] == 'y', # executable
1988
fields[7], # packed_stat or revision_id
1991
fields[8], # minikind
1992
fields[9], # fingerprint
1993
_int(fields[10]), # size
1994
fields[11] == 'y', # executable
1995
fields[12], # packed_stat or revision_id
1998
return fields_to_entry_1_parent
1999
elif num_present_parents == 2:
2000
def fields_to_entry_2_parents(fields, _int=int):
2001
path_name_file_id_key = (fields[0], fields[1], fields[2])
2002
return (path_name_file_id_key, [
2004
fields[3], # minikind
2005
fields[4], # fingerprint
2006
_int(fields[5]), # size
2007
fields[6] == 'y', # executable
2008
fields[7], # packed_stat or revision_id
2011
fields[8], # minikind
2012
fields[9], # fingerprint
2013
_int(fields[10]), # size
2014
fields[11] == 'y', # executable
2015
fields[12], # packed_stat or revision_id
2018
fields[13], # minikind
2019
fields[14], # fingerprint
2020
_int(fields[15]), # size
2021
fields[16] == 'y', # executable
2022
fields[17], # packed_stat or revision_id
2025
return fields_to_entry_2_parents
2027
def fields_to_entry_n_parents(fields, _int=int):
2028
path_name_file_id_key = (fields[0], fields[1], fields[2])
2029
trees = [(fields[cur], # minikind
2030
fields[cur+1], # fingerprint
2031
_int(fields[cur+2]), # size
2032
fields[cur+3] == 'y', # executable
2033
fields[cur+4], # stat or revision_id
2034
) for cur in xrange(3, len(fields)-1, 5)]
2035
return path_name_file_id_key, trees
2036
return fields_to_entry_n_parents
2038
def get_parent_ids(self):
2039
"""Return a list of the parent tree ids for the directory state."""
2040
self._read_header_if_needed()
2041
return list(self._parents)
2043
def _get_block_entry_index(self, dirname, basename, tree_index):
2044
"""Get the coordinates for a path in the state structure.
2046
:param dirname: The utf8 dirname to lookup.
2047
:param basename: The utf8 basename to lookup.
2048
:param tree_index: The index of the tree for which this lookup should
2050
:return: A tuple describing where the path is located, or should be
2051
inserted. The tuple contains four fields: the block index, the row
2052
index, the directory is present (boolean), the entire path is
2053
present (boolean). There is no guarantee that either
2054
coordinate is currently reachable unless the found field for it is
2055
True. For instance, a directory not present in the searched tree
2056
may be returned with a value one greater than the current highest
2057
block offset. The directory present field will always be True when
2058
the path present field is True. The directory present field does
2059
NOT indicate that the directory is present in the searched tree,
2060
rather it indicates that there are at least some files in some
2063
self._read_dirblocks_if_needed()
2064
key = dirname, basename, ''
2065
block_index, present = self._find_block_index_from_key(key)
2067
# no such directory - return the dir index and 0 for the row.
2068
return block_index, 0, False, False
2069
block = self._dirblocks[block_index][1] # access the entries only
2070
entry_index, present = self._find_entry_index(key, block)
2071
# linear search through entries at this path to find the one
2073
while entry_index < len(block) and block[entry_index][0][1] == basename:
2074
if block[entry_index][1][tree_index][0] not in 'ar':
2075
# neither absent or relocated
2076
return block_index, entry_index, True, True
2078
return block_index, entry_index, True, False
2080
def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None,
2081
include_deleted=False):
2082
"""Get the dirstate entry for path in tree tree_index.
2084
If either file_id or path is supplied, it is used as the key to lookup.
2085
If both are supplied, the fastest lookup is used, and an error is
2086
raised if they do not both point at the same row.
2088
:param tree_index: The index of the tree we wish to locate this path
2089
in. If the path is present in that tree, the entry containing its
2090
details is returned, otherwise (None, None) is returned
2091
0 is the working tree, higher indexes are successive parent
2093
:param fileid_utf8: A utf8 file_id to look up.
2094
:param path_utf8: An utf8 path to be looked up.
2095
:param include_deleted: If True, and performing a lookup via
2096
fileid_utf8 rather than path_utf8, return an entry for deleted
2098
:return: The dirstate entry tuple for path, or (None, None)
2100
self._read_dirblocks_if_needed()
2101
if path_utf8 is not None:
2102
if type(path_utf8) is not str:
2103
raise errors.BzrError('path_utf8 is not a str: %s %r'
2104
% (type(path_utf8), path_utf8))
2105
# path lookups are faster
2106
dirname, basename = osutils.split(path_utf8)
2107
block_index, entry_index, dir_present, file_present = \
2108
self._get_block_entry_index(dirname, basename, tree_index)
2109
if not file_present:
2111
entry = self._dirblocks[block_index][1][entry_index]
2112
if not (entry[0][2] and entry[1][tree_index][0] not in ('a', 'r')):
2113
raise AssertionError('unversioned entry?')
2115
if entry[0][2] != fileid_utf8:
2116
self._changes_aborted = True
2117
raise errors.BzrError('integrity error ? : mismatching'
2118
' tree_index, file_id and path')
2121
possible_keys = self._get_id_index().get(fileid_utf8, ())
2122
if not possible_keys:
2124
for key in possible_keys:
2125
block_index, present = \
2126
self._find_block_index_from_key(key)
2127
# strange, probably indicates an out of date
2128
# id index - for now, allow this.
2131
# WARNING: DO not change this code to use _get_block_entry_index
2132
# as that function is not suitable: it does not use the key
2133
# to lookup, and thus the wrong coordinates are returned.
2134
block = self._dirblocks[block_index][1]
2135
entry_index, present = self._find_entry_index(key, block)
2137
entry = self._dirblocks[block_index][1][entry_index]
2138
# TODO: We might want to assert that entry[0][2] ==
2140
if entry[1][tree_index][0] in 'fdlt':
2141
# this is the result we are looking for: the
2142
# real home of this file_id in this tree.
2144
if entry[1][tree_index][0] == 'a':
2145
# there is no home for this entry in this tree
2149
if entry[1][tree_index][0] != 'r':
2150
raise AssertionError(
2151
"entry %r has invalid minikind %r for tree %r" \
2153
entry[1][tree_index][0],
2155
real_path = entry[1][tree_index][1]
2156
return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
2157
path_utf8=real_path)
2161
def initialize(cls, path, sha1_provider=None):
2162
"""Create a new dirstate on path.
2164
The new dirstate will be an empty tree - that is it has no parents,
2165
and only a root node - which has id ROOT_ID.
2167
:param path: The name of the file for the dirstate.
2168
:param sha1_provider: an object meeting the SHA1Provider interface.
2169
If None, a DefaultSHA1Provider is used.
2170
:return: A write-locked DirState object.
2172
# This constructs a new DirState object on a path, sets the _state_file
2173
# to a new empty file for that path. It then calls _set_data() with our
2174
# stock empty dirstate information - a root with ROOT_ID, no children,
2175
# and no parents. Finally it calls save() to ensure that this data will
2177
if sha1_provider is None:
2178
sha1_provider = DefaultSHA1Provider()
2179
result = cls(path, sha1_provider)
2180
# root dir and root dir contents with no children.
2181
empty_tree_dirblocks = [('', []), ('', [])]
2182
# a new root directory, with a NULLSTAT.
2183
empty_tree_dirblocks[0][1].append(
2184
(('', '', inventory.ROOT_ID), [
2185
('d', '', 0, False, DirState.NULLSTAT),
2189
result._set_data([], empty_tree_dirblocks)
2197
def _inv_entry_to_details(inv_entry):
2198
"""Convert an inventory entry (from a revision tree) to state details.
2200
:param inv_entry: An inventory entry whose sha1 and link targets can be
2201
relied upon, and which has a revision set.
2202
:return: A details tuple - the details for a single tree at a path +
2205
kind = inv_entry.kind
2206
minikind = DirState._kind_to_minikind[kind]
2207
tree_data = inv_entry.revision
2208
if kind == 'directory':
2212
elif kind == 'symlink':
2213
if inv_entry.symlink_target is None:
2216
fingerprint = inv_entry.symlink_target.encode('utf8')
2219
elif kind == 'file':
2220
fingerprint = inv_entry.text_sha1 or ''
2221
size = inv_entry.text_size or 0
2222
executable = inv_entry.executable
2223
elif kind == 'tree-reference':
2224
fingerprint = inv_entry.reference_revision or ''
2228
raise Exception("can't pack %s" % inv_entry)
2229
return static_tuple.StaticTuple(minikind, fingerprint, size,
2230
executable, tree_data)
2232
def _iter_child_entries(self, tree_index, path_utf8):
2233
"""Iterate over all the entries that are children of path_utf.
2235
This only returns entries that are present (not in 'a', 'r') in
2236
tree_index. tree_index data is not refreshed, so if tree 0 is used,
2237
results may differ from that obtained if paths were statted to
2238
determine what ones were directories.
2240
Asking for the children of a non-directory will return an empty
2244
next_pending_dirs = [path_utf8]
2246
while next_pending_dirs:
2247
pending_dirs = next_pending_dirs
2248
next_pending_dirs = []
2249
for path in pending_dirs:
2250
block_index, present = self._find_block_index_from_key(
2252
if block_index == 0:
2254
if len(self._dirblocks) == 1:
2255
# asked for the children of the root with no other
2259
# children of a non-directory asked for.
2261
block = self._dirblocks[block_index]
2262
for entry in block[1]:
2263
kind = entry[1][tree_index][0]
2264
if kind not in absent:
2268
path = entry[0][0] + '/' + entry[0][1]
2271
next_pending_dirs.append(path)
2273
def _iter_entries(self):
2274
"""Iterate over all the entries in the dirstate.
2276
Each yelt item is an entry in the standard format described in the
2277
docstring of bzrlib.dirstate.
2279
self._read_dirblocks_if_needed()
2280
for directory in self._dirblocks:
2281
for entry in directory[1]:
2284
def _get_id_index(self):
2285
"""Get an id index of self._dirblocks.
2287
This maps from file_id => [(directory, name, file_id)] entries where
2288
that file_id appears in one of the trees.
2290
if self._id_index is None:
2292
for key, tree_details in self._iter_entries():
2293
self._add_to_id_index(id_index, key)
2294
self._id_index = id_index
2295
return self._id_index
2297
def _add_to_id_index(self, id_index, entry_key):
2298
"""Add this entry to the _id_index mapping."""
2299
# This code used to use a set for every entry in the id_index. However,
2300
# it is *rare* to have more than one entry. So a set is a large
2301
# overkill. And even when we do, we won't ever have more than the
2302
# number of parent trees. Which is still a small number (rarely >2). As
2303
# such, we use a simple tuple, and do our own uniqueness checks. While
2304
# the 'in' check is O(N) since N is nicely bounded it shouldn't ever
2305
# cause quadratic failure.
2306
file_id = entry_key[2]
2307
entry_key = static_tuple.StaticTuple.from_sequence(entry_key)
2308
if file_id not in id_index:
2309
id_index[file_id] = static_tuple.StaticTuple(entry_key,)
2311
entry_keys = id_index[file_id]
2312
if entry_key not in entry_keys:
2313
id_index[file_id] = entry_keys + (entry_key,)
2315
def _remove_from_id_index(self, id_index, entry_key):
2316
"""Remove this entry from the _id_index mapping.
2318
It is an programming error to call this when the entry_key is not
2321
file_id = entry_key[2]
2322
entry_keys = list(id_index[file_id])
2323
entry_keys.remove(entry_key)
2324
id_index[file_id] = static_tuple.StaticTuple.from_sequence(entry_keys)
2326
def _get_output_lines(self, lines):
2327
"""Format lines for final output.
2329
:param lines: A sequence of lines containing the parents list and the
2332
output_lines = [DirState.HEADER_FORMAT_3]
2333
lines.append('') # a final newline
2334
inventory_text = '\0\n\0'.join(lines)
2335
output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
2336
# -3, 1 for num parents, 1 for ghosts, 1 for final newline
2337
num_entries = len(lines)-3
2338
output_lines.append('num_entries: %s\n' % (num_entries,))
2339
output_lines.append(inventory_text)
2342
def _make_deleted_row(self, fileid_utf8, parents):
2343
"""Return a deleted row for fileid_utf8."""
2344
return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
2347
def _num_present_parents(self):
2348
"""The number of parent entries in each record row."""
2349
return len(self._parents) - len(self._ghosts)
2352
def on_file(cls, path, sha1_provider=None, worth_saving_limit=0):
2353
"""Construct a DirState on the file at path "path".
2355
:param path: The path at which the dirstate file on disk should live.
2356
:param sha1_provider: an object meeting the SHA1Provider interface.
2357
If None, a DefaultSHA1Provider is used.
2358
:param worth_saving_limit: when the exact number of hash changed
2359
entries is known, only bother saving the dirstate if more than
2360
this count of entries have changed. -1 means never save.
2361
:return: An unlocked DirState object, associated with the given path.
2363
if sha1_provider is None:
2364
sha1_provider = DefaultSHA1Provider()
2365
result = cls(path, sha1_provider,
2366
worth_saving_limit=worth_saving_limit)
2369
def _read_dirblocks_if_needed(self):
2370
"""Read in all the dirblocks from the file if they are not in memory.
2372
This populates self._dirblocks, and sets self._dirblock_state to
2373
IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
2376
self._read_header_if_needed()
2377
if self._dirblock_state == DirState.NOT_IN_MEMORY:
2378
_read_dirblocks(self)
2380
def _read_header(self):
2381
"""This reads in the metadata header, and the parent ids.
2383
After reading in, the file should be positioned at the null
2384
just before the start of the first record in the file.
2386
:return: (expected crc checksum, number of entries, parent list)
2388
self._read_prelude()
2389
parent_line = self._state_file.readline()
2390
info = parent_line.split('\0')
2391
num_parents = int(info[0])
2392
self._parents = info[1:-1]
2393
ghost_line = self._state_file.readline()
2394
info = ghost_line.split('\0')
2395
num_ghosts = int(info[1])
2396
self._ghosts = info[2:-1]
2397
self._header_state = DirState.IN_MEMORY_UNMODIFIED
2398
self._end_of_header = self._state_file.tell()
2400
def _read_header_if_needed(self):
2401
"""Read the header of the dirstate file if needed."""
2402
# inline this as it will be called a lot
2403
if not self._lock_token:
2404
raise errors.ObjectNotLocked(self)
2405
if self._header_state == DirState.NOT_IN_MEMORY:
2408
def _read_prelude(self):
2409
"""Read in the prelude header of the dirstate file.
2411
This only reads in the stuff that is not connected to the crc
2412
checksum. The position will be correct to read in the rest of
2413
the file and check the checksum after this point.
2414
The next entry in the file should be the number of parents,
2415
and their ids. Followed by a newline.
2417
header = self._state_file.readline()
2418
if header != DirState.HEADER_FORMAT_3:
2419
raise errors.BzrError(
2420
'invalid header line: %r' % (header,))
2421
crc_line = self._state_file.readline()
2422
if not crc_line.startswith('crc32: '):
2423
raise errors.BzrError('missing crc32 checksum: %r' % crc_line)
2424
self.crc_expected = int(crc_line[len('crc32: '):-1])
2425
num_entries_line = self._state_file.readline()
2426
if not num_entries_line.startswith('num_entries: '):
2427
raise errors.BzrError('missing num_entries line')
2428
self._num_entries = int(num_entries_line[len('num_entries: '):-1])
2430
def sha1_from_stat(self, path, stat_result):
2431
"""Find a sha1 given a stat lookup."""
2432
return self._get_packed_stat_index().get(pack_stat(stat_result), None)
2434
def _get_packed_stat_index(self):
2435
"""Get a packed_stat index of self._dirblocks."""
2436
if self._packed_stat_index is None:
2438
for key, tree_details in self._iter_entries():
2439
if tree_details[0][0] == 'f':
2440
index[tree_details[0][4]] = tree_details[0][1]
2441
self._packed_stat_index = index
2442
return self._packed_stat_index
2445
"""Save any pending changes created during this session.
2447
We reuse the existing file, because that prevents race conditions with
2448
file creation, and use oslocks on it to prevent concurrent modification
2449
and reads - because dirstate's incremental data aggregation is not
2450
compatible with reading a modified file, and replacing a file in use by
2451
another process is impossible on Windows.
2453
A dirstate in read only mode should be smart enough though to validate
2454
that the file has not changed, and otherwise discard its cache and
2455
start over, to allow for fine grained read lock duration, so 'status'
2456
wont block 'commit' - for example.
2458
if self._changes_aborted:
2459
# Should this be a warning? For now, I'm expecting that places that
2460
# mark it inconsistent will warn, making a warning here redundant.
2461
trace.mutter('Not saving DirState because '
2462
'_changes_aborted is set.')
2464
# TODO: Since we now distinguish IN_MEMORY_MODIFIED from
2465
# IN_MEMORY_HASH_MODIFIED, we should only fail quietly if we fail
2466
# to save an IN_MEMORY_HASH_MODIFIED, and fail *noisily* if we
2467
# fail to save IN_MEMORY_MODIFIED
2468
if not self._worth_saving():
2471
grabbed_write_lock = False
2472
if self._lock_state != 'w':
2473
grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
2474
# Switch over to the new lock, as the old one may be closed.
2475
# TODO: jam 20070315 We should validate the disk file has
2476
# not changed contents, since temporary_write_lock may
2477
# not be an atomic operation.
2478
self._lock_token = new_lock
2479
self._state_file = new_lock.f
2480
if not grabbed_write_lock:
2481
# We couldn't grab a write lock, so we switch back to a read one
2484
lines = self.get_lines()
2485
self._state_file.seek(0)
2486
self._state_file.writelines(lines)
2487
self._state_file.truncate()
2488
self._state_file.flush()
2489
self._maybe_fdatasync()
2490
self._mark_unmodified()
2492
if grabbed_write_lock:
2493
self._lock_token = self._lock_token.restore_read_lock()
2494
self._state_file = self._lock_token.f
2495
# TODO: jam 20070315 We should validate the disk file has
2496
# not changed contents. Since restore_read_lock may
2497
# not be an atomic operation.
2499
def _maybe_fdatasync(self):
2500
"""Flush to disk if possible and if not configured off."""
2501
if self._config_stack.get('dirstate.fdatasync'):
2502
osutils.fdatasync(self._state_file.fileno())
2504
def _worth_saving(self):
2505
"""Is it worth saving the dirstate or not?"""
2506
if (self._header_state == DirState.IN_MEMORY_MODIFIED
2507
or self._dirblock_state == DirState.IN_MEMORY_MODIFIED):
2509
if self._dirblock_state == DirState.IN_MEMORY_HASH_MODIFIED:
2510
if self._worth_saving_limit == -1:
2511
# We never save hash changes when the limit is -1
2513
# If we're using smart saving and only a small number of
2514
# entries have changed their hash, don't bother saving. John has
2515
# suggested using a heuristic here based on the size of the
2516
# changed files and/or tree. For now, we go with a configurable
2517
# number of changes, keeping the calculation time
2518
# as low overhead as possible. (This also keeps all existing
2519
# tests passing as the default is 0, i.e. always save.)
2520
if len(self._known_hash_changes) >= self._worth_saving_limit:
2524
def _set_data(self, parent_ids, dirblocks):
2525
"""Set the full dirstate data in memory.
2527
This is an internal function used to completely replace the objects
2528
in memory state. It puts the dirstate into state 'full-dirty'.
2530
:param parent_ids: A list of parent tree revision ids.
2531
:param dirblocks: A list containing one tuple for each directory in the
2532
tree. Each tuple contains the directory path and a list of entries
2533
found in that directory.
2535
# our memory copy is now authoritative.
2536
self._dirblocks = dirblocks
2537
self._mark_modified(header_modified=True)
2538
self._parents = list(parent_ids)
2539
self._id_index = None
2540
self._packed_stat_index = None
2542
def set_path_id(self, path, new_id):
2543
"""Change the id of path to new_id in the current working tree.
2545
:param path: The path inside the tree to set - '' is the root, 'foo'
2546
is the path foo in the root.
2547
:param new_id: The new id to assign to the path. This must be a utf8
2548
file id (not unicode, and not None).
2550
self._read_dirblocks_if_needed()
2552
# TODO: logic not written
2553
raise NotImplementedError(self.set_path_id)
2554
# TODO: check new id is unique
2555
entry = self._get_entry(0, path_utf8=path)
2556
if entry[0][2] == new_id:
2557
# Nothing to change.
2559
# mark the old path absent, and insert a new root path
2560
self._make_absent(entry)
2561
self.update_minimal(('', '', new_id), 'd',
2562
path_utf8='', packed_stat=entry[1][0][4])
2563
self._mark_modified()
2564
# XXX: This was added by Ian, we need to make sure there
2565
# are tests for it, because it isn't in bzr.dev TRUNK
2566
# It looks like the only place it is called is in setting the root
2567
# id of the tree. So probably we never had an _id_index when we
2568
# don't even have a root yet.
2569
if self._id_index is not None:
2570
self._add_to_id_index(self._id_index, entry[0])
2572
def set_parent_trees(self, trees, ghosts):
2573
"""Set the parent trees for the dirstate.
2575
:param trees: A list of revision_id, tree tuples. tree must be provided
2576
even if the revision_id refers to a ghost: supply an empty tree in
2578
:param ghosts: A list of the revision_ids that are ghosts at the time
2581
# TODO: generate a list of parent indexes to preserve to save
2582
# processing specific parent trees. In the common case one tree will
2583
# be preserved - the left most parent.
2584
# TODO: if the parent tree is a dirstate, we might want to walk them
2585
# all by path in parallel for 'optimal' common-case performance.
2586
# generate new root row.
2587
self._read_dirblocks_if_needed()
2588
# TODO future sketch: Examine the existing parents to generate a change
2589
# map and then walk the new parent trees only, mapping them into the
2590
# dirstate. Walk the dirstate at the same time to remove unreferenced
2593
# sketch: loop over all entries in the dirstate, cherry picking
2594
# entries from the parent trees, if they are not ghost trees.
2595
# after we finish walking the dirstate, all entries not in the dirstate
2596
# are deletes, so we want to append them to the end as per the design
2597
# discussions. So do a set difference on ids with the parents to
2598
# get deletes, and add them to the end.
2599
# During the update process we need to answer the following questions:
2600
# - find other keys containing a fileid in order to create cross-path
2601
# links. We dont't trivially use the inventory from other trees
2602
# because this leads to either double touching, or to accessing
2604
# - find other keys containing a path
2605
# We accumulate each entry via this dictionary, including the root
2608
# we could do parallel iterators, but because file id data may be
2609
# scattered throughout, we dont save on index overhead: we have to look
2610
# at everything anyway. We can probably save cycles by reusing parent
2611
# data and doing an incremental update when adding an additional
2612
# parent, but for now the common cases are adding a new parent (merge),
2613
# and replacing completely (commit), and commit is more common: so
2614
# optimise merge later.
2616
# ---- start generation of full tree mapping data
2617
# what trees should we use?
2618
parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
2619
# how many trees do we end up with
2620
parent_count = len(parent_trees)
2621
st = static_tuple.StaticTuple
2623
# one: the current tree
2624
for entry in self._iter_entries():
2625
# skip entries not in the current tree
2626
if entry[1][0][0] in 'ar': # absent, relocated
2628
by_path[entry[0]] = [entry[1][0]] + \
2629
[DirState.NULL_PARENT_DETAILS] * parent_count
2630
# TODO: Possibly inline this, since we know it isn't present yet
2631
# id_index[entry[0][2]] = (entry[0],)
2632
self._add_to_id_index(id_index, entry[0])

        # now the parent trees:
        for tree_index, tree in enumerate(parent_trees):
            # the index is off by one, adjust it.
            tree_index = tree_index + 1
            # when we add new locations for a fileid we need these ranges for
            # any fileid in this tree as we set the by_path[id] to:
            # already_processed_tree_details + new_details + new_location_suffix
            # the suffix is from tree_index+1:parent_count+1.
            new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
            # now stitch in all the entries from this tree
            last_dirname = None
            for path, entry in tree.iter_entries_by_dir():
                # here we process each tree's details for each item in the tree.
                # we first update any existing entries for the id at other paths,
                # then we either create or update the entry for the id at the
                # right path, and finally we add (if needed) a mapping from
                # file_id to this path. We do it in this order to allow us to
                # avoid checking all known paths for the id when generating a
                # new entry at this path: by adding the id->path mapping last,
                # all the mappings are valid and have correct relocation
                # records where needed.
                file_id = entry.file_id
                path_utf8 = path.encode('utf8')
                dirname, basename = osutils.split(path_utf8)
                if dirname == last_dirname:
                    # Try to re-use objects as much as possible
                    dirname = last_dirname
                else:
                    last_dirname = dirname
                new_entry_key = st(dirname, basename, file_id)
                # tree index consistency: All other paths for this id in this tree
                # index must point to the correct path.
                entry_keys = id_index.get(file_id, ())
                for entry_key in entry_keys:
                    # TODO:PROFILING: It might be faster to just update
                    # rather than checking if we need to, and then overwrite
                    # the one we are located at.
                    if entry_key != new_entry_key:
                        # this file id is at a different path in one of the
                        # other trees, so put absent pointers there
                        # This is the vertical axis in the matrix, all pointing
                        # to the real path.
                        by_path[entry_key][tree_index] = st('r', path_utf8, 0,
                                                            False, '')
                # by path consistency: Insert into an existing path record
                # (trivial), or add a new one with relocation pointers for the
                # other tree indexes.
                if new_entry_key in entry_keys:
                    # there is already an entry where this data belongs, just
                    # insert it.
                    by_path[new_entry_key][tree_index] = \
                        self._inv_entry_to_details(entry)
                else:
                    # add relocated entries to the horizontal axis - this row
                    # mapping from path,id. We need to look up the correct path
                    # for the indexes from 0 to tree_index -1
                    new_details = []
                    for lookup_index in xrange(tree_index):
                        # boundary case: this is the first occurrence of file_id
                        # so there are no id_indexes, possibly take this out of
                        # the loop?
                        if not len(entry_keys):
                            new_details.append(DirState.NULL_PARENT_DETAILS)
                        else:
                            # grab any one entry, use it to find the right path.
                            a_key = iter(entry_keys).next()
                            if by_path[a_key][lookup_index][0] in ('r', 'a'):
                                # it's a pointer or missing statement, use it as
                                # is.
                                new_details.append(by_path[a_key][lookup_index])
                            else:
                                # we have the right key, make a pointer to it.
                                real_path = ('/'.join(a_key[0:2])).strip('/')
                                new_details.append(st('r', real_path, 0, False,
                                                      ''))
                    new_details.append(self._inv_entry_to_details(entry))
                    new_details.extend(new_location_suffix)
                    by_path[new_entry_key] = new_details
                    self._add_to_id_index(id_index, new_entry_key)
        # --- end generation of full tree mappings

        # sort and output all the entries
        new_entries = self._sort_entries(by_path.items())
        self._entries_to_current_state(new_entries)
        self._parents = [rev_id for rev_id, tree in trees]
        self._ghosts = list(ghosts)
        self._mark_modified(header_modified=True)
        self._id_index = id_index

    def _sort_entries(self, entry_list):
        """Given a list of entries, sort them into the right order.

        This is done when constructing a new dirstate from trees - normally we
        try to keep everything in sorted blocks all the time, but sometimes
        it's easier to sort after the fact.
        """
        # When sorting, we usually have 10x more entries than directories. (69k
        # total entries, 4k directories). So cache the results of splitting.
        # Saving time and objects. Also, use StaticTuple to avoid putting all
        # of these objects into python's garbage collector.
        split_dirs = {}
        def _key(entry, _split_dirs=split_dirs, _st=static_tuple.StaticTuple):
            # sort by: directory parts, file name, file id
            dirpath, fname, file_id = entry[0]
            try:
                split = _split_dirs[dirpath]
            except KeyError:
                split = _st.from_sequence(dirpath.split('/'))
                _split_dirs[dirpath] = split
            return _st(split, fname, file_id)
        return sorted(entry_list, key=_key)
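
    # Illustration (editorial note, not part of the original source): sorting
    # on split directory components differs from plain byte-wise path sorting.
    # With two hypothetical entry keys:
    #   ('a/b', 'c', 'id1')  ->  sort key (('a', 'b'), 'c', 'id1')
    #   ('a-b', 'c', 'id2')  ->  sort key (('a-b',), 'c', 'id2')
    # As raw strings 'a-b' < 'a/b' (since ord('-') < ord('/')), but on split
    # components ('a', ...) < ('a-b', ...), so 'a/b' sorts first - matching
    # the dirblock ordering that _validate() checks with d[0].split('/').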

    def set_state_from_inventory(self, new_inv):
        """Set new_inv as the current state.

        This API is called by tree transform, and will usually occur with
        existing parent trees.

        :param new_inv: The inventory object to set current state from.
        """
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(1,
                "set_state_from_inventory called; please mutate the tree instead")
        tracing = 'dirstate' in debug.debug_flags
        if tracing:
            trace.mutter("set_state_from_inventory trace:")
        self._read_dirblocks_if_needed()
        # sketch:
        # Two iterators: current data and new data, both in dirblock order.
        # We zip them together, which tells about entries that are new in the
        # inventory, or removed in the inventory, or present in both and
        # possibly changed.
        #
        # You might think we could just synthesize a new dirstate directly
        # since we're processing it in the right order. However, we need to
        # also consider there may be any number of parent trees and relocation
        # pointers, and we don't want to duplicate that here.
        new_iterator = new_inv.iter_entries_by_dir()
        # we will be modifying the dirstate, so we need a stable iterator. In
        # future we might write one, for now we just clone the state into a
        # list using a copy so that we see every original item and don't have
        # to adjust the position when items are inserted or deleted in the
        # underlying dirstate.
        old_iterator = iter(list(self._iter_entries()))
        # both must have roots so this is safe:
        current_new = new_iterator.next()
        current_old = old_iterator.next()
        def advance(iterator):
            try:
                return iterator.next()
            except StopIteration:
                return None
        while current_new or current_old:
            # skip entries in old that are not really there
            if current_old and current_old[1][0][0] in 'ar':
                # relocated or absent
                current_old = advance(old_iterator)
                continue
            if current_new:
                # convert new into dirblock style
                new_path_utf8 = current_new[0].encode('utf8')
                new_dirname, new_basename = osutils.split(new_path_utf8)
                new_id = current_new[1].file_id
                new_entry_key = (new_dirname, new_basename, new_id)
                current_new_minikind = \
                    DirState._kind_to_minikind[current_new[1].kind]
                if current_new_minikind == 't':
                    fingerprint = current_new[1].reference_revision or ''
                else:
                    # We normally only insert or remove records, or update
                    # them when it has significantly changed. Then we want to
                    # erase its fingerprint. Unaffected records should
                    # normally not be updated at all.
                    fingerprint = ''
            else:
                # for safety disable variables
                new_path_utf8 = new_dirname = new_basename = new_id = \
                    new_entry_key = None
            # 5 cases, we don't have a value that is strictly greater than everything, so
            # we make both end conditions explicit
            if not current_old:
                # old is finished: insert current_new into the state.
                if tracing:
                    trace.mutter("Appending from new '%s'.",
                        new_path_utf8.decode('utf8'))
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint,
                    fullscan=True)
                current_new = advance(new_iterator)
            elif not current_new:
                # new is finished
                if tracing:
                    trace.mutter("Truncating from old '%s/%s'.",
                        current_old[0][0].decode('utf8'),
                        current_old[0][1].decode('utf8'))
                self._make_absent(current_old)
                current_old = advance(old_iterator)
            elif new_entry_key == current_old[0]:
                # same - common case
                # We're looking at the same path and id in both the dirstate
                # and inventory, so just need to update the fields in the
                # dirstate from the one in the inventory.
                # TODO: update the record if anything significant has changed.
                # the minimal required trigger is if the execute bit or cached
                # kind has changed.
                if (current_old[1][0][3] != current_new[1].executable or
                    current_old[1][0][0] != current_new_minikind):
                    if tracing:
                        trace.mutter("Updating in-place change '%s'.",
                            new_path_utf8.decode('utf8'))
                    self.update_minimal(current_old[0], current_new_minikind,
                        executable=current_new[1].executable,
                        path_utf8=new_path_utf8, fingerprint=fingerprint,
                        fullscan=True)
                # both sides are dealt with, move on
                current_old = advance(old_iterator)
                current_new = advance(new_iterator)
            elif (cmp_by_dirs(new_dirname, current_old[0][0]) < 0
                  or (new_dirname == current_old[0][0]
                      and new_entry_key[1:] < current_old[0][1:])):
                # new comes before:
                # add an entry for this and advance new
                if tracing:
                    trace.mutter("Inserting from new '%s'.",
                        new_path_utf8.decode('utf8'))
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint,
                    fullscan=True)
                current_new = advance(new_iterator)
            else:
                # we've advanced past the place where the old key would be,
                # without seeing it in the new list. so it must be gone.
                if tracing:
                    trace.mutter("Deleting from old '%s/%s'.",
                        current_old[0][0].decode('utf8'),
                        current_old[0][1].decode('utf8'))
                self._make_absent(current_old)
                current_old = advance(old_iterator)
        self._mark_modified()
        self._id_index = None
        self._packed_stat_index = None
        if tracing:
            trace.mutter("set_state_from_inventory complete.")

    def set_state_from_scratch(self, working_inv, parent_trees, parent_ghosts):
        """Wipe the currently stored state and set it to something new.

        This is a hard-reset for the data we are working with.
        """
        # Technically, we really want a write lock, but until we write, we
        # don't really need it.
        self._requires_lock()
        # root dir and root dir contents with no children. We have to have a
        # root for set_state_from_inventory to work correctly.
        empty_root = (('', '', inventory.ROOT_ID),
                      [('d', '', 0, False, DirState.NULLSTAT)])
        empty_tree_dirblocks = [('', [empty_root]), ('', [])]
        self._set_data([], empty_tree_dirblocks)
        self.set_state_from_inventory(working_inv)
        self.set_parent_trees(parent_trees, parent_ghosts)

    def _make_absent(self, current_old):
        """Mark current_old - an entry - as absent for tree 0.

        :return: True if this was the last details entry for the entry key:
            that is, if the underlying block has had the entry removed, thus
            shrinking in length.
        """
        # build up paths that this id will be left at after the change is made,
        # so we can update their cross references in tree 0
        all_remaining_keys = set()
        # Don't check the working tree, because it's going.
        for details in current_old[1][1:]:
            if details[0] not in 'ar': # absent, relocated
                all_remaining_keys.add(current_old[0])
            elif details[0] == 'r': # relocated
                # record the key for the real path.
                all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
            # absent rows are not present at any path.
        last_reference = current_old[0] not in all_remaining_keys
        if last_reference:
            # the current row consists entirely of the current item (being marked
            # absent), and relocated or absent entries for the other trees:
            # Remove it, it's meaningless.
            block = self._find_block(current_old[0])
            entry_index, present = self._find_entry_index(current_old[0], block[1])
            if not present:
                raise AssertionError('could not find entry for %s' % (current_old,))
            block[1].pop(entry_index)
            # if we have an id_index in use, remove this key from it for this id.
            if self._id_index is not None:
                self._remove_from_id_index(self._id_index, current_old[0])
        # update all remaining keys for this id to record it as absent. The
        # existing details may either be the record we are marking as deleted
        # (if there were other trees with the id present at this path), or may
        # be relocations.
        for update_key in all_remaining_keys:
            update_block_index, present = \
                self._find_block_index_from_key(update_key)
            if not present:
                raise AssertionError('could not find block for %s' % (update_key,))
            update_entry_index, present = \
                self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
            if not present:
                raise AssertionError('could not find entry for %s' % (update_key,))
            update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
            # it must not be absent at the moment
            if update_tree_details[0][0] == 'a': # absent
                raise AssertionError('bad row %r' % (update_tree_details,))
            update_tree_details[0] = DirState.NULL_PARENT_DETAILS
        self._mark_modified()
        return last_reference
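
    # Illustration (editorial note, not part of the original source): a
    # parent-tree details tuple such as ('r', 'other/dir/name', 0, False, '')
    # is a relocation record, and the loop above turns it back into an entry
    # key by splitting the path and appending the file id:
    #   tuple(osutils.split('other/dir/name')) + (file_id,)
    #   == ('other/dir', 'name', file_id)
    # which is the key whose tree-0 column must then be marked absent.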

    def update_minimal(self, key, minikind, executable=False, fingerprint='',
                       packed_stat=None, size=0, path_utf8=None, fullscan=False):
        """Update an entry to the state in tree 0.

        This will either create a new entry at 'key' or update an existing one.
        It also makes sure that any other records which might mention this are
        updated as well.

        :param key: (dir, name, file_id) for the new entry
        :param minikind: The type for the entry ('f' == 'file', 'd' ==
            'directory'), etc.
        :param executable: Should the executable bit be set?
        :param fingerprint: Simple fingerprint for new entry: canonical-form
            sha1 for files, referenced revision id for subtrees, etc.
        :param packed_stat: Packed stat value for new entry.
        :param size: Size information for new entry
        :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
            extra computation.
        :param fullscan: If True then a complete scan of the dirstate is being
            done and checking for duplicate rows should not be done. This
            should only be set by set_state_from_inventory and similar methods.

        If packed_stat and fingerprint are not given, they're invalidated in
        the entry.
        """
        block = self._find_block(key)[1]
        if packed_stat is None:
            packed_stat = DirState.NULLSTAT
        # XXX: Some callers pass '' as the packed_stat, and it seems to be
        # sometimes present in the dirstate - this seems oddly inconsistent.
        # mbp 20071008
        entry_index, present = self._find_entry_index(key, block)
        new_details = (minikind, fingerprint, size, executable, packed_stat)
        id_index = self._get_id_index()
        if not present:
            # New record. Check there isn't an entry at this path already.
            if not fullscan:
                low_index, _ = self._find_entry_index(key[0:2] + ('',), block)
                while low_index < len(block):
                    entry = block[low_index]
                    if entry[0][0:2] == key[0:2]:
                        if entry[1][0][0] not in 'ar':
                            # This entry has the same path (but a different id) as
                            # the new entry we're adding, and is present in this
                            # tree.
                            self._raise_invalid(
                                ("%s/%s" % key[0:2]).decode('utf8'), key[2],
                                "Attempt to add item at path already occupied by "
                                "id %r" % entry[0][2])
                        low_index += 1
                    else:
                        break
            # new entry, synthesis cross reference here,
            existing_keys = id_index.get(key[2], ())
            if not existing_keys:
                # not currently in the state, simplest case
                new_entry = key, [new_details] + self._empty_parent_info()
            else:
                # present at one or more existing other paths.
                # grab one of them and use it to generate parent
                # relocation/absent entries.
                new_entry = key, [new_details]
                # existing_keys can be changed as we iterate.
                for other_key in tuple(existing_keys):
                    # change the record at other to be a pointer to this new
                    # record. The loop looks similar to the change to
                    # relocations when updating an existing record but it's not:
                    # the test for existing kinds is different: this can be
                    # factored out to a helper though.
                    other_block_index, present = self._find_block_index_from_key(
                        other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (
                            other_key,))
                    other_block = self._dirblocks[other_block_index][1]
                    other_entry_index, present = self._find_entry_index(
                        other_key, other_block)
                    if not present:
                        raise AssertionError(
                            'update_minimal: could not find other entry for %s'
                            % (other_key,))
                    if path_utf8 is None:
                        raise AssertionError('no path')
                    # Turn this other location into a reference to the new
                    # location. This also updates the aliased iterator
                    # (current_old in set_state_from_inventory) so that the old
                    # entry, if not already examined, is skipped over by that
                    # loop.
                    other_entry = other_block[other_entry_index]
                    other_entry[1][0] = ('r', path_utf8, 0, False, '')
                    if self._maybe_remove_row(other_block, other_entry_index,
                                              id_index):
                        # If the row holding this was removed, we need to
                        # recompute where this entry goes
                        entry_index, _ = self._find_entry_index(key, block)

                # This loop:
                # adds a tuple to the new details for each column
                #  - either by copying an existing relocation pointer inside that column
                #  - or by creating a new pointer to the right row inside that column
                num_present_parents = self._num_present_parents()
                if num_present_parents:
                    # TODO: This re-evaluates the existing_keys set, do we need
                    #       to do that ourselves?
                    other_key = list(existing_keys)[0]
                for lookup_index in xrange(1, num_present_parents + 1):
                    # grab any one entry, use it to find the right path.
                    # TODO: optimise this to reduce memory use in highly
                    #       fragmented situations by reusing the relocation
                    #       records.
                    update_block_index, present = \
                        self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    update_entry_index, present = \
                        self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
                    if not present:
                        raise AssertionError('update_minimal: could not find entry for %s' % (other_key,))
                    update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
                    if update_details[0] in 'ar': # relocated, absent
                        # it's a pointer or absent in lookup_index's tree, use
                        # it as is.
                        new_entry[1].append(update_details)
                    else:
                        # we have the right key, make a pointer to it.
                        pointer_path = osutils.pathjoin(*other_key[0:2])
                        new_entry[1].append(('r', pointer_path, 0, False, ''))
            block.insert(entry_index, new_entry)
            self._add_to_id_index(id_index, key)
        else:
            # Does the new state matter?
            block[entry_index][1][0] = new_details
            # parents cannot be affected by what we do.
            # other occurrences of this id can be found
            # from the id index.

            # tree index consistency: All other paths for this id in this tree
            # index must point to the correct path. We have to loop here because
            # we may have passed entries in the state with this file id already
            # that were absent - where parent entries are - and they need to be
            # converted to relocated.
            if path_utf8 is None:
                raise AssertionError('no path')
            existing_keys = id_index.get(key[2], ())
            if key not in existing_keys:
                raise AssertionError('We found the entry in the blocks, but'
                    ' the key is not in the id_index.'
                    ' key: %s, existing_keys: %s' % (key, existing_keys))
            for entry_key in existing_keys:
                # TODO:PROFILING: It might be faster to just update
                # rather than checking if we need to, and then overwrite
                # the one we are located at.
                if entry_key != key:
                    # this file id is at a different path in one of the
                    # other trees, so put absent pointers there
                    # This is the vertical axis in the matrix, all pointing
                    # to the real path.
                    block_index, present = self._find_block_index_from_key(entry_key)
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    self._dirblocks[block_index][1][entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')
        # add a containing dirblock if needed.
        if new_details[0] == 'd':
            subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
            block_index, present = self._find_block_index_from_key(subdir_key)
            if not present:
                self._dirblocks.insert(block_index, (subdir_key[0], []))

        self._mark_modified()
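
    # Illustration (editorial note, not part of the original source): an
    # entry row handled by update_minimal() has this shape, with hypothetical
    # values:
    #   (('dir', 'name', 'file-id'),                # entry key
    #    [('f', sha1, size, exec, packed_stat),     # tree 0 (working tree)
    #     ('r', 'old/path', 0, False, '')])         # a parent tree: relocation
    # i.e. one details tuple per tree, where an 'r' column points at the path
    # that really holds the data for that tree.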

    def _maybe_remove_row(self, block, index, id_index):
        """Remove index if it is absent or relocated across the row.

        id_index is updated accordingly.
        :return: True if we removed the row, False otherwise
        """
        present_in_row = False
        entry = block[index]
        for column in entry[1]:
            if column[0] not in 'ar':
                present_in_row = True
                break
        if not present_in_row:
            block.pop(index)
            self._remove_from_id_index(id_index, entry[0])
            return True
        return False

    def _validate(self):
        """Check that invariants on the dirblock are correct.

        This can be useful in debugging; it shouldn't be necessary in
        normal code.

        This must be called with a lock held.
        """
        # NOTE: This must always raise AssertionError not just assert,
        # otherwise it may not behave properly under python -O
        #
        # TODO: All entries must have some content that's not 'a' or 'r',
        # otherwise it could just be removed.
        #
        # TODO: All relocations must point directly to a real entry.
        #
        # TODO: No repeated keys.
        #
        # -- mbp 20070325
        from pprint import pformat
        self._read_dirblocks_if_needed()
        if len(self._dirblocks) > 0:
            if not self._dirblocks[0][0] == '':
                raise AssertionError(
                    "dirblocks don't start with root block:\n" + \
                    pformat(self._dirblocks))
        if len(self._dirblocks) > 1:
            if not self._dirblocks[1][0] == '':
                raise AssertionError(
                    "dirblocks missing root directory:\n" + \
                    pformat(self._dirblocks))
        # the dirblocks are sorted by their path components, name, and dir id
        dir_names = [d[0].split('/')
                for d in self._dirblocks[1:]]
        if dir_names != sorted(dir_names):
            raise AssertionError(
                "dir names are not in sorted order:\n" + \
                pformat(self._dirblocks) + \
                "\nkeys:\n" +
                pformat(dir_names))
        for dirblock in self._dirblocks:
            # within each dirblock, the entries are sorted by filename and
            # then by id.
            for entry in dirblock[1]:
                if dirblock[0] != entry[0][0]:
                    raise AssertionError(
                        "entry key for %r"
                        "doesn't match directory name in\n%r" %
                        (entry, pformat(dirblock)))
            if dirblock[1] != sorted(dirblock[1]):
                raise AssertionError(
                    "dirblock for %r is not sorted:\n%s" % \
                    (dirblock[0], pformat(dirblock)))

        def check_valid_parent():
            """Check that the current entry has a valid parent.

            This makes sure that the parent has a record,
            and that the parent isn't marked as "absent" in the
            current tree. (It is invalid to have a non-absent file in an absent
            directory.)
            """
            if entry[0][0:2] == ('', ''):
                # There should be no parent for the root row
                return
            parent_entry = self._get_entry(tree_index, path_utf8=entry[0][0])
            if parent_entry == (None, None):
                raise AssertionError(
                    "no parent entry for: %s in tree %s"
                    % (this_path, tree_index))
            if parent_entry[1][tree_index][0] != 'd':
                raise AssertionError(
                    "Parent entry for %s is not marked as a valid"
                    " directory. %s" % (this_path, parent_entry,))

        # For each file id, for each tree: either
        # the file id is not present at all; all rows with that id in the
        # key have it marked as 'absent'
        # OR the file id is present under exactly one name; any other entries
        # that mention that id point to the correct name.
        #
        # We check this with a dict per tree pointing either to the present
        # name, or None if absent.
        tree_count = self._num_present_parents() + 1
        id_path_maps = [dict() for i in range(tree_count)]
        # Make sure that all renamed entries point to the correct location.
        for entry in self._iter_entries():
            file_id = entry[0][2]
            this_path = osutils.pathjoin(entry[0][0], entry[0][1])
            if len(entry[1]) != tree_count:
                raise AssertionError(
                    "wrong number of entry details for row\n%s" \
                    ",\nexpected %d" % \
                    (pformat(entry), tree_count))
            absent_positions = 0
            for tree_index, tree_state in enumerate(entry[1]):
                this_tree_map = id_path_maps[tree_index]
                minikind = tree_state[0]
                if minikind in 'ar':
                    absent_positions += 1
                # have we seen this id before in this column?
                if file_id in this_tree_map:
                    previous_path, previous_loc = this_tree_map[file_id]
                    # any later mention of this file must be consistent with
                    # what was said before
                    if minikind == 'a':
                        if previous_path is not None:
                            raise AssertionError(
                                "file %s is absent in row %r but also present " \
                                "at %r" % \
                                (file_id, entry, previous_path))
                    elif minikind == 'r':
                        target_location = tree_state[1]
                        if previous_path != target_location:
                            raise AssertionError(
                                "file %s relocation in row %r but also at %r" \
                                % (file_id, entry, previous_path))
                    else:
                        # a file, directory, etc - may have been previously
                        # pointed to by a relocation, which must point here
                        if previous_path != this_path:
                            raise AssertionError(
                                "entry %r inconsistent with previous path %r "
                                "seen at %r" %
                                (entry, previous_path, previous_loc))
                        check_valid_parent()
                else:
                    if minikind == 'a':
                        # absent; should not occur anywhere else
                        this_tree_map[file_id] = None, this_path
                    elif minikind == 'r':
                        # relocation, must occur at expected location
                        this_tree_map[file_id] = tree_state[1], this_path
                    else:
                        this_tree_map[file_id] = this_path, this_path
                        check_valid_parent()
            if absent_positions == tree_count:
                raise AssertionError(
                    "entry %r has no data for any tree." % (entry,))
        if self._id_index is not None:
            for file_id, entry_keys in self._id_index.iteritems():
                for entry_key in entry_keys:
                    if entry_key[2] != file_id:
                        raise AssertionError(
                            'file_id %r did not match entry key %s'
                            % (file_id, entry_key))
                if len(entry_keys) != len(set(entry_keys)):
                    raise AssertionError(
                        'id_index contained non-unique data for %s'
                        % (entry_keys,))

    def _wipe_state(self):
        """Forget all state information about the dirstate."""
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._changes_aborted = False
        self._parents = []
        self._ghosts = []
        self._dirblocks = []
        self._id_index = None
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}

    def lock_read(self):
        """Acquire a read lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone
        self._lock_token = lock.ReadLock(self._filename)
        self._lock_state = 'r'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def lock_write(self):
        """Acquire a write lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone
        self._lock_token = lock.WriteLock(self._filename)
        self._lock_state = 'w'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def unlock(self):
        """Drop any locks held on the dirstate."""
        if self._lock_token is None:
            raise errors.LockNotHeld(self)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone
        self._state_file = None
        self._lock_state = None
        self._lock_token.unlock()
        self._lock_token = None
        self._split_path_cache = {}

    def _requires_lock(self):
        """Check that a lock is currently held by someone on the dirstate."""
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)


def py_update_entry(state, entry, abspath, stat_value,
                    _stat_to_minikind=DirState._stat_to_minikind):
    """Update the entry based on what is actually on disk.

    This function only calculates the sha if it needs to - if the entry is
    uncachable, or clearly different to the first parent's entry, no sha
    is calculated, and None is returned.

    :param state: The dirstate this entry is in.
    :param entry: This is the dirblock entry for the file in question.
    :param abspath: The path on disk for this file.
    :param stat_value: The stat value done on the path.
    :return: None, or The sha1 hexdigest of the file (40 bytes) or link
        target of a symlink.
    """
    try:
        minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
    except KeyError:
        # Unhandled kind
        return None
    packed_stat = pack_stat(stat_value)
    (saved_minikind, saved_link_or_sha1, saved_file_size,
     saved_executable, saved_packed_stat) = entry[1][0]

    if minikind == 'd' and saved_minikind == 't':
        minikind = 't'
    if (minikind == saved_minikind
        and packed_stat == saved_packed_stat):
        # The stat hasn't changed since we saved, so we can re-use the
        # saved sha hash.
        if minikind == 'd':
            return None

        # size should also be in packed_stat
        if saved_file_size == stat_value.st_size:
            return saved_link_or_sha1

    # If we have gotten this far, that means that we need to actually
    # process this entry.
    link_or_sha1 = None
    worth_saving = True
    if minikind == 'f':
        executable = state._is_executable(stat_value.st_mode,
                                          saved_executable)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time
            and len(entry[1]) > 1
            and entry[1][1][0] != 'a'):
            # Could check for size changes for further optimised
            # avoidance of sha1's. However the most prominent case of
            # over-shaing is during initial add, which this catches.
            # Besides, if content filtering happens, size and sha
            # are calculated at the same time, so checking just the size
            # gains nothing w.r.t. performance.
            link_or_sha1 = state._sha1_file(abspath)
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        else:
            entry[1][0] = ('f', '', stat_value.st_size,
                           executable, DirState.NULLSTAT)
            worth_saving = False
    elif minikind == 'd':
        link_or_sha1 = None
        entry[1][0] = ('d', '', 0, False, packed_stat)
        if saved_minikind != 'd':
            # This changed from something into a directory. Make sure we
            # have a directory block for it. This doesn't happen very
            # often, so this doesn't have to be super fast.
            block_index, entry_index, dir_present, file_present = \
                state._get_block_entry_index(entry[0][0], entry[0][1], 0)
            state._ensure_block(block_index, entry_index,
                                osutils.pathjoin(entry[0][0], entry[0][1]))
        else:
            worth_saving = False
    elif minikind == 'l':
        if saved_minikind == 'l':
            worth_saving = False
        link_or_sha1 = state._read_link(abspath, saved_link_or_sha1)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time):
            entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
                           False, packed_stat)
        else:
            entry[1][0] = ('l', '', stat_value.st_size,
                           False, DirState.NULLSTAT)
    if worth_saving:
        state._mark_modified([entry])
    return link_or_sha1

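# Illustration (editorial note, not part of the original source):
# _stat_to_minikind maps the S_IFMT bits of st_mode to a minikind, e.g.
#   stat.S_IFREG (0100000) -> 'f'    stat.S_IFDIR (0040000) -> 'd'
#   stat.S_IFLNK (0120000) -> 'l'
# so masking st_mode with 0170000 (S_IFMT) above selects only the file-type
# bits before the lookup; unknown kinds fall through to the KeyError branch.
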
class ProcessEntryPython(object):
3448
__slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id",
3449
"last_source_parent", "last_target_parent", "include_unchanged",
3450
"partial", "use_filesystem_for_exec", "utf8_decode",
3451
"searched_specific_files", "search_specific_files",
3452
"searched_exact_paths", "search_specific_file_parents", "seen_ids",
3453
"state", "source_index", "target_index", "want_unversioned", "tree"]
3455
def __init__(self, include_unchanged, use_filesystem_for_exec,
3456
search_specific_files, state, source_index, target_index,
3457
want_unversioned, tree):
3458
self.old_dirname_to_file_id = {}
3459
self.new_dirname_to_file_id = {}
3460
# Are we doing a partial iter_changes?
3461
self.partial = search_specific_files != set([''])
3462
# Using a list so that we can access the values and change them in
3463
# nested scope. Each one is [path, file_id, entry]
3464
self.last_source_parent = [None, None]
3465
self.last_target_parent = [None, None]
3466
self.include_unchanged = include_unchanged
3467
self.use_filesystem_for_exec = use_filesystem_for_exec
3468
self.utf8_decode = cache_utf8._utf8_decode
3469
# for all search_indexs in each path at or under each element of
3470
# search_specific_files, if the detail is relocated: add the id, and
3471
# add the relocated path as one to search if its not searched already.
3472
# If the detail is not relocated, add the id.
3473
self.searched_specific_files = set()
3474
# When we search exact paths without expanding downwards, we record
3476
self.searched_exact_paths = set()
3477
self.search_specific_files = search_specific_files
3478
# The parents up to the root of the paths we are searching.
3479
# After all normal paths are returned, these specific items are returned.
3480
self.search_specific_file_parents = set()
3481
# The ids we've sent out in the delta.
3482
self.seen_ids = set()
3484
self.source_index = source_index
3485
self.target_index = target_index
3486
if target_index != 0:
3487
# A lot of code in here depends on target_index == 0
3488
raise errors.BzrError('unsupported target index')
3489
self.want_unversioned = want_unversioned

    def _process_entry(self, entry, path_info, pathjoin=osutils.pathjoin):
        """Compare an entry and real disk to generate delta information.

        :param path_info: top_relpath, basename, kind, lstat, abspath for
            the path of entry. If None, then the path is considered absent in
            the target (Perhaps we should pass in a concrete entry for this ?)
            Basename is returned as a utf8 string because we expect this
            tuple will be ignored, and don't want to take the time to
            decode.
        :return: (iter_changes_result, changed). If the entry has not been
            handled then changed is None. Otherwise it is False if no content
            or metadata changes have occurred, and True if any content or
            metadata change has occurred. If self.include_unchanged is True then
            if changed is not None, iter_changes_result will always be a result
            tuple. Otherwise, iter_changes_result is None unless changed is
            True.
        """
        if self.source_index is None:
            source_details = DirState.NULL_PARENT_DETAILS
        else:
            source_details = entry[1][self.source_index]
        target_details = entry[1][self.target_index]
        target_minikind = target_details[0]
        if path_info is not None and target_minikind in 'fdlt':
            if not (self.target_index == 0):
                raise AssertionError()
            link_or_sha1 = update_entry(self.state, entry,
                abspath=path_info[4], stat_value=path_info[3])
            # The entry may have been modified by update_entry
            target_details = entry[1][self.target_index]
            target_minikind = target_details[0]
        else:
            link_or_sha1 = None
        file_id = entry[0][2]
        source_minikind = source_details[0]
        if source_minikind in 'fdltr' and target_minikind in 'fdlt':
            # claimed content in both: diff
            #   r    | fdlt   |      | add source to search, add id path move and perform
            #        |        |      | diff check on source-target
            #   r    | fdlt   |  a   | dangling file that was present in the basis.
            if source_minikind in 'r':
                # add the source to the search path to find any children it
                # has.  TODO ? : only add if it is a container ?
                if not osutils.is_inside_any(self.searched_specific_files,
                                             source_details[1]):
                    self.search_specific_files.add(source_details[1])
                # generate the old path; this is needed for stating later
                # as well.
                old_path = source_details[1]
                old_dirname, old_basename = os.path.split(old_path)
                path = pathjoin(entry[0][0], entry[0][1])
                old_entry = self.state._get_entry(self.source_index,
                                                  path_utf8=old_path)
                # update the source details variable to be the real
                # location.
                if old_entry == (None, None):
                    raise errors.CorruptDirstate(self.state._filename,
                        "entry '%s/%s' is considered renamed from %r"
                        " but source does not exist\n"
                        "entry: %s" % (entry[0][0], entry[0][1], old_path, entry))
                source_details = old_entry[1][self.source_index]
                source_minikind = source_details[0]
            else:
                old_dirname = entry[0][0]
                old_basename = entry[0][1]
                old_path = path = None
            if path_info is None:
                # the file is missing on disk, show as removed.
                content_change = True
                target_kind = None
                target_exec = False
            else:
                # source and target are both versioned and disk file is present.
                target_kind = path_info[2]
                if target_kind == 'directory':
                    if path is None:
                        old_path = path = pathjoin(old_dirname, old_basename)
                    self.new_dirname_to_file_id[path] = file_id
                    if source_minikind != 'd':
                        content_change = True
                    else:
                        # directories have no fingerprint
                        content_change = False
                    target_exec = False
                elif target_kind == 'file':
                    if source_minikind != 'f':
                        content_change = True
                    else:
                        # Check the sha. We can't just rely on the size as
                        # content filtering may mean different sizes actually
                        # map to the same content
                        if link_or_sha1 is None:
                            # Stat cache miss:
                            statvalue, link_or_sha1 = \
                                self.state._sha1_provider.stat_and_sha1(
                                    path_info[4])
                            self.state._observed_sha1(entry, link_or_sha1,
                                statvalue)
                        content_change = (link_or_sha1 != source_details[1])
                    # Target details is updated at update_entry time
                    if self.use_filesystem_for_exec:
                        # We don't need S_ISREG here, because we are sure
                        # we are dealing with a file.
                        target_exec = bool(stat.S_IEXEC & path_info[3].st_mode)
                    else:
                        target_exec = target_details[3]
                elif target_kind == 'symlink':
                    if source_minikind != 'l':
                        content_change = True
                    else:
                        content_change = (link_or_sha1 != source_details[1])
                    target_exec = False
                elif target_kind == 'tree-reference':
                    if source_minikind != 't':
                        content_change = True
                    else:
                        content_change = False
                    target_exec = False
                else:
                    if path is None:
                        path = pathjoin(old_dirname, old_basename)
                    raise errors.BadFileKindError(path, path_info[2])
            if source_minikind == 'd':
                if path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                self.old_dirname_to_file_id[old_path] = file_id
            # parent id is the entry for the path in the target tree
            if old_basename and old_dirname == self.last_source_parent[0]:
                source_parent_id = self.last_source_parent[1]
            else:
                try:
                    source_parent_id = self.old_dirname_to_file_id[old_dirname]
                except KeyError:
                    source_parent_entry = self.state._get_entry(self.source_index,
                                                           path_utf8=old_dirname)
                    source_parent_id = source_parent_entry[0][2]
                if source_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    source_parent_id = None
                else:
                    self.last_source_parent[0] = old_dirname
                    self.last_source_parent[1] = source_parent_id
            new_dirname = entry[0][0]
            if entry[0][1] and new_dirname == self.last_target_parent[0]:
                target_parent_id = self.last_target_parent[1]
            else:
                try:
                    target_parent_id = self.new_dirname_to_file_id[new_dirname]
                except KeyError:
                    # TODO: We don't always need to do the lookup, because the
                    # parent entry will be the same as the source entry.
                    target_parent_entry = self.state._get_entry(self.target_index,
                                                                path_utf8=new_dirname)
                    if target_parent_entry == (None, None):
                        raise AssertionError(
                            "Could not find target parent in wt: %s\nparent of: %s"
                            % (new_dirname, entry))
                    target_parent_id = target_parent_entry[0][2]
                if target_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    target_parent_id = None
                else:
                    self.last_target_parent[0] = new_dirname
                    self.last_target_parent[1] = target_parent_id

            source_exec = source_details[3]
            changed = (content_change
                or source_parent_id != target_parent_id
                or old_basename != entry[0][1]
                or source_exec != target_exec
                )
            if not changed and not self.include_unchanged:
                return None, False
            if old_path is None:
                old_path = path = pathjoin(old_dirname, old_basename)
                old_path_u = self.utf8_decode(old_path)[0]
                path_u = old_path_u
            else:
                old_path_u = self.utf8_decode(old_path)[0]
                if old_path == path:
                    path_u = old_path_u
                else:
                    path_u = self.utf8_decode(path)[0]
            source_kind = DirState._minikind_to_kind[source_minikind]
            return (entry[0][2],
                   (old_path_u, path_u),
                   content_change,
                   (True, True),
                   (source_parent_id, target_parent_id),
                   (self.utf8_decode(old_basename)[0], self.utf8_decode(entry[0][1])[0]),
                   (source_kind, target_kind),
                   (source_exec, target_exec)), changed
        elif source_minikind in 'a' and target_minikind in 'fdlt':
            # looks like a new file
            path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            # TODO: these are the same for an entire directory: cache em.
            parent_id = self.state._get_entry(self.target_index,
                                              path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            if path_info is not None:
                # Present on disk:
                if self.use_filesystem_for_exec:
                    # We need S_ISREG here, because we aren't sure if this
                    # is a file or not.
                    target_exec = bool(
                        stat.S_ISREG(path_info[3].st_mode)
                        and stat.S_IEXEC & path_info[3].st_mode)
                else:
                    target_exec = target_details[3]
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, path_info[2]),
                       (None, target_exec)), True
            else:
                # It's a missing file, report it as such.
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, None),
                       (None, False)), True
        elif source_minikind in 'fdlt' and target_minikind in 'a':
            # unversioned, possibly, or possibly not deleted: we don't care.
            # if it's still on disk, *and* there's no other entry at this
            # path [we don't know this in this routine at the moment -
            # perhaps we should change this - then it would be an unknown.
            old_path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            parent_id = self.state._get_entry(self.source_index, path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            return (entry[0][2],
                   (self.utf8_decode(old_path)[0], None),
                   True,
                   (True, False),
                   (parent_id, None),
                   (self.utf8_decode(entry[0][1])[0], None),
                   (DirState._minikind_to_kind[source_minikind], None),
                   (source_details[3], None)), True
        elif source_minikind in 'fdlt' and target_minikind in 'r':
            # a rename; could be a true rename, or a rename inherited from
            # a renamed parent. TODO: handle this efficiently. It's not a
            # common case to rename dirs though, so a correct but slow
            # implementation will do.
            if not osutils.is_inside_any(self.searched_specific_files, target_details[1]):
                self.search_specific_files.add(target_details[1])
        elif source_minikind in 'ra' and target_minikind in 'ra':
            # neither of the selected trees contain this file,
            # so skip over it. This is not currently directly tested, but
            # is indirectly via test_too_much.TestCommands.test_conflicts.
            pass
        else:
            raise AssertionError("don't know how to compare "
                "source_minikind=%r, target_minikind=%r"
                % (source_minikind, target_minikind))
        return None, None

    def _gather_result_for_consistency(self, result):
        """Check a result we will yield to make sure we are consistent later.

        This gathers result's parents into a set to output later.

        :param result: A result tuple.
        """
        if not self.partial or not result[0]:
            return
        self.seen_ids.add(result[0])
        new_path = result[1][1]
        if new_path:
            # Not the root and not a delete: queue up the parents of the path.
            self.search_specific_file_parents.update(
                osutils.parent_directories(new_path.encode('utf8')))
            # Add the root directory which parent_directories does not
            # provide.
            self.search_specific_file_parents.add('')

    def iter_changes(self):
        """Iterate over the changes."""
        utf8_decode = cache_utf8._utf8_decode
        _cmp_by_dirs = cmp_by_dirs
        _process_entry = self._process_entry
        search_specific_files = self.search_specific_files
        searched_specific_files = self.searched_specific_files
        splitpath = osutils.splitpath
        # sketch:
        # compare source_index and target_index at or under each element of search_specific_files.
        # follow the following comparison table. Note that we only want to do diff operations when
        # the target is fdl because that's when the walkdirs logic will have exposed the pathinfo
        # for the target.
        # cases:
        #
        # Source | Target | disk | action
        #   r    | fdlt   |      | add source to search, add id path move and perform
        #        |        |      | diff check on source-target
        #   r    | fdlt   |  a   | dangling file that was present in the basis.
        #   r    |  a     |      | add source to search
        #   r    |  r     |      | this path is present in a non-examined tree, skip.
        #   r    |  r     |  a   | this path is present in a non-examined tree, skip.
        #   a    | fdlt   |      | add new id
        #   a    | fdlt   |  a   | dangling locally added file, skip
        #   a    |  a     |      | not present in either tree, skip
        #   a    |  a     |  a   | not present in any tree, skip
        #   a    |  r     |      | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #   a    |  r     |  a   | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #  fdlt  | fdlt   |      | content in both: diff them
        #  fdlt  | fdlt   |  a   | deleted locally, but not unversioned - show as deleted ?
        #  fdlt  |  a     |      | unversioned: output deleted id for now
        #  fdlt  |  a     |  a   | unversioned and deleted: output deleted id
        #  fdlt  |  r     |      | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.
        #  fdlt  |  r     |  a   | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.

        # TODO: jam 20070516 - Avoid the _get_entry lookup overhead by
        #       keeping a cache of directories that we have seen.
        while search_specific_files:
            # TODO: the pending list should be lexically sorted?  the
            # interface doesn't require it.
            current_root = search_specific_files.pop()
            current_root_unicode = current_root.decode('utf8')
            searched_specific_files.add(current_root)
            # process the entries for this containing directory: the rest will be
            # found by their parents recursively.
            root_entries = self.state._entries_for_path(current_root)
            root_abspath = self.tree.abspath(current_root_unicode)
            try:
                root_stat = os.lstat(root_abspath)
            except OSError, e:
                if e.errno == errno.ENOENT:
                    # the path does not exist: let _process_entry know that.
                    root_dir_info = None
                else:
                    # some other random error: hand it up.
                    raise
            else:
                root_dir_info = ('', current_root,
                    osutils.file_kind_from_stat_mode(root_stat.st_mode), root_stat,
                    root_abspath)
                if root_dir_info[2] == 'directory':
                    if self.tree._directory_is_tree_reference(
                        current_root.decode('utf8')):
                        root_dir_info = root_dir_info[:2] + \
                            ('tree-reference',) + root_dir_info[3:]

            if not root_entries and not root_dir_info:
                # this specified path is not present at all, skip it.
                continue
            path_handled = False
            for entry in root_entries:
                result, changed = _process_entry(entry, root_dir_info)
                if changed is not None:
                    path_handled = True
                    if changed:
                        self._gather_result_for_consistency(result)
                    if changed or self.include_unchanged:
                        yield result
            if self.want_unversioned and not path_handled and root_dir_info:
                new_executable = bool(
                    stat.S_ISREG(root_dir_info[3].st_mode)
                    and stat.S_IEXEC & root_dir_info[3].st_mode)
                yield (None,
                       (None, current_root_unicode),
                       True,
                       (False, False),
                       (None, None),
                       (None, splitpath(current_root_unicode)[-1]),
                       (None, root_dir_info[2]),
                       (None, new_executable)
                      )
            initial_key = (current_root, '', '')
            block_index, _ = self.state._find_block_index_from_key(initial_key)
            if block_index == 0:
                # we have processed the total root already, but because the
                # initial key matched it we should skip it here.
                block_index += 1
            if root_dir_info and root_dir_info[2] == 'tree-reference':
                current_dir_info = None
            else:
                dir_iterator = osutils._walkdirs_utf8(root_abspath, prefix=current_root)
                try:
                    current_dir_info = dir_iterator.next()
                except OSError, e:
                    # on win32, python2.4 has e.errno == ERROR_DIRECTORY, but
                    # python 2.5 has e.errno == EINVAL,
                    #            and e.winerror == ERROR_DIRECTORY
                    e_winerror = getattr(e, 'winerror', None)
                    win_errors = (ERROR_DIRECTORY, ERROR_PATH_NOT_FOUND)
                    # there may be directories in the inventory even though
                    # this path is not a file on disk: so mark it as end of
                    # iterator
                    if e.errno in (errno.ENOENT, errno.ENOTDIR, errno.EINVAL):
                        current_dir_info = None
                    elif (sys.platform == 'win32'
                          and (e.errno in win_errors
                               or e_winerror in win_errors)):
                        current_dir_info = None
                    else:
                        raise
                else:
                    if current_dir_info[0][0] == '':
                        # remove .bzr from iteration
                        bzr_index = bisect.bisect_left(current_dir_info[1], ('.bzr',))
                        if current_dir_info[1][bzr_index][0] != '.bzr':
                            raise AssertionError()
                        del current_dir_info[1][bzr_index]
            # walk until both the directory listing and the versioned metadata
            # are exhausted.
            if (block_index < len(self.state._dirblocks) and
                osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                current_block = self.state._dirblocks[block_index]
            else:
                current_block = None
            while (current_dir_info is not None or
                   current_block is not None):
                if (current_dir_info and current_block
                    and current_dir_info[0][0] != current_block[0]):
                    if _cmp_by_dirs(current_dir_info[0][0], current_block[0]) < 0:
                        # filesystem data refers to paths not covered by the dirblock.
                        # this has two possibilities:
                        # A) it is versioned but empty, so there is no block for it
                        # B) it is not versioned.

                        # if (A) then we need to recurse into it to check for
                        # new unknown files or directories.
                        # if (B) then we should ignore it, because we don't
                        # recurse into unknown directories.
                        path_index = 0
                        while path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if self.want_unversioned:
                                if current_path_info[2] == 'directory':
                                    if self.tree._directory_is_tree_reference(
                                        current_path_info[0].decode('utf8')):
                                        current_path_info = current_path_info[:2] + \
                                            ('tree-reference',) + current_path_info[3:]
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                yield (None,
                                    (None, utf8_decode(current_path_info[0])[0]),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',
                                                        'tree-reference'):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                            path_index += 1

                        # This dir info has been handled, go to the next
                        try:
                            current_dir_info = dir_iterator.next()
                        except StopIteration:
                            current_dir_info = None
                    else:
                        # We have a dirblock entry for this location, but there
                        # is no filesystem path for this. This is most likely
                        # because a directory was removed from the disk.
                        # We don't have to report the missing directory,
                        # because that should have already been handled, but we
                        # need to handle all of the files that are contained
                        # within.
                        for current_entry in current_block[1]:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result, changed = _process_entry(current_entry, None)
                            if changed is not None:
                                if changed:
                                    self._gather_result_for_consistency(result)
                                if changed or self.include_unchanged:
                                    yield result
                        block_index += 1
                        if (block_index < len(self.state._dirblocks) and
                            osutils.is_inside(current_root,
                                              self.state._dirblocks[block_index][0])):
                            current_block = self.state._dirblocks[block_index]
                        else:
                            current_block = None
                    continue
                entry_index = 0
                if current_block and entry_index < len(current_block[1]):
                    current_entry = current_block[1][entry_index]
                else:
                    current_entry = None
                advance_entry = True
                path_index = 0
                if current_dir_info and path_index < len(current_dir_info[1]):
                    current_path_info = current_dir_info[1][path_index]
                    if current_path_info[2] == 'directory':
                        if self.tree._directory_is_tree_reference(
                            current_path_info[0].decode('utf8')):
                            current_path_info = current_path_info[:2] + \
                                ('tree-reference',) + current_path_info[3:]
                else:
                    current_path_info = None
                advance_path = True
                path_handled = False
                while (current_entry is not None or
                       current_path_info is not None):
                    if current_entry is None:
                        # the check for path_handled when the path is advanced
                        # will yield this path if needed.
                        pass
                    elif current_path_info is None:
                        # no path is fine: the per entry code will handle it.
                        result, changed = _process_entry(current_entry, current_path_info)
                        if changed is not None:
                            if changed:
                                self._gather_result_for_consistency(result)
                            if changed or self.include_unchanged:
                                yield result
                    elif (current_entry[0][1] != current_path_info[1]
                          or current_entry[1][self.target_index][0] in 'ar'):
                        # The current path on disk doesn't match the dirblock
                        # record. Either the dirblock is marked as absent, or
                        # the file on disk is not present at all in the
                        # dirblock. Either way, report about the dirblock
                        # entry, and let other code handle the filesystem one.

                        # Compare the basename for these files to determine
                        # their order.
                        if current_path_info[1] < current_entry[0][1]:
                            # extra file on disk: pass for now, but only
                            # increment the path, not the entry
                            advance_entry = False
                        else:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result, changed = _process_entry(current_entry, None)
                            if changed is not None:
                                if changed:
                                    self._gather_result_for_consistency(result)
                                if changed or self.include_unchanged:
                                    yield result
                            advance_path = False
                    else:
                        result, changed = _process_entry(current_entry, current_path_info)
                        if changed is not None:
                            path_handled = True
                            if changed:
                                self._gather_result_for_consistency(result)
                            if changed or self.include_unchanged:
                                yield result
                    if advance_entry and current_entry is not None:
                        entry_index += 1
                        if entry_index < len(current_block[1]):
                            current_entry = current_block[1][entry_index]
                        else:
                            current_entry = None
                    else:
                        advance_entry = True # reset the advance flag
                    if advance_path and current_path_info is not None:
                        if not path_handled:
                            # unversioned in all regards
                            if self.want_unversioned:
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                try:
                                    relpath_unicode = utf8_decode(current_path_info[0])[0]
                                except UnicodeDecodeError:
                                    raise errors.BadFilenameEncoding(
                                        current_path_info[0], osutils._fs_enc)
                                yield (None,
                                    (None, relpath_unicode),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] == 'directory':
                                del current_dir_info[1][path_index]
                                path_index -= 1
                        # don't descend the disk iterator into any tree
                        # reference
                        if current_path_info[2] == 'tree-reference':
                            del current_dir_info[1][path_index]
                            path_index -= 1
                        path_index += 1
                        if path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if current_path_info[2] == 'directory':
                                if self.tree._directory_is_tree_reference(
                                    current_path_info[0].decode('utf8')):
                                    current_path_info = current_path_info[:2] + \
                                        ('tree-reference',) + current_path_info[3:]
                        else:
                            current_path_info = None
                        path_handled = False
                    else:
                        advance_path = True # reset the advance flag.
                if current_block is not None:
                    block_index += 1
                    if (block_index < len(self.state._dirblocks) and
                        osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                        current_block = self.state._dirblocks[block_index]
                    else:
                        current_block = None
                if current_dir_info is not None:
                    try:
                        current_dir_info = dir_iterator.next()
                    except StopIteration:
                        current_dir_info = None
        for result in self._iter_specific_file_parents():
            yield result

    def _iter_specific_file_parents(self):
        """Iter over the specific file parents."""
        while self.search_specific_file_parents:
            # Process the parent directories for the paths we were iterating.
            # Even in extremely large trees this should be modest, so currently
            # no attempt is made to optimise.
            path_utf8 = self.search_specific_file_parents.pop()
            if osutils.is_inside_any(self.searched_specific_files, path_utf8):
                # We've examined this path.
                continue
            if path_utf8 in self.searched_exact_paths:
                # We've examined this path.
                continue
            path_entries = self.state._entries_for_path(path_utf8)
            # We need either one or two entries. If the path in
            # self.target_index has moved (so the entry in source_index is in
            # 'ar') then we need to also look for the entry for this path in
            # self.source_index, to output the appropriate delete-or-rename.
            selected_entries = []
            found_item = False
            for candidate_entry in path_entries:
                # Find entries present in target at this path:
                if candidate_entry[1][self.target_index][0] not in 'ar':
                    found_item = True
                    selected_entries.append(candidate_entry)
                # Find entries present in source at this path:
                elif (self.source_index is not None and
                      candidate_entry[1][self.source_index][0] not in 'ar'):
                    found_item = True
                    if candidate_entry[1][self.target_index][0] == 'a':
                        # Deleted, emit it here.
                        selected_entries.append(candidate_entry)
                    else:
                        # renamed, emit it when we process the directory it
                        # ended up at.
                        self.search_specific_file_parents.add(
                            candidate_entry[1][self.target_index][1])
            if not found_item:
                raise AssertionError(
                    "Missing entry for specific path parent %r, %r" % (
                    path_utf8, path_entries))
            path_info = self._path_info(path_utf8, path_utf8.decode('utf8'))
            for entry in selected_entries:
                if entry[0][2] in self.seen_ids:
                    continue
                result, changed = self._process_entry(entry, path_info)
                if changed is None:
                    raise AssertionError(
                        "Got entry<->path mismatch for specific path "
                        "%r entry %r path_info %r " % (
                        path_utf8, entry, path_info))
                # Only include changes - we're outside the user's requested
                # expansion.
                if changed:
                    self._gather_result_for_consistency(result)
                    if (result[6][0] == 'directory' and
                        result[6][1] != 'directory'):
                        # This stopped being a directory, the old children have
                        # to be included.
                        if entry[1][self.source_index][0] == 'r':
                            # renamed, take the source path
                            entry_path_utf8 = entry[1][self.source_index][1]
                        else:
                            entry_path_utf8 = path_utf8
                        initial_key = (entry_path_utf8, '', '')
                        block_index, _ = self.state._find_block_index_from_key(
                            initial_key)
                        if block_index == 0:
                            # The children of the root are in block index 1.
                            block_index += 1
                        current_block = None
                        if block_index < len(self.state._dirblocks):
                            current_block = self.state._dirblocks[block_index]
                            if not osutils.is_inside(
                                entry_path_utf8, current_block[0]):
                                # No entries for this directory at all.
                                current_block = None
                        if current_block is not None:
                            for entry in current_block[1]:
                                if entry[1][self.source_index][0] in 'ar':
                                    # Not in the source tree, so doesn't have to be
                                    # included.
                                    continue
                                # Path of the entry itself.
                                self.search_specific_file_parents.add(
                                    osutils.pathjoin(*entry[0][:2]))
                if changed or self.include_unchanged:
                    yield result
            self.searched_exact_paths.add(path_utf8)

    def _path_info(self, utf8_path, unicode_path):
        """Generate path_info for unicode_path.

        :return: None if unicode_path does not exist, or a path_info tuple.
        """
        abspath = self.tree.abspath(unicode_path)
        try:
            stat = os.lstat(abspath)
        except OSError, e:
            if e.errno == errno.ENOENT:
                # the path does not exist.
                return None
            else:
                raise
        utf8_basename = utf8_path.rsplit('/', 1)[-1]
        dir_info = (utf8_path, utf8_basename,
            osutils.file_kind_from_stat_mode(stat.st_mode), stat,
            abspath)
        if dir_info[2] == 'directory':
            if self.tree._directory_is_tree_reference(
                unicode_path):
                self.root_dir_info = self.root_dir_info[:2] + \
                    ('tree-reference',) + self.root_dir_info[3:]
        return dir_info


# Try to load the compiled form if possible
try:
    from bzrlib._dirstate_helpers_pyx import (
        cmp_by_dirs,
        ProcessEntryC as _process_entry,
        update_entry as update_entry,
        )
except ImportError, e:
    osutils.failed_to_load_extension(e)
    from bzrlib._dirstate_helpers_py import (
        cmp_by_dirs,
        )
    # FIXME: It would be nice to be able to track moved lines so that the
    # corresponding python code can be moved to the _dirstate_helpers_py
    # module. I don't want to break the history for this important piece of
    # code so I left the code here -- vila 20090622
    update_entry = py_update_entry
    _process_entry = ProcessEntryPython
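The minikind dispatch performed by `_process_entry` can be summarised outside of bzrlib. The sketch below is illustrative only: the `classify` helper and its action names are hypothetical, not part of the bzrlib API; it mirrors the `source_minikind`/`target_minikind` comparison table documented in `iter_changes` above, including the string-membership idiom (`source_minikind in 'fdltr'`) the real code uses.

```python
def classify(source_minikind, target_minikind):
    """Map a (source, target) minikind pair to a coarse action.

    Minikinds are single characters: f(ile), d(irectory), l(ink),
    t(ree-reference), a(bsent), r(elocated).  Hypothetical helper,
    mirroring the dispatch order of _process_entry.
    """
    if source_minikind in 'fdltr' and target_minikind in 'fdlt':
        # claimed content in both trees: diff them
        return 'diff'
    elif source_minikind in 'a' and target_minikind in 'fdlt':
        # looks like a new file
        return 'added'
    elif source_minikind in 'fdlt' and target_minikind in 'a':
        # unversioned or deleted in the target
        return 'deleted'
    elif source_minikind in 'fdlt' and target_minikind in 'r':
        # renamed: the relocation target is queued for searching
        return 'rename'
    elif source_minikind in 'ra' and target_minikind in 'ra':
        # present in neither selected tree: skip
        return 'skip'
    raise AssertionError("don't know how to compare %r, %r"
                         % (source_minikind, target_minikind))
```

Because the tests use substring membership on short strings, a relocated source ('r') falls into the 'diff' branch only when the target is real content, matching the table's "add source to search and diff" row.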