# Copyright (C) 2006-2011 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

"""DirState objects record the state of a directory and its bzr metadata.

Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.

MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
WHOLE_NUMBER = {digit}, digit;
REVISION_ID = a non-empty utf8 string;

dirstate format = header line, full checksum, row count, parent_details,
    ghost_details, entries;
header line = "#bazaar dirstate flat format 3", NL;
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
row count = "num_entries: ", WHOLE_NUMBER, NL;
parent_details = WHOLE_NUMBER, {REVISION_ID}*, NL;
ghost_details = WHOLE_NUMBER, {REVISION_ID}*, NL;

entry = entry_key, current_entry_details, {parent_entry_details};
entry_key = dirname, basename, fileid;
current_entry_details = common_entry_details, working_entry_details;
parent_entry_details = common_entry_details, history_entry_details;
common_entry_details = MINIKIND, fingerprint, size, executable;
working_entry_details = packed_stat;
history_entry_details = REVISION_ID;

fingerprint = a nonempty utf8 sequence with meaning defined by minikind.

Given this definition, the following is useful to know::

    entry (aka row) - all the data for a given key.
    entry[0]: The key (dirname, basename, fileid)
    entry[1]: The tree(s) data for this path and id combination.
    entry[1][0]: The current tree
    entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate::

    entry[1][0][0]: minikind
    entry[1][0][1]: fingerprint
    entry[1][0][2]: size
    entry[1][0][3]: executable
    entry[1][0][4]: packed_stat

and for a parent tree, the last field holds the revision id instead::

    entry[1][1][4]: revision_id

There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now::

    self._dirblocks[0] -> ('', [entry ...]),

and the entries in there are::

    entries[0][0]: ''
    entries[0][1]: ''
    entries[0][2]: file_id
    entries[1][0]: The tree data for the current tree for this fileid at /
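
For example, an entry for a file 'foo' in the root, with one parent tree
recorded, could look like this (all values illustrative only)::

    (('', 'foo', 'foo-file-id'),                    # entry_key
     [('f', 'sha1...', 30, False, 'packed-stat'),   # tree 0 (working) details
      ('f', 'sha1...', 30, False, 'parent-rev-id')]) # tree 1 (parent) details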

For each entry for each tree, the minikind qualifies how to interpret the
remaining fields:

'r' is a relocated entry: This path is not present in this tree with this
    id, but the id can be found at another location. The fingerprint is
    used to point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
    current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is
    the sha1 value of the file's canonical form, i.e. after any read
    filters have been applied to the convenience form stored in the working
    tree.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is
    the link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
    revision.

The entries on disk and in memory are ordered according to the following keys::

    directory, as a list of components
    filename
    file-id
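
Comparing directories as component lists orders paths differently from a
plain string compare; a quick sketch of the difference::

    >>> split = lambda p: p.split('/')
    >>> sorted(['a-b', 'a/b'])             # plain string order
    ['a-b', 'a/b']
    >>> sorted(['a-b', 'a/b'], key=split)  # dirstate's component order
    ['a/b', 'a-b']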

--- Format 1 had the following different definition: ---

rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
    WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
    {PARENT ROW}
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
    basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
    SHA1

PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.

In any tree, a kind of 'moved' indicates that the fingerprint field
(which we treat as opaque data specific to the 'kind' anyway) has the
details for the id of this row in that tree.

I'm strongly tempted to add an id->path index as well, but I think that
where we need id->path mapping we also usually read the whole file, so
I'm going to skip that for the moment, as we have the ability to locate
via bisect any path in any tree, and if we lookup things by path, we can
accumulate an id->path mapping as we go, which will tend to match what we
looked for.

I plan to implement this asap, so please speak up now to alter/tweak the
design - and once we stabilise on this, I'll update the wiki page for
it.

The rationale for all this is that we want fast operations for the
common case (diff/status/commit/merge on all files) and extremely fast
operations for the less common but still frequent case (status/diff/commit
on specific files). Operations on specific files involve a scan for all
the children of a path, *in every involved tree*, which the current
format did not accommodate.

In short, the goals are:

1. Fast end to end use for bzr's top 5 use cases. (commit/diff/status/merge/???)
2. Fall back to the current object model as needed.
3. Scale usably to the largest trees known today - say 50K entries. (mozilla
   is an example of this)

Eventually reuse dirstate objects across locks IFF the dirstate file has not
been modified, but will require that we flush/ignore cached stat-hit data
because we won't want to restat all files on disk just because a lock was
acquired, yet we cannot trust the data after the previous lock was released.

Memory representation::

    vector of all directories, and vector of the children ?
    root_entry = (direntry for root, [parent_direntries_for_root]),
    dirblocks = [
        ('', ['data for achild', 'data for bchild', 'data for cchild'])
        ('dir', ['achild', 'cchild', 'echild'])
        ]

- single bisect to find N subtrees from a path spec
- in-order for serialisation - this is 'dirblock' grouping.
- insertion of a file '/a' affects only the '/' child-vector, that is,
  inserting 10K elements from scratch does not generate O(N^2) memmoves of a
  single vector, but rather bounded inserts into each individual child-vector,
  which tends to stay a manageable size. Will scale badly on trees with 10K
  entries in a single directory. Compare with Inventory.InventoryDirectory
  which has a dictionary for the children. No bisect capability, can only
  probe for exact matches, or grab all elements and sort.
- What's the risk of error here? Once we have the base format being processed
  we should have a net win regardless of optimality. So we are going to
  go with what seems reasonable.

Maybe we should do a test profile of the core structure - 10K simulated
searches/lookups/etc?
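
As a sketch of why this layout helps, finding the entries for directory 'dir'
is a single bisect over the dirblock names rather than a full scan (names and
data illustrative only)::

    dir_names = [b[0] for b in dirblocks]
    index = bisect.bisect_left(dir_names, 'dir', 1)  # skip the root block
    if index < len(dirblocks) and dirblocks[index][0] == 'dir':
        entries = dirblocks[index][1]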

Objects for each row?
The lifetime of Dirstate objects is currently per lock, but see above for
possible extensions. The lifetime of a row from a dirstate is expected to be
very short in the optimistic case, which is what we are optimising for. For
instance, subtree status will determine from analysis of the disk data what
rows need to be examined at all, and will be able to determine from a single
row whether that file has altered or not, so we are aiming to process tens of
thousands of entries each second within the dirstate context, before exposing
anything to the larger codebase. This suggests we want the time for a single
file comparison to be < 0.1 milliseconds. That would give us 10000 paths per
second processed, and to scale to 100 thousand we'll need another order of
magnitude to do that. Now, as the lifetime for all unchanged entries is the
time to parse, stat the file on disk, and then immediately discard, the
overhead of object creation becomes a significant cost.

Figures: Creating a tuple from 3 elements was profiled at 0.0625
microseconds, whereas creating an object which is subclassed from tuple was
0.500 microseconds, and creating an object with 3 elements and slots was 3
microseconds long. 0.1 milliseconds is 100 microseconds, and ideally we'll get
down to 10 microseconds for the total processing - having 33% of that be object
creation is a huge overhead. There is a potential cost in using tuples within
each row which is that the conditional code to do comparisons may be slower
than method invocation, but method invocation is known to be slow due to stack
frame creation, so avoiding methods in these tight inner loops is unfortunately
desirable. We can consider a pyrex version of this with objects in future if
needed.
"""


import bisect
import binascii
import operator
import os
import stat
import struct
from stat import S_IEXEC

from bzrlib import (
    cache_utf8,
    config,
    debug,
    errors,
    inventory,
    osutils,
    static_tuple,
    trace,
    urlutils,
    )


# This is the Windows equivalent of ENOTDIR
# It is defined in pywin32.winerror, but we don't want a strong dependency for
# just an error code.
ERROR_PATH_NOT_FOUND = 3
ERROR_DIRECTORY = 267


if not getattr(struct, '_compile', None):
    # Cannot pre-compile the dirstate pack_stat
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=struct.pack):
        """Convert stat values into a packed representation."""
        return _encode(_pack('>LLLLLL', st.st_size, int(st.st_mtime),
            int(st.st_ctime), st.st_dev, st.st_ino & 0xFFFFFFFF,
            st.st_mode))[:-1]
else:
    # compile the struct compiler we need, so as to only do it once
    from _struct import Struct
    _compiled_pack = Struct('>LLLLLL').pack
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=_compiled_pack):
        """Convert stat values into a packed representation."""
        # jam 20060614 it isn't really worth removing more entries if we
        # are going to leave it in packed form.
        # With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
        # With all entries, filesize is 5.9M and read time is maybe 280ms
        # well within the noise margin

        # base64 encoding always adds a final newline, so strip it off
        # The current version
        return _encode(_pack(st.st_size, int(st.st_mtime), int(st.st_ctime),
            st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
        # This is 0.060s / 1.520s faster by not encoding as much information
        # return _encode(_pack('>LL', int(st.st_mtime), st.st_mode))[:-1]
        # This is not strictly faster than _encode(_pack())[:-1]
        # return '%X.%X.%X.%X.%X.%X' % (
        #     st.st_size, int(st.st_mtime), int(st.st_ctime),
        #     st.st_dev, st.st_ino, st.st_mode)
        # Similar to the _encode(_pack('>LL'))
        # return '%X.%X' % (int(st.st_mtime), st.st_mode)


def _unpack_stat(packed_stat):
    """Turn a packed_stat back into the stat fields.

    This is meant as a debugging tool, should not be used in real code.
    """
    (st_size, st_mtime, st_ctime, st_dev, st_ino,
     st_mode) = struct.unpack('>LLLLLL', binascii.a2b_base64(packed_stat))
    return dict(st_size=st_size, st_mtime=st_mtime, st_ctime=st_ctime,
        st_dev=st_dev, st_ino=st_ino, st_mode=st_mode)
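

# A sketch of the round trip (illustrative only): pack_stat truncates each
# field to 32 bits, so the unpacked values compare modulo 2**32:
#
#   st = os.lstat('some-file')
#   fields = _unpack_stat(pack_stat(st))
#   fields['st_size'] == st.st_size & 0xFFFFFFFF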


class SHA1Provider(object):
    """An interface for getting sha1s of a file."""

    def sha1(self, abspath):
        """Return the sha1 of a file given its absolute path.

        :param abspath: May be a filesystem encoded absolute path
            or a unicode path.
        """
        raise NotImplementedError(self.sha1)

    def stat_and_sha1(self, abspath):
        """Return the stat and sha1 of a file given its absolute path.

        :param abspath: May be a filesystem encoded absolute path
            or a unicode path.

        Note: the stat should be the stat of the physical file
        while the sha may be the sha of its canonical content.
        """
        raise NotImplementedError(self.stat_and_sha1)


class DefaultSHA1Provider(SHA1Provider):
    """A SHA1Provider that reads directly from the filesystem."""

    def sha1(self, abspath):
        """Return the sha1 of a file given its absolute path."""
        return osutils.sha_file_by_name(abspath)

    def stat_and_sha1(self, abspath):
        """Return the stat and sha1 of a file given its absolute path."""
        file_obj = file(abspath, 'rb')
        try:
            statvalue = os.fstat(file_obj.fileno())
            sha1 = osutils.sha_file(file_obj)
        finally:
            file_obj.close()
        return statvalue, sha1
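

# Any object honouring this two-method interface can be supplied to DirState
# as its sha1_provider; for instance, a (hypothetical) caching provider:
#
#   class CachingSHA1Provider(SHA1Provider):
#
#       def __init__(self):
#           self._cache = {}  # abspath -> sha1
#
#       def sha1(self, abspath):
#           if abspath not in self._cache:
#               self._cache[abspath] = osutils.sha_file_by_name(abspath)
#           return self._cache[abspath]
#
#       def stat_and_sha1(self, abspath):
#           return os.lstat(abspath), self.sha1(abspath)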


class DirState(object):
    """Record directory and metadata state for fast access.

    A dirstate is a specialised data structure for managing local working
    tree state information. It's not yet well defined whether it is platform
    specific, and if it is how we detect/parameterize that.

    Dirstates use the usual lock_write, lock_read and unlock mechanisms.
    Unlike most bzr disk formats, DirStates must be locked for reading, using
    lock_read. (This is an os file lock internally.) This is necessary
    because the file can be rewritten in place.

    DirStates must be explicitly written with save() to commit changes; just
    unlocking them does not write the changes to disk.
    """

    _kind_to_minikind = {
            'absent': 'a',
            'file': 'f',
            'directory': 'd',
            'relocated': 'r',
            'symlink': 'l',
            'tree-reference': 't',
        }
    _minikind_to_kind = {
            'a': 'absent',
            'f': 'file',
            'd': 'directory',
            'l': 'symlink',
            'r': 'relocated',
            't': 'tree-reference',
        }
    _stat_to_minikind = {
        stat.S_IFDIR: 'd',
        stat.S_IFREG: 'f',
        stat.S_IFLNK: 'l',
    }
    _to_yesno = {True: 'y', False: 'n'} # TODO profile the performance gain
    # of using int conversion rather than a dict here. AND BLAME ANDREW IF
    # THIS IS SLOWER.

    # TODO: jam 20070221 Figure out what to do if we have a record that exceeds
    #       the BISECT_PAGE_SIZE. For now, we just have to make it large enough
    #       that we are sure a single record will always fit.
    BISECT_PAGE_SIZE = 4096

    NOT_IN_MEMORY = 0
    IN_MEMORY_UNMODIFIED = 1
    IN_MEMORY_MODIFIED = 2
    IN_MEMORY_HASH_MODIFIED = 3 # Only hash-cache updates

    # A pack_stat (the x's) that is just noise and will never match the output
    # of base64 encode.
    NULLSTAT = 'x' * 32
    NULL_PARENT_DETAILS = static_tuple.StaticTuple('a', '', 0, False, '')

    HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
    HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'

    def __init__(self, path, sha1_provider, worth_saving_limit=0):
        """Create a DirState object.

        :param path: The path at which the dirstate file on disk should live.
        :param sha1_provider: an object meeting the SHA1Provider interface.
        :param worth_saving_limit: when the exact number of hash changed
            entries is known, only bother saving the dirstate if more than
            this count of entries have changed.
            -1 means never save hash changes, 0 means always save hash changes.
        """
        # _header_state and _dirblock_state represent the current state
        # of the dirstate metadata and the per-row data respectively.
        # NOT_IN_MEMORY indicates that no data is in memory
        # IN_MEMORY_UNMODIFIED indicates that what we have in memory
        #   is the same as is on disk
        # IN_MEMORY_MODIFIED indicates that we have a modified version
        #   of what is on disk.
        # In future we will add more granularity, for instance _dirblock_state
        # will probably support partially-in-memory as a separate variable,
        # allowing for partially-in-memory unmodified and partially-in-memory
        # modified states.
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        # If true, an error has been detected while updating the dirstate, and
        # for safety we're not going to commit to disk.
        self._changes_aborted = False
        self._dirblocks = []
        self._ghosts = []
        self._parents = []
        self._state_file = None
        self._filename = path
        self._lock_token = None
        self._lock_state = None
        self._id_index = None
        # a map from packed_stat to sha's.
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
        self._bisect_page_size = DirState.BISECT_PAGE_SIZE
        self._sha1_provider = sha1_provider
        if 'hashcache' in debug.debug_flags:
            self._sha1_file = self._sha1_file_and_mutter
        else:
            self._sha1_file = self._sha1_provider.sha1
        # These two attributes provide a simple cache for lookups into the
        # dirstate in-memory vectors. By probing respectively for the last
        # block, and for the next entry, we save nearly 2 bisections per path
        # during commit.
        self._last_block_index = None
        self._last_entry_index = None
        # The set of known hash changes
        self._known_hash_changes = set()
        # How many hash changed entries can we have without saving
        self._worth_saving_limit = worth_saving_limit
        self._config_stack = config.LocationStack(urlutils.local_path_to_url(
            path))

    def __repr__(self):
        return "%s(%r)" % \
            (self.__class__.__name__, self._filename)
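
    # Typical construction goes through the class-method constructors (e.g.
    # DirState.initialize, or DirState.from_tree below) rather than calling
    # __init__ directly, so that file creation and locking are handled
    # consistently; a sketch:
    #
    #   state = DirState.initialize('/path/to/.bzr/checkout/dirstate',
    #       sha1_provider=DefaultSHA1Provider())
    #   try:
    #       ...
    #   finally:
    #       state.unlock()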

    def _mark_modified(self, hash_changed_entries=None, header_modified=False):
        """Mark this dirstate as modified.

        :param hash_changed_entries: if non-None, mark just these entries as
            having their hash modified.
        :param header_modified: mark the header modified as well, not just the
            dirblocks.
        """
        #trace.mutter_callsite(3, "modified hash entries: %s", hash_changed_entries)
        if hash_changed_entries:
            self._known_hash_changes.update([e[0] for e in hash_changed_entries])
            if self._dirblock_state in (DirState.NOT_IN_MEMORY,
                                        DirState.IN_MEMORY_UNMODIFIED):
                # If the dirstate is already marked as IN_MEMORY_MODIFIED, then
                # that takes precedence.
                self._dirblock_state = DirState.IN_MEMORY_HASH_MODIFIED
        else:
            # TODO: Since we now have a IN_MEMORY_HASH_MODIFIED state, we
            #       should fail noisily if someone tries to set
            #       IN_MEMORY_MODIFIED but we don't have a write-lock!
            # We don't know exactly what changed so disable smart saving
            self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if header_modified:
            self._header_state = DirState.IN_MEMORY_MODIFIED

    def _mark_unmodified(self):
        """Mark this dirstate as unmodified."""
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
        self._known_hash_changes = set()

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.

        :param path: The path within the dirstate - '' is the root, 'foo' is the
            path foo within the root, 'foo/bar' is the path bar within foo
            within the root.
        :param file_id: The file id of the path being added.
        :param kind: The kind of the path, as a string like 'file',
            'directory', etc.
        :param stat: The output of os.lstat for the path.
        :param fingerprint: The sha value of the file's canonical form (i.e.
            after any read filters have been applied),
            or the target of a symlink,
            or the referenced revision id for tree-references,
            or '' for directories.
        """
        # adding a file:
        # find the block it's in.
        # find the location in the block.
        # check it's not there.
        # add it.
        #------- copied from inventory.ensure_normalized_name - keep synced.
        # --- normalized_filename wants a unicode basename only, so get one.
        dirname, basename = osutils.split(path)
        # we don't import normalized_filename directly because we want to be
        # able to change the implementation at runtime for tests.
        norm_name, can_access = osutils.normalized_filename(basename)
        if norm_name != basename:
            if can_access:
                basename = norm_name
            else:
                raise errors.InvalidNormalization(path)
        # you should never have files called . or ..; just add the directory
        # in the parent, or according to the special treatment for the root
        if basename == '.' or basename == '..':
            raise errors.InvalidEntryName(path)
        # now that we've normalised, we need the correct utf8 path and
        # dirname and basename elements. This single encode and split should be
        # faster than three separate encodes.
        utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
        dirname, basename = osutils.split(utf8path)
        # uses __class__ for speed; the check is needed for safety
        if file_id.__class__ is not str:
            raise AssertionError(
                "must be a utf8 file_id not %s" % (type(file_id), ))
        # Make sure the file_id does not exist in this tree
        rename_from = None
        file_id_entry = self._get_entry(0, fileid_utf8=file_id, include_deleted=True)
        if file_id_entry != (None, None):
            if file_id_entry[1][0][0] == 'a':
                if file_id_entry[0] != (dirname, basename, file_id):
                    # set the old name's current operation to rename
                    self.update_minimal(file_id_entry[0],
                        'r',
                        path_utf8='',
                        packed_stat='',
                        fingerprint=utf8path
                    )
                    rename_from = file_id_entry[0][0:2]
            else:
                path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
                kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
                info = '%s:%s' % (kind, path)
                raise errors.DuplicateFileId(file_id, info)
        first_key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(first_key)
        if present:
            # check the path is not in the tree
            block = self._dirblocks[block_index][1]
            entry_index, _ = self._find_entry_index(first_key, block)
            while (entry_index < len(block) and
                block[entry_index][0][0:2] == first_key[0:2]):
                if block[entry_index][1][0][0] not in 'ar':
                    # this path is in the dirstate in the current tree.
                    raise Exception("adding already added path!")
                entry_index += 1
        else:
            # The block where we want to put the file is not present. But it
            # might be because the directory was empty, or not loaded yet. Look
            # for a parent entry, if not found, raise NotVersionedError
            parent_dir, parent_base = osutils.split(dirname)
            parent_block_idx, parent_entry_idx, _, parent_present = \
                self._get_block_entry_index(parent_dir, parent_base, 0)
            if not parent_present:
                raise errors.NotVersionedError(path, str(self))
            self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
        block = self._dirblocks[block_index][1]
        entry_key = (dirname, basename, file_id)
        if stat is None:
            size = 0
            packed_stat = DirState.NULLSTAT
        else:
            size = stat.st_size
            packed_stat = pack_stat(stat)
        parent_info = self._empty_parent_info()
        minikind = DirState._kind_to_minikind[kind]
        if rename_from is not None:
            if rename_from[0]:
                old_path_utf8 = '%s/%s' % rename_from
            else:
                old_path_utf8 = rename_from[1]
            parent_info[0] = ('r', old_path_utf8, 0, False, '')
        if kind == 'file':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'directory':
            entry_data = entry_key, [
                (minikind, '', 0, False, packed_stat),
                ] + parent_info
        elif kind == 'symlink':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'tree-reference':
            entry_data = entry_key, [
                (minikind, fingerprint, 0, False, packed_stat),
                ] + parent_info
        else:
            raise errors.BzrError('unknown kind %r' % kind)
        entry_index, present = self._find_entry_index(entry_key, block)
        if not present:
            block.insert(entry_index, entry_data)
        else:
            if block[entry_index][1][0][0] != 'a':
                raise AssertionError(" %r(%r) already added" % (basename, file_id))
            block[entry_index][1][0] = entry_data[1][0]

        if kind == 'directory':
            # insert a new dirblock
            self._ensure_block(block_index, entry_index, utf8path)
        self._mark_modified()
        if self._id_index:
            self._add_to_id_index(self._id_index, entry_key)
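
    # Example (a sketch; 'foo' and its file id are hypothetical): with a
    # write-locked DirState, registering a new file looks like:
    #
    #   st = os.lstat('foo')
    #   state.add('foo', 'foo-file-id', 'file', st,
    #       osutils.sha_file_by_name('foo'))
    #   state.save()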

    def _bisect(self, paths):
        """Bisect through the disk structure for specific rows.

        :param paths: A list of paths to find
        :return: A dict mapping path => entries for found entries. Missing
                entries will not be in the map.
                The list is not sorted, and entries will be populated
                based on when they were read.
        """
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # part of the disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from (dir, name) => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(paths)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dir_names -> The list of (dir, name) pairs that should be found in
        #                the [low, high] range
        pending = [(low, high, paths)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_files = pending.pop()

            if not cur_files or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 2:
                # We didn't even get a filename here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_files))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                if first_fields[1]:
                    first_path = first_fields[1] + '/' + first_fields[2]
                else:
                    first_path = first_fields[2]
                first_loc = _bisect_path_left(cur_files, first_path)

                # These exist before the current location
                pre = cur_files[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_files[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have files after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                if last_fields[1]:
                    last_path = last_fields[1] + '/' + last_fields[2]
                else:
                    last_path = last_fields[2]
                last_loc = _bisect_path_right(post, last_path)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_path:
                        # We might need to go before this location
                        pre.append(first_path)
                    if middle_files[-1] == last_path:
                        post.insert(0, last_path)

                    # Find out what paths we have
                    paths = {first_path:[first_fields]}
                    # last_path might == first_path so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_path, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        if fields[1]:
                            path = fields[1] + '/' + fields[2]
                        else:
                            path = fields[2]
                        paths.setdefault(path, []).append(fields)

                    for path in middle_files:
                        for fields in paths.get(path, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(path, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        # Consider that we may want to return the directory entries in sorted
        # order. For now, we just return them in whatever order we found them,
        # and leave it up to the caller if they care if it is ordered or not.
        return found
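
    # Example (a sketch): with the dirblocks not yet loaded into memory, a
    # handful of paths can be looked up straight from disk:
    #
    #   hits = state._bisect(['a/b', 'f'])
    #   # 'hits' maps each path found to a list of entries; paths missing
    #   # from the dirstate are simply absent from the dict.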

    def _bisect_dirblocks(self, dir_list):
        """Bisect through the disk structure to find entries in given dirs.

        _bisect_dirblocks is meant to find the contents of directories, which
        differs from _bisect, which only finds individual entries.

        :param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
        :return: A map from dir => entries_for_dir
        """
        # TODO: jam 20070223 A lot of the bisecting logic could be shared
        #       between this and _bisect. It would require parameterizing the
        #       inner loop with a function, though. We should evaluate the
        #       performance difference.
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # part of the disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)
        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from dir => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dirs -> The list of directories that should be found in
        #           the [low, high] range
        pending = [(low, high, dir_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_dirs = pending.pop()

            if not cur_dirs or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 1:
                # We didn't even get a dirname here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                first_dir = first_fields[1]
                first_loc = bisect.bisect_left(cur_dirs, first_dir)

                # These exist before the current location
                pre = cur_dirs[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_dirs[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have records to look at after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir = last_fields[1]
                last_loc = bisect.bisect_right(post, last_dir)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir:
                        # We might need to go before this location
                        pre.append(first_dir)
                    if middle_files[-1] == last_dir:
                        post.insert(0, last_dir)

                    # Find out what paths we have
                    paths = {first_dir:[first_fields]}
                    # last_dir might == first_dir so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        paths.setdefault(fields[1], []).append(fields)

                    for cur_dir in middle_files:
                        for fields in paths.get(cur_dir, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(cur_dir, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        return found

    def _bisect_recursive(self, paths):
        """Bisect for entries for all paths and their children.

        This will use bisect to find all records for the supplied paths. It
        will then continue to bisect for any records which are marked as
        directories. (and renames?)

        :param paths: A sorted list of (dir, name) pairs
            eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
        :return: A dictionary mapping (dir, name, file_id) => [tree_info]
        """
        # Map from (dir, name, file_id) => [tree_info]
        found = {}
        found_dir_names = set()

        # Directories that have been read
        processed_dirs = set()
        # Get the ball rolling with the first bisect for all entries.
        newly_found = self._bisect(paths)

        while newly_found:
            # Directories that need to be read
            pending_dirs = set()
            paths_to_search = set()
            for entry_list in newly_found.itervalues():
                for dir_name_id, trees_info in entry_list:
                    found[dir_name_id] = trees_info
                    found_dir_names.add(dir_name_id[:2])
                    is_dir = False
                    for tree_info in trees_info:
                        minikind = tree_info[0]
                        if minikind == 'd':
                            if is_dir:
                                # We already processed this one as a directory,
                                # we don't need to do the extra work again.
                                continue
                            subdir, name, file_id = dir_name_id
                            path = osutils.pathjoin(subdir, name)
                            is_dir = True
                            if path not in processed_dirs:
                                pending_dirs.add(path)
                        elif minikind == 'r':
                            # Rename, we need to directly search the target
                            # which is contained in the fingerprint column
                            dir_name = osutils.split(tree_info[1])
                            if dir_name[0] in pending_dirs:
                                # This entry will be found in the dir search
                                continue
                            if dir_name not in found_dir_names:
                                paths_to_search.add(tree_info[1])
            # Now we have a list of paths to look for directly, and
            # directory blocks that need to be read.
            # newly_found is mixing the keys between (dir, name) and path
            # entries, but that is okay, because we only really care about the
            # file_id, everything else will be found by the directory search.
            newly_found = self._bisect(sorted(paths_to_search))
            newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
            processed_dirs.update(pending_dirs)
        return found

    def _discard_merge_parents(self):
        """Discard any parents trees beyond the first.

        Note that if this fails the dirstate is corrupted.

        After this function returns the dirstate contains 2 trees, neither of
        which are ghosted.
        """
        self._read_header_if_needed()
        parents = self.get_parent_ids()
        if len(parents) < 1:
            return
        # only require all dirblocks if we are doing a full-pass removal.
        self._read_dirblocks_if_needed()
        dead_patterns = set([('a', 'r'), ('a', 'a'), ('r', 'r'), ('r', 'a')])
        def iter_entries_removable():
            for block in self._dirblocks:
                deleted_positions = []
                for pos, entry in enumerate(block[1]):
                    yield entry
                    if (entry[1][0][0], entry[1][1][0]) in dead_patterns:
                        deleted_positions.append(pos)
                if deleted_positions:
                    if len(deleted_positions) == len(block[1]):
                        del block[1][:]
                    else:
                        for pos in reversed(deleted_positions):
                            del block[1][pos]
        # if the first parent is a ghost:
        if parents[0] in self.get_ghosts():
            empty_parent = [DirState.NULL_PARENT_DETAILS]
            for entry in iter_entries_removable():
                entry[1][1:] = empty_parent
        else:
            for entry in iter_entries_removable():
                del entry[1][2:]

        self._ghosts = []
        self._parents = [parents[0]]
        self._mark_modified(header_modified=True)

    def _empty_parent_info(self):
        return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
                                                 len(self._ghosts))

    def _ensure_block(self, parent_block_index, parent_row_index, dirname):
        """Ensure a block for dirname exists.

        This function exists to let callers which know that there is a
        directory dirname ensure that the block for it exists. This block can
        fail to exist because of demand loading, or because a directory had no
        children. In either case it is not an error. It is however an error to
        call this if there is no parent entry for the directory, and thus the
        function requires the coordinates of such an entry to be provided.

        The root row is special cased and can be indicated with a parent block
        and row index of -1.

        :param parent_block_index: The index of the block in which dirname's row
            exists.
        :param parent_row_index: The index in the parent block where the row
            exists.
        :param dirname: The utf8 dirname to ensure there is a block for.
        :return: The index for the block.
        """
        if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
            # This is the signature of the root row, and the
            # contents-of-root row is always index 1
            return 1
        # the basename of the directory must be the end of its full name.
        if not (parent_block_index == -1 and
                parent_row_index == -1 and dirname == ''):
            if not dirname.endswith(
                    self._dirblocks[parent_block_index][1][parent_row_index][0][1]):
                raise AssertionError("bad dirname %r" % dirname)
        block_index, present = self._find_block_index_from_key((dirname, '', ''))
        if not present:
            ## In future, when doing partial parsing, this should load and
            # populate the entire block.
            self._dirblocks.insert(block_index, (dirname, []))
        return block_index

    def _entries_to_current_state(self, new_entries):
        """Load new_entries into self.dirblocks.

        Process new_entries into the current state object, making them the active
        state. The entries are grouped together by directory to form dirblocks.

        :param new_entries: A sorted list of entries. This function does not sort
            to prevent unneeded overhead when callers have a sorted list already.
        """
        if new_entries[0][0][0:2] != ('', ''):
            raise AssertionError(
                "Missing root row %r" % (new_entries[0][0],))
        # The two blocks here are deliberate: the root block and the
        # contents-of-root block.
        self._dirblocks = [('', []), ('', [])]
        current_block = self._dirblocks[0][1]
        current_dirname = ''
        append_entry = current_block.append
        for entry in new_entries:
            if entry[0][0] != current_dirname:
                # new block - different dirname
                current_block = []
                current_dirname = entry[0][0]
                self._dirblocks.append((current_dirname, current_block))
                append_entry = current_block.append
            # append the entry to the current block
            append_entry(entry)
        self._split_root_dirblock_into_contents()

    def _split_root_dirblock_into_contents(self):
        """Split the root dirblocks into root and contents-of-root.

        After parsing by path, we end up with root entries and contents-of-root
        entries in the same block. This loop splits them out again.
        """
        # The above loop leaves the "root block" entries mixed with the
        # "contents-of-root block". But we don't want an if check on
        # all entries, so instead we just fix it up here.
        if self._dirblocks[1] != ('', []):
            raise ValueError("bad dirblock start %r" % (self._dirblocks[1],))
        root_block = []
        contents_of_root_block = []
        for entry in self._dirblocks[0][1]:
            if not entry[0][1]: # This is a root entry
                root_block.append(entry)
            else:
                contents_of_root_block.append(entry)
        self._dirblocks[0] = ('', root_block)
        self._dirblocks[1] = ('', contents_of_root_block)

    def _entries_for_path(self, path):
        """Return a list with all the entries that match path for all ids."""
        dirname, basename = os.path.split(path)
        key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # the block which should contain path is absent.
            return []
        result = []
        block = self._dirblocks[block_index][1]
        entry_index, _ = self._find_entry_index(key, block)
        # we may need to look at multiple entries at this path: walk while the
        # specific_files match.
        while (entry_index < len(block) and
               block[entry_index][0][0:2] == key[0:2]):
            result.append(block[entry_index])
            entry_index += 1
        return result

    def _entry_to_line(self, entry):
        """Serialize entry to a NULL delimited line ready for _get_output_lines.

        :param entry: An entry_tuple as defined in the module docstring.
        """
        entire_entry = list(entry[0])
        for tree_number, tree_data in enumerate(entry[1]):
            # (minikind, fingerprint, size, executable, tree_specific_string)
            entire_entry.extend(tree_data)
            # 3 for the key, 5 for the fields per tree.
            tree_offset = 3 + tree_number * 5
            # minikind
            entire_entry[tree_offset + 0] = tree_data[0]
            # size
            entire_entry[tree_offset + 2] = str(tree_data[2])
            # executable
            entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
        return '\0'.join(entire_entry)
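
    # The resulting line therefore looks like (NULs written as \0):
    #
    #   dirname\0basename\0file-id\0minikind\0fingerprint\0size\0y|n\0packed_stat
    #
    # with a further 5 fields appended for each parent tree present.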

    def _fields_per_entry(self):
        """How many null separated fields should be in each entry row.

        Each line now has an extra '\\n' field which is not used
        so we just skip over it

        entry size::
            3 fields for the key
            + number of fields per tree_data (5) * tree count
            + newline
        """
        tree_count = 1 + self._num_present_parents()
        return 3 + 5 * tree_count + 1
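
    # For example, with one parent tree present: tree_count = 2, so each row
    # carries 3 + 5*2 + 1 = 14 fields, the last field absorbing the trailing
    # newline.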

    def _find_block(self, key, add_if_missing=False):
        """Return the block that key should be present in.

        :param key: A dirstate entry key.
        :return: The block tuple.
        """
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            if not add_if_missing:
                # check to see if key is versioned itself - we might want to
                # add it anyway, because dirs with no entries dont get a
                # dirblock at parse time.
                # This is an uncommon branch to take: most dirs have children,
                # and most code works with versioned paths.
                parent_base, parent_name = osutils.split(key[0])
                if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
                    # some parent path has not been added - its an error to add
                    # this child
                    raise errors.NotVersionedError(key[0:2], str(self))
            self._dirblocks.insert(block_index, (key[0], []))
        return self._dirblocks[block_index]

    def _find_block_index_from_key(self, key):
        """Find the dirblock index for a key.

        :return: The block index, True if the block for the key is present.
        """
        if key[0:2] == ('', ''):
            return 0, True
        try:
            if (self._last_block_index is not None and
                self._dirblocks[self._last_block_index][0] == key[0]):
                return self._last_block_index, True
        except IndexError:
            pass
        block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                      cache=self._split_path_cache)
        # _right returns one-past-where-key is so we have to subtract
        # one to use it. we use _right here because there are two
        # '' blocks - the root, and the contents of root
        # we always have a minimum of 2 in self._dirblocks: root and
        # root-contents, and for '', we get 2 back, so this is
        # simple and correct:
        present = (block_index < len(self._dirblocks) and
            self._dirblocks[block_index][0] == key[0])
        self._last_block_index = block_index
        # Reset the entry index cache to the beginning of the block.
        self._last_entry_index = -1
        return block_index, present

    def _find_entry_index(self, key, block):
        """Find the entry index for a key in a block.

        :return: The entry index, True if the entry for the key is present.
        """
        len_block = len(block)
        try:
            if self._last_entry_index is not None:
                # mini-bisect here.
                entry_index = self._last_entry_index + 1
                # A hit is when the key is after the last slot, and before or
                # equal to the next slot.
                if ((entry_index > 0 and block[entry_index - 1][0] < key) and
                    key <= block[entry_index][0]):
                    self._last_entry_index = entry_index
                    present = (block[entry_index][0] == key)
                    return entry_index, present
        except IndexError:
            pass
        entry_index = bisect.bisect_left(block, (key, []))
        present = (entry_index < len_block and
            block[entry_index][0] == key)
        self._last_entry_index = entry_index
        return entry_index, present

    @staticmethod
    def from_tree(tree, dir_state_filename, sha1_provider=None):
        """Create a dirstate from a bzr Tree.

        :param tree: The tree which should provide parent information and
            inventory ids.
        :param sha1_provider: an object meeting the SHA1Provider interface.
            If None, a DefaultSHA1Provider is used.
        :return: a DirState object which is currently locked for writing.
            (it was locked by DirState.initialize)
        """
        result = DirState.initialize(dir_state_filename,
            sha1_provider=sha1_provider)
        try:
            tree.lock_read()
            try:
                parent_ids = tree.get_parent_ids()
                num_parents = len(parent_ids)
                parent_trees = []
                for parent_id in parent_ids:
                    parent_tree = tree.branch.repository.revision_tree(parent_id)
                    parent_trees.append((parent_id, parent_tree))
                    parent_tree.lock_read()
                result.set_parent_trees(parent_trees, [])
                result.set_state_from_inventory(tree.inventory)
            finally:
                for revid, parent_tree in parent_trees:
                    parent_tree.unlock()
                tree.unlock()
        except:
            # The caller won't have a chance to unlock this, so make sure we
            # clean up ourselves.
            result.unlock()
            raise
        return result
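
    # Example (a sketch; a tree would come from e.g.
    # bzrlib.workingtree.WorkingTree.open('.')):
    #
    #   state = DirState.from_tree(tree, 'dirstate.new')
    #   try:
    #       state.save()
    #   finally:
    #       state.unlock()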

    def _check_delta_is_valid(self, delta):
        return list(inventory._check_delta_unique_ids(
                    inventory._check_delta_unique_old_paths(
                    inventory._check_delta_unique_new_paths(
                    inventory._check_delta_ids_match_entry(
                    inventory._check_delta_ids_are_valid(
                    inventory._check_delta_new_path_entry_both_or_None(delta)))))))

    def update_by_delta(self, delta):
        """Apply an inventory delta to the dirstate for tree 0

        This is the workhorse for apply_inventory_delta in dirstate based
        trees.

        :param delta: An inventory delta. See Inventory.apply_delta for
            details.
        """
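        # A delta is a list of (old_path, new_path, file_id, new_entry)
        # tuples - for example (a sketch, with hypothetical entry objects):
        #
        #   [('foo', None, 'foo-id', None),        # remove foo
        #    (None, 'bar', 'bar-id', bar_entry),   # add bar
        #    ('baz', 'qux', 'baz-id', baz_entry)]  # rename baz to qux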
        self._read_dirblocks_if_needed()
        encode = cache_utf8.encode
        insertions = {}
        removals = {}
        # Accumulate parent references (path_utf8, id), to check for parentless
        # items or items placed under files/links/tree-references. We get
        # references from every item in the delta that is not a deletion and
        # is not itself the root.
        parents = set()
        # Added ids must not be in the dirstate already. This set holds those
        # ids.
        new_ids = set()
        # This loop transforms the delta to single atomic operations that can
        # be executed and validated.
        delta = sorted(self._check_delta_is_valid(delta), reverse=True)
        for old_path, new_path, file_id, inv_entry in delta:
            if (file_id in insertions) or (file_id in removals):
                self._raise_invalid(old_path or new_path, file_id,
                    "repeated file_id")
            if old_path is not None:
                old_path = old_path.encode('utf-8')
                removals[file_id] = old_path
            else:
                new_ids.add(file_id)
            if new_path is not None:
                if inv_entry is None:
                    self._raise_invalid(new_path, file_id,
                        "new_path with no entry")
                new_path = new_path.encode('utf-8')
                dirname_utf8, basename = osutils.split(new_path)
                if basename:
                    parents.add((dirname_utf8, inv_entry.parent_id))
                key = (dirname_utf8, basename, file_id)
                minikind = DirState._kind_to_minikind[inv_entry.kind]
                if minikind == 't':
                    fingerprint = inv_entry.reference_revision or ''
                else:
                    fingerprint = ''
                insertions[file_id] = (key, minikind, inv_entry.executable,
                                       fingerprint, new_path)
            # Transform moves into delete+add pairs
            if None not in (old_path, new_path):
                for child in self._iter_child_entries(0, old_path):
                    if child[0][2] in insertions or child[0][2] in removals:
                        continue
                    child_dirname = child[0][0]
                    child_basename = child[0][1]
                    minikind = child[1][0][0]
                    fingerprint = child[1][0][4]
                    executable = child[1][0][3]
                    old_child_path = osutils.pathjoin(child_dirname,
                                                      child_basename)
                    removals[child[0][2]] = old_child_path
                    child_suffix = child_dirname[len(old_path):]
                    new_child_dirname = (new_path + child_suffix)
                    key = (new_child_dirname, child_basename, child[0][2])
                    new_child_path = osutils.pathjoin(new_child_dirname,
                                                      child_basename)
                    insertions[child[0][2]] = (key, minikind, executable,
                                               fingerprint, new_child_path)
        self._check_delta_ids_absent(new_ids, delta, 0)
        try:
            self._apply_removals(removals.iteritems())
            self._apply_insertions(insertions.values())
            # Validate parents
            self._after_delta_check_parents(parents, 0)
        except errors.BzrError, e:
            self._changes_aborted = True
            if 'integrity error' not in str(e):
                raise
            # _get_entry raises BzrError when a request is inconsistent; we
            # want such errors to be shown as InconsistentDelta - and that
            # fits the behaviour we trigger.
            raise errors.InconsistentDeltaDelta(delta,
                "error from _get_entry. %s" % (e,))

    def _apply_removals(self, removals):
        for file_id, path in sorted(removals, reverse=True,
            key=operator.itemgetter(1)):
            dirname, basename = osutils.split(path)
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(dirname, basename, 0)
            try:
                entry = self._dirblocks[block_i][1][entry_i]
            except IndexError:
                self._raise_invalid(path, file_id,
                    "Wrong path for old path.")
            if not f_present or entry[1][0][0] in 'ar':
                self._raise_invalid(path, file_id,
                    "Wrong path for old path.")
            if file_id != entry[0][2]:
                self._raise_invalid(path, file_id,
                    "Attempt to remove path has wrong id - found %r."
                    % entry[0][2])
            self._make_absent(entry)
            # See if we have a malformed delta: deleting a directory must not
            # leave crud behind. This increases the number of bisects needed
            # substantially, but deletion or renames of large numbers of paths
            # is rare enough it shouldn't be an issue (famous last words?) RBC
            # 20080730.
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(path, '', 0)
            if d_present:
                # The dir block is still present in the dirstate; this could
                # be due to it being in a parent tree, or a corrupt delta.
                for child_entry in self._dirblocks[block_i][1]:
                    if child_entry[1][0][0] not in ('r', 'a'):
                        self._raise_invalid(path, entry[0][2],
                            "The file id was deleted but its children were "
                            "not deleted.")

    def _apply_insertions(self, adds):
        try:
            for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
                self.update_minimal(key, minikind, executable, fingerprint,
                                    path_utf8=path_utf8)
        except errors.NotVersionedError:
            self._raise_invalid(path_utf8.decode('utf8'), key[2],
                "Missing parent")

    def update_basis_by_delta(self, delta, new_revid):
        """Update the parents of this tree after a commit.

        This gives the tree one parent, with revision id new_revid. The
        inventory delta is applied to the current basis tree to generate the
        inventory for the parent new_revid, and all other parent trees are
        discarded.

        Note that an exception during the operation of this method will leave
        the dirstate in a corrupt state where it should not be saved.

        :param new_revid: The new revision id for the trees parent.
        :param delta: An inventory delta (see apply_inventory_delta) describing
            the changes from the current left most parent revision to new_revid.
        """
        self._read_dirblocks_if_needed()
        self._discard_merge_parents()
        if self._ghosts != []:
            raise NotImplementedError(self.update_basis_by_delta)
        if len(self._parents) == 0:
            # setup a blank tree, the most simple way.
            empty_parent = DirState.NULL_PARENT_DETAILS
            for entry in self._iter_entries():
                entry[1].append(empty_parent)
            self._parents.append(new_revid)
        else:
            self._parents[0] = new_revid

        delta = sorted(self._check_delta_is_valid(delta), reverse=True)
        adds = []
        changes = []
        deletes = []
        # The paths this function accepts are unicode and must be encoded as we
        # go.
        encode = cache_utf8.encode
        inv_to_entry = self._inv_entry_to_details
        # delta is now (deletes, changes), (adds) in reverse lexicographical
        # order.
        # deletes in reverse lexicographic order are safe to process in situ.
        # renames are not, as a rename from any path could go to a path
        # lexicographically lower, so we transform renames into delete, add
        # pairs, expanding them recursively as needed.
        # At the same time, to reduce interface friction we convert the input
        # inventory entries to dirstate.
        root_only = ('', '')
        # Accumulate parent references (path_utf8, id), to check for parentless
        # items or items placed under files/links/tree-references. We get
        # references from every item in the delta that is not a deletion and
        # is not itself the root.
        parents = set()
        # Added ids must not be in the dirstate already. This set holds those
        # ids.
        new_ids = set()
        for old_path, new_path, file_id, inv_entry in delta:
            if inv_entry is not None and file_id != inv_entry.file_id:
                self._raise_invalid(new_path, file_id,
                    "mismatched entry file_id %r" % inv_entry)
            if new_path is None:
                new_path_utf8 = None
            else:
                if inv_entry is None:
                    self._raise_invalid(new_path, file_id,
                        "new_path with no entry")
                new_path_utf8 = encode(new_path)
                # note the parent for validation
                dirname_utf8, basename_utf8 = osutils.split(new_path_utf8)
                if basename_utf8:
                    parents.add((dirname_utf8, inv_entry.parent_id))
            if old_path is None:
                old_path_utf8 = None
            else:
                old_path_utf8 = encode(old_path)
            if old_path is None:
                adds.append((None, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), True))
                new_ids.add(file_id)
            elif new_path is None:
                deletes.append((old_path_utf8, None, file_id, None, True))
            elif (old_path, new_path) == root_only:
                # change things in-place
                # Note: the case of a parent directory changing its file_id
                #       tends to break optimizations here, because officially
                #       the file has actually been moved, it just happens to
                #       end up at the same path. If we can figure out how to
                #       handle that case, we can avoid a lot of add+delete
                #       pairs for objects that stay put.
                # elif old_path == new_path:
                changes.append((old_path_utf8, new_path_utf8, file_id,
                    inv_to_entry(inv_entry)))
            else:
                # Because renames must preserve their children we must have
                # processed all relocations and removes before hand. The sort
                # order ensures we've examined the child paths, but we also
                # have to execute the removals, or the split to an add/delete
                # pair will result in the deleted item being reinserted, or
                # renamed items being reinserted twice - and possibly at the
                # wrong place. Splitting into a delete/add pair also simplifies
                # the handling of entries with ('f', ...), ('r' ...) because
                # the target of the 'r' is old_path here, and we add that to
                # deletes, meaning that the add handler does not need to check
                # for 'r' items on every pass.
                self._update_basis_apply_deletes(deletes)
                deletes = []
                # Split into an add/delete pair recursively.
                adds.append((old_path_utf8, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), False))
                # Expunge deletes that we've seen so that deleted/renamed
                # children of a rename directory are handled correctly.
                new_deletes = reversed(list(
                    self._iter_child_entries(1, old_path_utf8)))
                # Remove the current contents of the tree at orig_path, and
                # reinsert at the correct new path.
                for entry in new_deletes:
                    child_dirname, child_basename, child_file_id = entry[0]
                    if child_dirname:
                        source_path = child_dirname + '/' + child_basename
                    else:
                        source_path = child_basename
                    if new_path_utf8:
                        target_path = new_path_utf8 + source_path[len(old_path):]
                    else:
                        if old_path == '':
                            raise AssertionError("cannot rename directory to"
                                                 " itself")
                        target_path = source_path[len(old_path) + 1:]
                    adds.append((None, target_path, entry[0][2], entry[1][1], False))
                    deletes.append(
                        (source_path, target_path, entry[0][2], None, False))
                deletes.append((old_path_utf8, new_path, file_id, None, False))
        self._check_delta_ids_absent(new_ids, delta, 1)
        try:
            # Finish expunging deletes/first half of renames.
            self._update_basis_apply_deletes(deletes)
            # Reinstate second half of renames and new paths.
            self._update_basis_apply_adds(adds)
            # Apply in-situ changes.
            self._update_basis_apply_changes(changes)
            # Validate parents
            self._after_delta_check_parents(parents, 1)
        except errors.BzrError, e:
            self._changes_aborted = True
            if 'integrity error' not in str(e):
                raise
            # _get_entry raises BzrError when a request is inconsistent; we
            # want such errors to be shown as InconsistentDelta - and that
            # fits the behaviour we trigger.
            raise errors.InconsistentDeltaDelta(delta,
                "error from _get_entry. %s" % (e,))

        self._mark_modified(header_modified=True)
        self._id_index = None
1642
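    # Illustrative sketch of the decomposition above (hypothetical paths and
    # ids, not from any real tree): a delta item renaming u'dir' to u'dir2'
    # with one child becomes
    #   adds:    ('dir', 'dir2', 'dir-id', <details>, False)
    #            (None, 'dir2/child', 'child-id', <details>, False)
    #   deletes: ('dir/child', 'dir2/child', 'child-id', None, False)
    #            ('dir', u'dir2', 'dir-id', None, False)
    # so that _update_basis_apply_deletes and _update_basis_apply_adds only
    # ever see rename-free, lexicographically sortable worklists.
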
    def _check_delta_ids_absent(self, new_ids, delta, tree_index):
        """Check that none of the file_ids in new_ids are present in a tree."""
        if not new_ids:
            return
        id_index = self._get_id_index()
        for file_id in new_ids:
            for key in id_index.get(file_id, ()):
                block_i, entry_i, d_present, f_present = \
                    self._get_block_entry_index(key[0], key[1], tree_index)
                if not f_present:
                    # In a different tree
                    continue
                entry = self._dirblocks[block_i][1][entry_i]
                if entry[0][2] != file_id:
                    # Different file_id, so not what we want.
                    continue
                self._raise_invalid(("%s/%s" % key[0:2]).decode('utf8'), file_id,
                    "This file_id is new in the delta but already present in "
                    "the target")

    def _raise_invalid(self, path, file_id, reason):
        self._changes_aborted = True
        raise errors.InconsistentDelta(path, file_id, reason)

    def _update_basis_apply_adds(self, adds):
        """Apply a sequence of adds to tree 1 during update_basis_by_delta.

        They may be adds, or renames that have been split into add/delete
        pairs.

        :param adds: A sequence of adds. Each add is a tuple:
            (None, new_path_utf8, file_id, (entry_details), real_add). real_add
            is False when the add is the second half of a remove-and-reinsert
            pair created to handle renames and deletes.
        """
        # Adds are accumulated partly from renames, so can be in any input
        # order - sort it.
        # TODO: we may want to sort in dirblocks order. That way each entry
        #       will end up in the same directory, allowing the _get_entry
        #       fast-path for looking up 2 items in the same dir work.
        adds.sort(key=lambda x: x[1])
        # adds is now in lexicographic order, which places all parents before
        # their children, so we can process it linearly.
        st = static_tuple.StaticTuple
        for old_path, new_path, file_id, new_details, real_add in adds:
            dirname, basename = osutils.split(new_path)
            entry_key = st(dirname, basename, file_id)
            block_index, present = self._find_block_index_from_key(entry_key)
            if not present:
                self._raise_invalid(new_path, file_id,
                    "Unable to find block for this record."
                    " Was the parent added?")
            block = self._dirblocks[block_index][1]
            entry_index, present = self._find_entry_index(entry_key, block)
            if real_add:
                if old_path is not None:
                    self._raise_invalid(new_path, file_id,
                        'considered a real add but still had old_path at %s'
                        % (old_path,))
            if present:
                entry = block[entry_index]
                basis_kind = entry[1][1][0]
                if basis_kind == 'a':
                    entry[1][1] = new_details
                elif basis_kind == 'r':
                    raise NotImplementedError()
                else:
                    self._raise_invalid(new_path, file_id,
                        "An entry was marked as a new add"
                        " but the basis target already existed")
            else:
                # The exact key was not found in the block. However, we need to
                # check if there is a key next to us that would have matched.
                # We only need to check 2 locations, because there are only 2
                # entries that this new entry can replace.
                for maybe_index in range(entry_index-1, entry_index+1):
                    if maybe_index < 0 or maybe_index >= len(block):
                        continue
                    maybe_entry = block[maybe_index]
                    if maybe_entry[0][:2] != (dirname, basename):
                        # Just a random neighbor
                        continue
                    if maybe_entry[0][2] == file_id:
                        raise AssertionError(
                            '_find_entry_index didnt find a key match'
                            ' but walking the data did, for %s'
                            % (entry_key,))
                    basis_kind = maybe_entry[1][1][0]
                    if basis_kind not in 'ar':
                        self._raise_invalid(new_path, file_id,
                            "we have an add record for path, but the path"
                            " is already present with another file_id %s"
                            % (maybe_entry[0][2],))

                entry = (entry_key, [DirState.NULL_PARENT_DETAILS,
                                     new_details])
                block.insert(entry_index, entry)

            active_kind = entry[1][0][0]
            if active_kind == 'a':
                # The active record shows up as absent, this could be genuine,
                # or it could be present at some other location. We need to
                # verify.
                id_index = self._get_id_index()
                # The id_index may not be perfectly accurate for tree1, because
                # we haven't been keeping it updated. However, it should be
                # fine for tree0, and that gives us enough info for what we
                # need.
                keys = id_index.get(file_id, ())
                for key in keys:
                    block_i, entry_i, d_present, f_present = \
                        self._get_block_entry_index(key[0], key[1], 0)
                    if not f_present:
                        continue
                    active_entry = self._dirblocks[block_i][1][entry_i]
                    if (active_entry[0][2] != file_id):
                        # Some other file is at this path, we don't need to
                        # link it.
                        continue
                    real_active_kind = active_entry[1][0][0]
                    if real_active_kind in 'ar':
                        # We found a record, which was not *this* record,
                        # which matches the file_id, but is not actually
                        # present. Something seems *really* wrong.
                        self._raise_invalid(new_path, file_id,
                            "We found a tree0 entry that doesnt make sense")
                    # Now, we've found a tree0 entry which matches the file_id
                    # but is at a different location. So update them to be
                    # rename records.
                    active_dir, active_name = active_entry[0][:2]
                    if active_dir:
                        active_path = active_dir + '/' + active_name
                    else:
                        active_path = active_name
                    active_entry[1][1] = st('r', new_path, 0, False, '')
                    entry[1][0] = st('r', active_path, 0, False, '')
            elif active_kind == 'r':
                raise NotImplementedError()

            new_kind = new_details[0]
            if new_kind == 'd':
                self._ensure_block(block_index, entry_index, new_path)

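    # Worked example (hypothetical values): applying the real add
    #     (None, 'a/b', 'b-id', ('f', <sha1>, 12, False, <revid>), True)
    # locates the 'a' dirblock, inserts or updates the ('a', 'b', 'b-id')
    # entry, and fills in its basis (tree 1) column. If tree 0 shows 'b-id'
    # living at some other path, the two rows are cross-linked with 'r'
    # (relocated) records so each points at the other's real location.
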
    def _update_basis_apply_changes(self, changes):
        """Apply a sequence of changes to tree 1 during update_basis_by_delta.

        :param changes: A sequence of changes. Each change is a tuple:
            (path_utf8, path_utf8, file_id, (entry_details))
        """
        absent = 'ar'
        for old_path, new_path, file_id, new_details in changes:
            # the entry for this file_id must be in tree 1.
            entry = self._get_entry(1, file_id, new_path)
            if entry[0] is None or entry[1][1][0] in 'ar':
                self._raise_invalid(new_path, file_id,
                    'changed entry considered not present')
            entry[1][1] = new_details

    def _update_basis_apply_deletes(self, deletes):
        """Apply a sequence of deletes to tree 1 during update_basis_by_delta.

        They may be deletes, or renames that have been split into add/delete
        pairs.

        :param deletes: A sequence of deletes. Each delete is a tuple:
            (old_path_utf8, new_path_utf8, file_id, None, real_delete).
            real_delete is True when the desired outcome is an actual deletion
            rather than the rename handling logic temporarily deleting a path
            during the replacement of a parent.
        """
        null = DirState.NULL_PARENT_DETAILS
        for old_path, new_path, file_id, _, real_delete in deletes:
            if real_delete != (new_path is None):
                self._raise_invalid(old_path, file_id, "bad delete delta")
            # the entry for this file_id must be in tree 1.
            dirname, basename = osutils.split(old_path)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, 1)
            if not file_present:
                self._raise_invalid(old_path, file_id,
                    'basis tree does not contain removed entry')
            entry = self._dirblocks[block_index][1][entry_index]
            # The state of the entry in the 'active' WT
            active_kind = entry[1][0][0]
            if entry[0][2] != file_id:
                self._raise_invalid(old_path, file_id,
                    'mismatched file_id in tree 1')
            dir_block = ()
            old_kind = entry[1][1][0]
            if active_kind in 'ar':
                # The active tree doesn't have this file_id.
                # The basis tree is changing this record. If this is a
                # rename, then we don't want the record here at all
                # anymore. If it is just an in-place change, we want the
                # record here, but we'll add it if we need to. So we just
                # delete it
                if active_kind == 'r':
                    active_path = entry[1][0][1]
                    active_entry = self._get_entry(0, file_id, active_path)
                    if active_entry[1][1][0] != 'r':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate did not have matching rename entries")
                    elif active_entry[1][0][0] in 'ar':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate had a rename pointing at an inactive"
                            " tree0")
                    active_entry[1][1] = null
                del self._dirblocks[block_index][1][entry_index]
                if old_kind == 'd':
                    # This was a directory, and the active tree says it
                    # doesn't exist, and now the basis tree says it doesn't
                    # exist. Remove its dirblock if present
                    (dir_block_index,
                     present) = self._find_block_index_from_key(
                        (old_path, '', ''))
                    if present:
                        dir_block = self._dirblocks[dir_block_index][1]
                        if not dir_block:
                            # This entry is empty, go ahead and just remove it
                            del self._dirblocks[dir_block_index]
            else:
                # There is still an active record, so just mark this
                # removed.
                entry[1][1] = null
                block_i, entry_i, d_present, f_present = \
                    self._get_block_entry_index(old_path, '', 1)
                if d_present:
                    dir_block = self._dirblocks[block_i][1]
            for child_entry in dir_block:
                child_basis_kind = child_entry[1][1][0]
                if child_basis_kind not in 'ar':
                    self._raise_invalid(old_path, file_id,
                        "The file id was deleted but its children were "
                        "not deleted.")

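    # Note (sketch, hypothetical tuples): real_delete separates genuine
    # removals from the temporary half of a rename. ('a', None, 'a-id', None,
    # True) really drops 'a-id' from the basis, while ('a', 'b', 'a-id',
    # None, False) only clears the old location - _update_basis_apply_adds
    # then reinstates the id at 'b'.
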
    def _after_delta_check_parents(self, parents, index):
        """Check that parents required by the delta are all intact.

        :param parents: An iterable of (path_utf8, file_id) tuples which are
            required to be present in tree 'index' at path_utf8 with id file_id
            and be a directory.
        :param index: The column in the dirstate to check for parents in.
        """
        for dirname_utf8, file_id in parents:
            # Get the entry - this ensures that file_id, dirname_utf8 exists
            # and has the right file id.
            entry = self._get_entry(index, file_id, dirname_utf8)
            if entry[1] is None:
                self._raise_invalid(dirname_utf8.decode('utf8'),
                    file_id, "This parent is not present.")
            # Parents of things must be directories
            if entry[1][index][0] != 'd':
                self._raise_invalid(dirname_utf8.decode('utf8'),
                    file_id, "This parent is not a directory.")

    def _observed_sha1(self, entry, sha1, stat_value,
        _stat_to_minikind=_stat_to_minikind, _pack_stat=pack_stat):
        """Note the sha1 of a file.

        :param entry: The entry the sha1 is for.
        :param sha1: The observed sha1.
        :param stat_value: The os.lstat for the file.
        """
        try:
            minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
        except KeyError:
            # Unhandled kind
            return None
        packed_stat = _pack_stat(stat_value)
        if minikind == 'f':
            if self._cutoff_time is None:
                self._sha_cutoff_time()
            if (stat_value.st_mtime < self._cutoff_time
                and stat_value.st_ctime < self._cutoff_time):
                entry[1][0] = ('f', sha1, stat_value.st_size, entry[1][0][3],
                               packed_stat)
                self._mark_modified([entry])

    def _sha_cutoff_time(self):
        """Return cutoff time.

        Files modified more recently than this time are at risk of being
        undetectably modified and so can't be cached.
        """
        # Cache the cutoff time as long as we hold a lock.
        # time.time() isn't super expensive (approx 3.38us), but
        # when you call it 50,000 times it adds up.
        # For comparison, os.lstat() costs 7.2us if it is hot.
        self._cutoff_time = int(time.time()) - 3
        return self._cutoff_time

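    # Example of the race the 3-second margin guards against: if a file is
    # written twice within one second, and we hash it between the two writes,
    # the second write can leave mtime/ctime unchanged while the content no
    # longer matches the cached sha1. _observed_sha1 therefore only caches
    # hashes for stat values strictly older than this cutoff.
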
    def _lstat(self, abspath, entry):
        """Return the os.lstat value for this path."""
        return os.lstat(abspath)

    def _sha1_file_and_mutter(self, abspath):
        # when -Dhashcache is turned on, this is monkey-patched in to log
        # file reads
        trace.mutter("dirstate sha1 " + abspath)
        return self._sha1_provider.sha1(abspath)

    def _is_executable(self, mode, old_executable):
        """Is this file executable?"""
        return bool(S_IEXEC & mode)

    def _is_executable_win32(self, mode, old_executable):
        """On win32 the executable bit is stored in the dirstate."""
        return old_executable

    if sys.platform == 'win32':
        _is_executable = _is_executable_win32

    def _read_link(self, abspath, old_link):
        """Read the target of a symlink"""
        # TODO: jam 200700301 On Win32, this could just return the value
        #       already in memory. However, this really needs to be done at a
        #       higher level, because there either won't be anything on disk,
        #       or the thing on disk will be a file.
        fs_encoding = osutils._fs_enc
        if isinstance(abspath, unicode):
            # abspath is defined as the path to pass to lstat. readlink is
            # buggy in python < 2.6 (it doesn't encode unicode path into FS
            # encoding), so we need to encode ourselves knowing that unicode
            # paths are produced by UnicodeDirReader on purpose.
            abspath = abspath.encode(fs_encoding)
        target = os.readlink(abspath)
        if fs_encoding not in ('UTF-8', 'US-ASCII', 'ANSI_X3.4-1968'):
            # Change encoding if needed
            target = target.decode(fs_encoding).encode('UTF-8')
        return target

    def get_ghosts(self):
        """Return a list of the parent tree revision ids that are ghosts."""
        self._read_header_if_needed()
        return self._ghosts

    def get_lines(self):
        """Serialise the entire dirstate to a sequence of lines."""
        if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
            self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
            # read what's on disk.
            self._state_file.seek(0)
            return self._state_file.readlines()
        lines = []
        lines.append(self._get_parents_line(self.get_parent_ids()))
        lines.append(self._get_ghosts_line(self._ghosts))
        lines.extend(self._get_entry_lines())
        return self._get_output_lines(lines)

    def _get_ghosts_line(self, ghost_ids):
        """Create a line for the state file for ghost information."""
        return '\0'.join([str(len(ghost_ids))] + ghost_ids)

    def _get_parents_line(self, parent_ids):
        """Create a line for the state file for parents information."""
        return '\0'.join([str(len(parent_ids))] + parent_ids)

    def _get_entry_lines(self):
        """Create lines for entries."""
        return map(self._entry_to_line, self._iter_entries())

    def _get_fields_to_entry(self):
        """Get a function which converts entry fields into a entry record.

        This handles size and executable, as well as parent records.

        :return: A function which takes a list of fields, and returns an
            appropriate record for storing in memory.
        """
        # This is intentionally unrolled for performance
        num_present_parents = self._num_present_parents()
        if num_present_parents == 0:
            def fields_to_entry_0_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],        # minikind
                        fields[4],        # fingerprint
                        _int(fields[5]),  # size
                        fields[6] == 'y', # executable
                        fields[7],        # packed_stat or revision_id
                    )])
            return fields_to_entry_0_parents
        elif num_present_parents == 1:
            def fields_to_entry_1_parent(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],         # minikind
                        fields[4],         # fingerprint
                        _int(fields[5]),   # size
                        fields[6] == 'y',  # executable
                        fields[7],         # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],         # minikind
                        fields[9],         # fingerprint
                        _int(fields[10]),  # size
                        fields[11] == 'y', # executable
                        fields[12],        # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_1_parent
        elif num_present_parents == 2:
            def fields_to_entry_2_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],         # minikind
                        fields[4],         # fingerprint
                        _int(fields[5]),   # size
                        fields[6] == 'y',  # executable
                        fields[7],         # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],         # minikind
                        fields[9],         # fingerprint
                        _int(fields[10]),  # size
                        fields[11] == 'y', # executable
                        fields[12],        # packed_stat or revision_id
                    ),
                    ( # Parent 2
                        fields[13],        # minikind
                        fields[14],        # fingerprint
                        _int(fields[15]),  # size
                        fields[16] == 'y', # executable
                        fields[17],        # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_2_parents
        else:
            def fields_to_entry_n_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                trees = [(fields[cur],          # minikind
                          fields[cur+1],        # fingerprint
                          _int(fields[cur+2]),  # size
                          fields[cur+3] == 'y', # executable
                          fields[cur+4],        # stat or revision_id
                         ) for cur in xrange(3, len(fields)-1, 5)]
                return path_name_file_id_key, trees
            return fields_to_entry_n_parents

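    # Field-layout sketch for the unrolled parsers above (hypothetical
    # values): with one parent tree a row arrives as
    #   fields = ['dir', 'name', 'an-id',                 # entry key
    #             'f', <sha1>, '12', 'n', <packed_stat>,  # tree 0
    #             'f', <sha1>, '12', 'n', <revision_id>]  # tree 1
    # and comes back as (('dir', 'name', 'an-id'), [tree0, tree1]) with sizes
    # converted to int and the executable flag to bool.
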
    def get_parent_ids(self):
        """Return a list of the parent tree ids for the directory state."""
        self._read_header_if_needed()
        return list(self._parents)

    def _get_block_entry_index(self, dirname, basename, tree_index):
        """Get the coordinates for a path in the state structure.

        :param dirname: The utf8 dirname to lookup.
        :param basename: The utf8 basename to lookup.
        :param tree_index: The index of the tree for which this lookup should
            be attempted.
        :return: A tuple describing where the path is located, or should be
            inserted. The tuple contains four fields: the block index, the row
            index, the directory is present (boolean), the entire path is
            present (boolean). There is no guarantee that either
            coordinate is currently reachable unless the found field for it is
            True. For instance, a directory not present in the searched tree
            may be returned with a value one greater than the current highest
            block offset. The directory present field will always be True when
            the path present field is True. The directory present field does
            NOT indicate that the directory is present in the searched tree,
            rather it indicates that there are at least some files in some
            tree present there.
        """
        self._read_dirblocks_if_needed()
        key = dirname, basename, ''
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # no such directory - return the dir index and 0 for the row.
            return block_index, 0, False, False
        block = self._dirblocks[block_index][1] # access the entries only
        entry_index, present = self._find_entry_index(key, block)
        # linear search through entries at this path to find the one
        # requested.
        while entry_index < len(block) and block[entry_index][0][1] == basename:
            if block[entry_index][1][tree_index][0] not in 'ar':
                # neither absent or relocated
                return block_index, entry_index, True, True
            entry_index += 1
        return block_index, entry_index, True, False

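    # Usage sketch (hypothetical coordinates): for a path 'a/b' present in
    # tree 0, self._get_block_entry_index('a', 'b', 0) might return
    # (2, 0, True, True) - block 2 holds directory 'a' and row 0 is the
    # entry. For a missing path the same call returns the insertion point
    # with one or both present flags False.
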
    def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None,
                   include_deleted=False):
        """Get the dirstate entry for path in tree tree_index.

        If either file_id or path is supplied, it is used as the key to lookup.
        If both are supplied, the fastest lookup is used, and an error is
        raised if they do not both point at the same row.

        :param tree_index: The index of the tree we wish to locate this path
            in. If the path is present in that tree, the entry containing its
            details is returned, otherwise (None, None) is returned
            0 is the working tree, higher indexes are successive parent
            trees.
        :param fileid_utf8: A utf8 file_id to look up.
        :param path_utf8: A utf8 path to be looked up.
        :param include_deleted: If True, and performing a lookup via
            fileid_utf8 rather than path_utf8, return an entry for deleted
            (absent) paths.
        :return: The dirstate entry tuple for path, or (None, None)
        """
        self._read_dirblocks_if_needed()
        if path_utf8 is not None:
            if type(path_utf8) is not str:
                raise errors.BzrError('path_utf8 is not a str: %s %r'
                    % (type(path_utf8), path_utf8))
            # path lookups are faster
            dirname, basename = osutils.split(path_utf8)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, tree_index)
            if not file_present:
                return None, None
            entry = self._dirblocks[block_index][1][entry_index]
            if not (entry[0][2] and entry[1][tree_index][0] not in ('a', 'r')):
                raise AssertionError('unversioned entry?')
            if fileid_utf8:
                if entry[0][2] != fileid_utf8:
                    self._changes_aborted = True
                    raise errors.BzrError('integrity error ? : mismatching'
                        ' tree_index, file_id and path')
            return entry
        else:
            possible_keys = self._get_id_index().get(fileid_utf8, ())
            if not possible_keys:
                return None, None
            for key in possible_keys:
                block_index, present = \
                    self._find_block_index_from_key(key)
                # strange, probably indicates an out of date
                # id index - for now, allow this.
                if not present:
                    continue
                # WARNING: DO not change this code to use _get_block_entry_index
                # as that function is not suitable: it does not use the key
                # to lookup, and thus the wrong coordinates are returned.
                block = self._dirblocks[block_index][1]
                entry_index, present = self._find_entry_index(key, block)
                if present:
                    entry = self._dirblocks[block_index][1][entry_index]
                    # TODO: We might want to assert that entry[0][2] ==
                    #       fileid_utf8.
                    if entry[1][tree_index][0] in 'fdlt':
                        # this is the result we are looking for: the
                        # real home of this file_id in this tree.
                        return entry
                    if entry[1][tree_index][0] == 'a':
                        # there is no home for this entry in this tree
                        if include_deleted:
                            return entry
                        return None, None
                    if entry[1][tree_index][0] != 'r':
                        raise AssertionError(
                            "entry %r has invalid minikind %r for tree %r" \
                            % (entry,
                               entry[1][tree_index][0],
                               tree_index))
                    real_path = entry[1][tree_index][1]
                    return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
                        path_utf8=real_path)
            return None, None

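    # Usage sketch (hypothetical id): these are equivalent lookups, each
    # returning the entry tuple or (None, None):
    #   state._get_entry(0, path_utf8='a/b')
    #   state._get_entry(0, fileid_utf8='b-id')
    # The path form is the fast path; the file_id form may have to follow an
    # 'r' (relocated) record to reach the row that really holds the data.
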
    @classmethod
    def initialize(cls, path, sha1_provider=None):
        """Create a new dirstate on path.

        The new dirstate will be an empty tree - that is it has no parents,
        and only a root node - which has id ROOT_ID.

        :param path: The name of the file for the dirstate.
        :param sha1_provider: an object meeting the SHA1Provider interface.
            If None, a DefaultSHA1Provider is used.
        :return: A write-locked DirState object.
        """
        # This constructs a new DirState object on a path, sets the _state_file
        # to a new empty file for that path. It then calls _set_data() with our
        # stock empty dirstate information - a root with ROOT_ID, no children,
        # and no parents. Finally it calls save() to ensure that this data will
        # persist.
        if sha1_provider is None:
            sha1_provider = DefaultSHA1Provider()
        result = cls(path, sha1_provider)
        # root dir and root dir contents with no children.
        empty_tree_dirblocks = [('', []), ('', [])]
        # a new root directory, with a NULLSTAT.
        empty_tree_dirblocks[0][1].append(
            (('', '', inventory.ROOT_ID), [
                ('d', '', 0, False, DirState.NULLSTAT),
            ]))
        result.lock_write()
        try:
            result._set_data([], empty_tree_dirblocks)
            result.save()
        except:
            result.unlock()
            raise
        return result

    @staticmethod
    def _inv_entry_to_details(inv_entry):
        """Convert an inventory entry (from a revision tree) to state details.

        :param inv_entry: An inventory entry whose sha1 and link targets can be
            relied upon, and which has a revision set.
        :return: A details tuple - the details for a single tree at a path +
            id.
        """
        kind = inv_entry.kind
        minikind = DirState._kind_to_minikind[kind]
        tree_data = inv_entry.revision
        if kind == 'directory':
            fingerprint = ''
            size = 0
            executable = False
        elif kind == 'symlink':
            if inv_entry.symlink_target is None:
                fingerprint = ''
            else:
                fingerprint = inv_entry.symlink_target.encode('utf8')
            size = 0
            executable = False
        elif kind == 'file':
            fingerprint = inv_entry.text_sha1 or ''
            size = inv_entry.text_size or 0
            executable = inv_entry.executable
        elif kind == 'tree-reference':
            fingerprint = inv_entry.reference_revision or ''
            size = 0
            executable = False
        else:
            raise Exception("can't pack %s" % inv_entry)
        return static_tuple.StaticTuple(minikind, fingerprint, size,
                                        executable, tree_data)

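    # Example return values (hypothetical data):
    #   file:      ('f', <text_sha1>, <text_size>, <executable?>, <revision>)
    #   directory: ('d', '', 0, False, <revision>)
    #   symlink:   ('l', <target utf8>, 0, False, <revision>)
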
    def _iter_child_entries(self, tree_index, path_utf8):
        """Iterate over all the entries that are children of path_utf8.

        This only returns entries that are present (not in 'a', 'r') in
        tree_index. tree_index data is not refreshed, so if tree 0 is used,
        results may differ from that obtained if paths were statted to
        determine what ones were directories.

        Asking for the children of a non-directory will return an empty
        iterator.
        """
        pending_dirs = []
        next_pending_dirs = [path_utf8]
        absent = 'ar'
        while next_pending_dirs:
            pending_dirs = next_pending_dirs
            next_pending_dirs = []
            for path in pending_dirs:
                block_index, present = self._find_block_index_from_key(
                    (path, '', ''))
                if block_index == 0:
                    block_index = 1
                    if len(self._dirblocks) == 1:
                        # asked for the children of the root with no other
                        # contents.
                        return
                if not present:
                    # children of a non-directory asked for.
                    continue
                block = self._dirblocks[block_index]
                for entry in block[1]:
                    kind = entry[1][tree_index][0]
                    if kind not in absent:
                        yield entry
                    if kind == 'd':
                        if entry[0][0]:
                            path = entry[0][0] + '/' + entry[0][1]
                        else:
                            path = entry[0][1]
                        next_pending_dirs.append(path)

    def _iter_entries(self):
        """Iterate over all the entries in the dirstate.

        Each yielded item is an entry in the standard format described in the
        docstring of bzrlib.dirstate.
        """
        self._read_dirblocks_if_needed()
        for directory in self._dirblocks:
            for entry in directory[1]:
                yield entry

    def _get_id_index(self):
        """Get an id index of self._dirblocks.

        This maps from file_id => [(directory, name, file_id)] entries where
        that file_id appears in one of the trees.
        """
        if self._id_index is None:
            id_index = {}
            for key, tree_details in self._iter_entries():
                self._add_to_id_index(id_index, key)
            self._id_index = id_index
        return self._id_index

    def _add_to_id_index(self, id_index, entry_key):
        """Add this entry to the _id_index mapping."""
        # This code used to use a set for every entry in the id_index. However,
        # it is *rare* to have more than one entry. So a set is a large
        # overkill. And even when we do, we won't ever have more than the
        # number of parent trees. Which is still a small number (rarely >2). As
        # such, we use a simple tuple, and do our own uniqueness checks. While
        # the 'in' check is O(N) since N is nicely bounded it shouldn't ever
        # cause quadratic failure.
        file_id = entry_key[2]
        entry_key = static_tuple.StaticTuple.from_sequence(entry_key)
        if file_id not in id_index:
            id_index[file_id] = static_tuple.StaticTuple(entry_key,)
        else:
            entry_keys = id_index[file_id]
            if entry_key not in entry_keys:
                id_index[file_id] = entry_keys + (entry_key,)

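    # Shape sketch (hypothetical keys): after indexing a file that lives at
    # the root in tree 0 but under 'dir' in a parent tree, the mapping is
    #   id_index = {'a-id': (('', 'a', 'a-id'), ('dir', 'a', 'a-id'))}
    # i.e. a plain tuple of entry keys per file_id, kept unique by the 'in'
    # check above instead of a set.
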
    def _remove_from_id_index(self, id_index, entry_key):
        """Remove this entry from the _id_index mapping.

        It is a programming error to call this when the entry_key is not
        already present.
        """
        file_id = entry_key[2]
        entry_keys = list(id_index[file_id])
        entry_keys.remove(entry_key)
        id_index[file_id] = static_tuple.StaticTuple.from_sequence(entry_keys)

    def _get_output_lines(self, lines):
        """Format lines for final output.

        :param lines: A sequence of lines containing the parents list and the
            path lines.
        """
        output_lines = [DirState.HEADER_FORMAT_3]
        lines.append('') # a final newline
        inventory_text = '\0\n\0'.join(lines)
        output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
        # -3, 1 for num parents, 1 for ghosts, 1 for final newline
        num_entries = len(lines)-3
        output_lines.append('num_entries: %s\n' % (num_entries,))
        output_lines.append(inventory_text)
        return output_lines

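    # Framing note: joining with '\0\n\0' means every body line ends with a
    # NUL before its newline and starts with one after it, matching the
    # NUL-separated field layout. The crc32 above is computed over exactly
    # that joined text, which is what readers later compare against the
    # 'crc32:' header line.
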
    def _make_deleted_row(self, fileid_utf8, parents):
        """Return a deleted row for fileid_utf8."""
        return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
            ''), parents

    def _num_present_parents(self):
        """The number of parent entries in each record row."""
        return len(self._parents) - len(self._ghosts)

    @classmethod
    def on_file(cls, path, sha1_provider=None, worth_saving_limit=0):
        """Construct a DirState on the file at path "path".

        :param path: The path at which the dirstate file on disk should live.
        :param sha1_provider: an object meeting the SHA1Provider interface.
            If None, a DefaultSHA1Provider is used.
        :param worth_saving_limit: when the exact number of hash changed
            entries is known, only bother saving the dirstate if more than
            this count of entries have changed. -1 means never save.
        :return: An unlocked DirState object, associated with the given path.
        """
        if sha1_provider is None:
            sha1_provider = DefaultSHA1Provider()
        result = cls(path, sha1_provider,
                     worth_saving_limit=worth_saving_limit)
        return result

    def _read_dirblocks_if_needed(self):
        """Read in all the dirblocks from the file if they are not in memory.

        This populates self._dirblocks, and sets self._dirblock_state to
        IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
        loading.
        """
        self._read_header_if_needed()
        if self._dirblock_state == DirState.NOT_IN_MEMORY:
            _read_dirblocks(self)

    def _read_header(self):
        """This reads in the metadata header, and the parent ids.

        After reading in, the file should be positioned at the null
        just before the start of the first record in the file.

        :return: (expected crc checksum, number of entries, parent list)
        """
        self._read_prelude()
        parent_line = self._state_file.readline()
        info = parent_line.split('\0')
        num_parents = int(info[0])
        self._parents = info[1:-1]
        ghost_line = self._state_file.readline()
        info = ghost_line.split('\0')
        num_ghosts = int(info[1])
        self._ghosts = info[2:-1]
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._end_of_header = self._state_file.tell()

    def _read_header_if_needed(self):
        """Read the header of the dirstate file if needed."""
        # inline this as it will be called a lot
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
        if self._header_state == DirState.NOT_IN_MEMORY:
            self._read_header()

    def _read_prelude(self):
        """Read in the prelude header of the dirstate file.

        This only reads in the stuff that is not connected to the crc
        checksum. The position will be correct to read in the rest of
        the file and check the checksum after this point.
        The next entry in the file should be the number of parents,
        and their ids. Followed by a newline.
        """
        header = self._state_file.readline()
        if header != DirState.HEADER_FORMAT_3:
            raise errors.BzrError(
                'invalid header line: %r' % (header,))
        crc_line = self._state_file.readline()
        if not crc_line.startswith('crc32: '):
            raise errors.BzrError('missing crc32 checksum: %r' % crc_line)
        self.crc_expected = int(crc_line[len('crc32: '):-1])
        num_entries_line = self._state_file.readline()
        if not num_entries_line.startswith('num_entries: '):
            raise errors.BzrError('missing num_entries line')
        self._num_entries = int(num_entries_line[len('num_entries: '):-1])

    def sha1_from_stat(self, path, stat_result, _pack_stat=pack_stat):
        """Find a sha1 given a stat lookup."""
        return self._get_packed_stat_index().get(_pack_stat(stat_result), None)

    def _get_packed_stat_index(self):
        """Get a packed_stat index of self._dirblocks."""
        if self._packed_stat_index is None:
            index = {}
            for key, tree_details in self._iter_entries():
                if tree_details[0][0] == 'f':
                    index[tree_details[0][4]] = tree_details[0][1]
            self._packed_stat_index = index
        return self._packed_stat_index

    def save(self):
        """Save any pending changes created during this session.

        We reuse the existing file, because that prevents race conditions with
        file creation, and use oslocks on it to prevent concurrent modification
        and reads - because dirstate's incremental data aggregation is not
        compatible with reading a modified file, and replacing a file in use by
        another process is impossible on Windows.

        A dirstate in read only mode should be smart enough though to validate
        that the file has not changed, and otherwise discard its cache and
        start over, to allow for fine grained read lock duration, so 'status'
        wont block 'commit' - for example.
        """
        if self._changes_aborted:
            # Should this be a warning? For now, I'm expecting that places that
            # mark it inconsistent will warn, making a warning here redundant.
            trace.mutter('Not saving DirState because '
                    '_changes_aborted is set.')
            return
        # TODO: Since we now distinguish IN_MEMORY_MODIFIED from
        #       IN_MEMORY_HASH_MODIFIED, we should only fail quietly if we fail
        #       to save an IN_MEMORY_HASH_MODIFIED, and fail *noisily* if we
        #       fail to save IN_MEMORY_MODIFIED
        if not self._worth_saving():
            return

        grabbed_write_lock = False
        if self._lock_state != 'w':
            grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
            # Switch over to the new lock, as the old one may be closed.
            # TODO: jam 20070315 We should validate the disk file has
            #       not changed contents, since temporary_write_lock may
            #       not be an atomic operation.
            self._lock_token = new_lock
            self._state_file = new_lock.f
            if not grabbed_write_lock:
                # We couldn't grab a write lock, so we switch back to a read one
                return
        try:
            lines = self.get_lines()
            self._state_file.seek(0)
            self._state_file.writelines(lines)
            self._state_file.truncate()
            self._state_file.flush()
            self._maybe_fdatasync()
            self._mark_unmodified()
        finally:
            if grabbed_write_lock:
                self._lock_token = self._lock_token.restore_read_lock()
                self._state_file = self._lock_token.f
                # TODO: jam 20070315 We should validate the disk file has
                #       not changed contents. Since restore_read_lock may
                #       not be an atomic operation.

    def _maybe_fdatasync(self):
        """Flush to disk if possible and if not configured off."""
        if self._config_stack.get('dirstate.fdatasync'):
            osutils.fdatasync(self._state_file.fileno())

    def _worth_saving(self):
        """Is it worth saving the dirstate or not?"""
        if (self._header_state == DirState.IN_MEMORY_MODIFIED
            or self._dirblock_state == DirState.IN_MEMORY_MODIFIED):
            return True
        if self._dirblock_state == DirState.IN_MEMORY_HASH_MODIFIED:
            if self._worth_saving_limit == -1:
                # We never save hash changes when the limit is -1
                return False
            # If we're using smart saving and only a small number of
            # entries have changed their hash, don't bother saving. John has
            # suggested using a heuristic here based on the size of the
            # changed files and/or tree. For now, we go with a configurable
            # number of changes, keeping the calculation time
            # as low overhead as possible. (This also keeps all existing
            # tests passing as the default is 0, i.e. always save.)
            if len(self._known_hash_changes) >= self._worth_saving_limit:
                return True
        return False

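    # Example (hypothetical numbers): with worth_saving_limit=10 and only 4
    # entries in self._known_hash_changes, a state that is merely hash-cache
    # dirty is not written back; structural changes (IN_MEMORY_MODIFIED) are
    # always saved regardless of the limit.
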
    def _set_data(self, parent_ids, dirblocks):
        """Set the full dirstate data in memory.

        This is an internal function used to completely replace the objects
        in memory state. It puts the dirstate into state 'full-dirty'.

        :param parent_ids: A list of parent tree revision ids.
        :param dirblocks: A list containing one tuple for each directory in the
            tree. Each tuple contains the directory path and a list of entries
            found in that directory.
        """
        # our memory copy is now authoritative.
        self._dirblocks = dirblocks
        self._mark_modified(header_modified=True)
        self._parents = list(parent_ids)
        self._id_index = None
        self._packed_stat_index = None

    def set_path_id(self, path, new_id):
        """Change the id of path to new_id in the current working tree.

        :param path: The path inside the tree to set - '' is the root, 'foo'
            is the path foo in the root.
        :param new_id: The new id to assign to the path. This must be a utf8
            file id (not unicode, and not None).
        """
        self._read_dirblocks_if_needed()
        if len(path):
            # TODO: logic not written
            raise NotImplementedError(self.set_path_id)
        # TODO: check new id is unique
        entry = self._get_entry(0, path_utf8=path)
        if entry[0][2] == new_id:
            # Nothing to change.
            return
        # mark the old path absent, and insert a new root path
        self._make_absent(entry)
        self.update_minimal(('', '', new_id), 'd',
            path_utf8='', packed_stat=entry[1][0][4])
        self._mark_modified()
        # XXX: This was added by Ian, we need to make sure there
        #      are tests for it, because it isn't in bzr.dev TRUNK
        #      It looks like the only place it is called is in setting the root
        #      id of the tree. So probably we never had an _id_index when we
        #      don't even have a root yet.
        if self._id_index is not None:
            self._add_to_id_index(self._id_index, entry[0])

    def set_parent_trees(self, trees, ghosts):
        """Set the parent trees for the dirstate.

        :param trees: A list of revision_id, tree tuples. tree must be provided
            even if the revision_id refers to a ghost: supply an empty tree in
            that case.
        :param ghosts: A list of the revision_ids that are ghosts at the time
            of setting.
        """
        # TODO: generate a list of parent indexes to preserve to save
        # processing specific parent trees. In the common case one tree will
        # be preserved - the left most parent.
        # TODO: if the parent tree is a dirstate, we might want to walk them
        # all by path in parallel for 'optimal' common-case performance.
        # generate new root row.
        self._read_dirblocks_if_needed()
        # TODO future sketch: Examine the existing parents to generate a change
        # map and then walk the new parent trees only, mapping them into the
        # dirstate. Walk the dirstate at the same time to remove unreferenced
        # entries.
        # sketch: loop over all entries in the dirstate, cherry picking
        # entries from the parent trees, if they are not ghost trees.
        # after we finish walking the dirstate, all entries not in the dirstate
        # are deletes, so we want to append them to the end as per the design
        # discussions. So do a set difference on ids with the parents to
        # get deletes, and add them to the end.
        # During the update process we need to answer the following questions:
        # - find other keys containing a fileid in order to create cross-path
        #   links. We don't trivially use the inventory from other trees
        #   because this leads to either double touching, or to accessing
        #   missing keys,
        # - find other keys containing a path
        # We accumulate each entry via this dictionary, including the root
        by_path = {}
        id_index = {}
        # we could do parallel iterators, but because file id data may be
        # scattered throughout, we don't save on index overhead: we have to
        # look at everything anyway. We can probably save cycles by reusing
        # parent data and doing an incremental update when adding an additional
        # parent, but for now the common cases are adding a new parent (merge),
        # and replacing completely (commit), and commit is more common: so
        # optimise merge later.

        # ---- start generation of full tree mapping data
        # what trees should we use?
        parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
        # how many trees do we end up with
        parent_count = len(parent_trees)
        st = static_tuple.StaticTuple

        # one: the current tree
        for entry in self._iter_entries():
            # skip entries not in the current tree
            if entry[1][0][0] in 'ar': # absent, relocated
                continue
            by_path[entry[0]] = [entry[1][0]] + \
                [DirState.NULL_PARENT_DETAILS] * parent_count
            # TODO: Possibly inline this, since we know it isn't present yet
            #       id_index[entry[0][2]] = (entry[0],)
            self._add_to_id_index(id_index, entry[0])

        # now the parent trees:
        for tree_index, tree in enumerate(parent_trees):
            # the index is off by one, adjust it.
            tree_index = tree_index + 1
            # when we add new locations for a fileid we need these ranges for
            # any fileid in this tree as we set the by_path[id] to:
            # already_processed_tree_details + new_details + new_location_suffix
            # the suffix is from tree_index+1:parent_count+1.
            new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
            # now stitch in all the entries from this tree
            last_dirname = None
            for path, entry in tree.iter_entries_by_dir():
                # here we process each trees details for each item in the tree.
                # we first update any existing entries for the id at other paths,
                # then we either create or update the entry for the id at the
                # right path, and finally we add (if needed) a mapping from
                # file_id to this path. We do it in this order to allow us to
                # avoid checking all known paths for the id when generating a
                # new entry at this path: by adding the id->path mapping last,
                # all the mappings are valid and have correct relocation
                # records where needed.
                file_id = entry.file_id
                path_utf8 = path.encode('utf8')
                dirname, basename = osutils.split(path_utf8)
                if dirname == last_dirname:
                    # Try to re-use objects as much as possible
                    dirname = last_dirname
                else:
                    last_dirname = dirname
                new_entry_key = st(dirname, basename, file_id)
                # tree index consistency: All other paths for this id in this tree
                # index must point to the correct path.
                entry_keys = id_index.get(file_id, ())
                for entry_key in entry_keys:
                    # TODO:PROFILING: It might be faster to just update
                    # rather than checking if we need to, and then overwrite
                    # the one we are located at.
                    if entry_key != new_entry_key:
                        # this file id is at a different path in one of the
                        # other trees, so put absent pointers there
                        # This is the vertical axis in the matrix, all pointing
                        # to the real path.
                        by_path[entry_key][tree_index] = st('r', path_utf8, 0,
                                                            False, '')
                # by path consistency: Insert into an existing path record
                # (trivial), or add a new one with relocation pointers for the
                # other tree indexes.
                if new_entry_key in entry_keys:
                    # there is already an entry where this data belongs, just
                    # insert it.
                    by_path[new_entry_key][tree_index] = \
                        self._inv_entry_to_details(entry)
                else:
                    # add relocated entries to the horizontal axis - this row
                    # mapping from path,id. We need to look up the correct path
                    # for the indexes from 0 to tree_index -1
                    new_details = []
                    for lookup_index in xrange(tree_index):
                        # boundary case: this is the first occurence of file_id
                        # so there are no id_indexes, possibly take this out of
                        # the loop?
                        if not len(entry_keys):
                            new_details.append(DirState.NULL_PARENT_DETAILS)
                        else:
                            # grab any one entry, use it to find the right path.
                            a_key = iter(entry_keys).next()
                            if by_path[a_key][lookup_index][0] in ('r', 'a'):
                                # its a pointer or missing statement, use it as
                                # is.
                                new_details.append(by_path[a_key][lookup_index])
                            else:
                                # we have the right key, make a pointer to it.
                                real_path = ('/'.join(a_key[0:2])).strip('/')
                                new_details.append(st('r', real_path, 0, False,
                                                      ''))
                    new_details.append(self._inv_entry_to_details(entry))
                    new_details.extend(new_location_suffix)
                    by_path[new_entry_key] = new_details
                    self._add_to_id_index(id_index, new_entry_key)
        # --- end generation of full tree mappings

        # sort and output all the entries
        new_entries = self._sort_entries(by_path.items())
        self._entries_to_current_state(new_entries)
        self._parents = [rev_id for rev_id, tree in trees]
        self._ghosts = list(ghosts)
        self._mark_modified(header_modified=True)
        self._id_index = id_index

    def _sort_entries(self, entry_list):
        """Given a list of entries, sort them into the right order.

        This is done when constructing a new dirstate from trees - normally we
        try to keep everything in sorted blocks all the time, but sometimes
        it's easier to sort after the fact.
        """
        # When sorting, we usually have 10x more entries than directories. (69k
        # total entries, 4k directories). So cache the results of splitting.
        # Saving time and objects. Also, use StaticTuple to avoid putting all
        # of these objects into python's garbage collector.
        split_dirs = {}
        def _key(entry, _split_dirs=split_dirs, _st=static_tuple.StaticTuple):
            # sort by: directory parts, file name, file id
            dirpath, fname, file_id = entry[0]
            try:
                split = _split_dirs[dirpath]
            except KeyError:
                split = _st.from_sequence(dirpath.split('/'))
                _split_dirs[dirpath] = split
            return _st(split, fname, file_id)
        return sorted(entry_list, key=_key)

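    # Why split on '/': plain byte order would sort directory 'a-b' before
    # 'a/b' (since '-' < '/'), scattering the children of 'a'. Splitting
    # turns the dirnames into tuples - (hypothetically) ('a', 'b') < ('a-b',)
    # - so every entry under 'a' stays contiguous in dirblock order.
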
    def set_state_from_inventory(self, new_inv):
        """Set new_inv as the current state.

        This API is called by tree transform, and will usually occur with
        existing parent trees.

        :param new_inv: The inventory object to set current state from.
        """
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(1,
                "set_state_from_inventory called; please mutate the tree instead")
        tracing = 'dirstate' in debug.debug_flags
        if tracing:
            trace.mutter("set_state_from_inventory trace:")
        self._read_dirblocks_if_needed()
        # sketch:
        # Two iterators: current data and new data, both in dirblock order.
        # We zip them together, which tells about entries that are new in the
        # inventory, or removed in the inventory, or present in both and
        # possibly changed.
        #
        # You might think we could just synthesize a new dirstate directly
        # since we're processing it in the right order. However, we need to
        # also consider there may be any number of parent trees and relocation
        # pointers, and we don't want to duplicate that here.
        new_iterator = new_inv.iter_entries_by_dir()
        # we will be modifying the dirstate, so we need a stable iterator. In
        # future we might write one, for now we just clone the state into a
        # list using a copy so that we see every original item and don't have
        # to adjust the position when items are inserted or deleted in the
        # underlying dirstate.
        old_iterator = iter(list(self._iter_entries()))
        # both must have roots so this is safe:
        current_new = new_iterator.next()
        current_old = old_iterator.next()
        def advance(iterator):
            try:
                return iterator.next()
            except StopIteration:
                return None
        while current_new or current_old:
            # skip entries in old that are not really there
            if current_old and current_old[1][0][0] in 'ar':
                # relocated or absent
                current_old = advance(old_iterator)
                continue
            if current_new:
                # convert new into dirblock style
                new_path_utf8 = current_new[0].encode('utf8')
                new_dirname, new_basename = osutils.split(new_path_utf8)
                new_id = current_new[1].file_id
                new_entry_key = (new_dirname, new_basename, new_id)
                current_new_minikind = \
                    DirState._kind_to_minikind[current_new[1].kind]
                if current_new_minikind == 't':
                    fingerprint = current_new[1].reference_revision or ''
                else:
                    # We normally only insert or remove records, or update
                    # them when it has significantly changed. Then we want to
                    # erase its fingerprint. Unaffected records should
                    # normally not be updated at all.
                    fingerprint = ''
            else:
                # for safety disable variables
                new_path_utf8 = new_dirname = new_basename = new_id = \
                    new_entry_key = None
            # 5 cases, we dont have a value that is strictly greater than everything, so
            # we make both end conditions explicit
            if not current_old:
                # old is finished: insert current_new into the state.
                if tracing:
                    trace.mutter("Appending from new '%s'.",
                        new_path_utf8.decode('utf8'))
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint,
                    fullscan=True)
                current_new = advance(new_iterator)
            elif not current_new:
                # new is finished
                if tracing:
                    trace.mutter("Truncating from old '%s/%s'.",
                        current_old[0][0].decode('utf8'),
                        current_old[0][1].decode('utf8'))
                self._make_absent(current_old)
                current_old = advance(old_iterator)
            elif new_entry_key == current_old[0]:
                # same - common case
                # We're looking at the same path and id in both the dirstate
                # and inventory, so just need to update the fields in the
                # dirstate from the one in the inventory.
                # TODO: update the record if anything significant has changed.
                # the minimal required trigger is if the execute bit or cached
                # kind has changed.
                if (current_old[1][0][3] != current_new[1].executable or
                    current_old[1][0][0] != current_new_minikind):
                    if tracing:
                        trace.mutter("Updating in-place change '%s'.",
                            new_path_utf8.decode('utf8'))
                    self.update_minimal(current_old[0], current_new_minikind,
                        executable=current_new[1].executable,
                        path_utf8=new_path_utf8, fingerprint=fingerprint,
                        fullscan=True)
                # both sides are dealt with, move on
                current_old = advance(old_iterator)
                current_new = advance(new_iterator)
            elif (cmp_by_dirs(new_dirname, current_old[0][0]) < 0
                  or (new_dirname == current_old[0][0]
                      and new_entry_key[1:] < current_old[0][1:])):
                # new comes before:
                # add a entry for this and advance new
                if tracing:
                    trace.mutter("Inserting from new '%s'.",
                        new_path_utf8.decode('utf8'))
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint,
                    fullscan=True)
                current_new = advance(new_iterator)
            else:
                # we've advanced past the place where the old key would be,
                # without seeing it in the new list. so it must be gone.
                if tracing:
                    trace.mutter("Deleting from old '%s/%s'.",
                        current_old[0][0].decode('utf8'),
                        current_old[0][1].decode('utf8'))
                self._make_absent(current_old)
                current_old = advance(old_iterator)
        self._mark_modified()
        self._id_index = None
        self._packed_stat_index = None
        if tracing:
            trace.mutter("set_state_from_inventory complete.")

    def set_state_from_scratch(self, working_inv, parent_trees, parent_ghosts):
        """Wipe the currently stored state and set it to something new.

        This is a hard-reset for the data we are working with.
        """
        # Technically, we really want a write lock, but until we write, we
        # don't really need it.
        self._requires_lock()
        # root dir and root dir contents with no children. We have to have a
        # root for set_state_from_inventory to work correctly.
        empty_root = (('', '', inventory.ROOT_ID),
                      [('d', '', 0, False, DirState.NULLSTAT)])
        empty_tree_dirblocks = [('', [empty_root]), ('', [])]
        self._set_data([], empty_tree_dirblocks)
        self.set_state_from_inventory(working_inv)
        self.set_parent_trees(parent_trees, parent_ghosts)

    def _make_absent(self, current_old):
        """Mark current_old - an entry - as absent for tree 0.

        :return: True if this was the last details entry for the entry key:
            that is, if the underlying block has had the entry removed, thus
            shrinking in length.
        """
        # build up paths that this id will be left at after the change is made,
        # so we can update their cross references in tree 0
        all_remaining_keys = set()
        # Dont check the working tree, because it's going.
        for details in current_old[1][1:]:
            if details[0] not in 'ar': # absent, relocated
                all_remaining_keys.add(current_old[0])
            elif details[0] == 'r': # relocated
                # record the key for the real path.
                all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
            # absent rows are not present at any path.
        last_reference = current_old[0] not in all_remaining_keys
        if last_reference:
            # the current row consists entirely of the current item (being
            # marked absent), and relocated or absent entries for the other
            # trees: Remove it, its meaningless.
            block = self._find_block(current_old[0])
            entry_index, present = self._find_entry_index(current_old[0], block[1])
            if not present:
                raise AssertionError('could not find entry for %s' % (current_old,))
            block[1].pop(entry_index)
            # if we have an id_index in use, remove this key from it for this id.
            if self._id_index is not None:
                self._remove_from_id_index(self._id_index, current_old[0])
        # update all remaining keys for this id to record it as absent. The
        # existing details may either be the record we are marking as deleted
        # (if there were other trees with the id present at this path), or may
        # be relocations.
        for update_key in all_remaining_keys:
            update_block_index, present = \
                self._find_block_index_from_key(update_key)
            if not present:
                raise AssertionError('could not find block for %s' % (update_key,))
            update_entry_index, present = \
                self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
            if not present:
                raise AssertionError('could not find entry for %s' % (update_key,))
            update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
            # it must not be absent at the moment
            if update_tree_details[0][0] == 'a': # absent
                raise AssertionError('bad row %r' % (update_tree_details,))
            update_tree_details[0] = DirState.NULL_PARENT_DETAILS
        self._mark_modified()
        return last_reference

    def update_minimal(self, key, minikind, executable=False, fingerprint='',
                       packed_stat=None, size=0, path_utf8=None, fullscan=False):
        """Update an entry to the state in tree 0.

        This will either create a new entry at 'key' or update an existing one.
        It also makes sure that any other records which might mention this are
        updated as well.

        :param key: (dir, name, file_id) for the new entry
        :param minikind: The type for the entry ('f' == 'file', 'd' ==
                'directory'), etc.
        :param executable: Should the executable bit be set?
        :param fingerprint: Simple fingerprint for new entry: canonical-form
            sha1 for files, referenced revision id for subtrees, etc.
        :param packed_stat: Packed stat value for new entry.
        :param size: Size information for new entry
        :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
                the path separation again.
        :param fullscan: If True then a complete scan of the dirstate is being
            done and checking for duplicate rows should not be done. This
            should only be set by set_state_from_inventory and similar methods.

        If packed_stat and fingerprint are not given, they're invalidated in
        the entry.
        """
        block = self._find_block(key)[1]
        if packed_stat is None:
            packed_stat = DirState.NULLSTAT
        # XXX: Some callers pass '' as the packed_stat, and it seems to be
        # sometimes present in the dirstate - this seems oddly inconsistent.
        # mbp 20071008
        entry_index, present = self._find_entry_index(key, block)
        new_details = (minikind, fingerprint, size, executable, packed_stat)
        id_index = self._get_id_index()
        if not present:
            # New record. Check there isn't a entry at this path already.
            if not fullscan:
                low_index, _ = self._find_entry_index(key[0:2] + ('',), block)
                while low_index < len(block):
                    entry = block[low_index]
                    if entry[0][0:2] == key[0:2]:
                        if entry[1][0][0] not in 'ar':
                            # This entry has the same path (but a different id) as
                            # the new entry we're adding, and is present in this
                            # tree.
                            self._raise_invalid(
                                ("%s/%s" % key[0:2]).decode('utf8'), key[2],
                                "Attempt to add item at path already occupied by "
                                "id %r" % entry[0][2])
                        low_index += 1
                    else:
                        break
            # new entry, synthesis cross reference here,
            existing_keys = id_index.get(key[2], ())
            if not existing_keys:
                # not currently in the state, simplest case
                new_entry = key, [new_details] + self._empty_parent_info()
            else:
                # present at one or more existing other paths.
                # grab one of them and use it to generate parent
                # relocation/absent entries.
                new_entry = key, [new_details]
                # existing_keys can be changed as we iterate.
                for other_key in tuple(existing_keys):
                    # change the record at other to be a pointer to this new
                    # record. The loop looks similar to the change to
                    # relocations when updating an existing record but its not:
                    # the test for existing kinds is different: this can be
                    # factored out to a helper though.
                    other_block_index, present = self._find_block_index_from_key(
                        other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (
                            other_key,))
                    other_block = self._dirblocks[other_block_index][1]
                    other_entry_index, present = self._find_entry_index(
                        other_key, other_block)
                    if not present:
                        raise AssertionError(
                            'update_minimal: could not find other entry for %s'
                            % (other_key,))
                    if path_utf8 is None:
                        raise AssertionError('no path')
                    # Turn this other location into a reference to the new
                    # location. This also updates the aliased iterator
                    # (current_old in set_state_from_inventory) so that the old
                    # entry, if not already examined, is skipped over by that
                    # loop.
                    other_entry = other_block[other_entry_index]
                    other_entry[1][0] = ('r', path_utf8, 0, False, '')
                    if self._maybe_remove_row(other_block, other_entry_index,
                                              id_index):
                        # If the row holding this was removed, we need to
                        # recompute where this entry goes
                        entry_index, _ = self._find_entry_index(key, block)

                # This loop:
                # adds a tuple to the new details for each column
                #  - either by copying an existing relocation pointer inside that column
                #  - or by creating a new pointer to the right row inside that column
                num_present_parents = self._num_present_parents()
                if num_present_parents:
                    # TODO: This re-evaluates the existing_keys set, do we need
                    #       to do that ourselves?
                    other_key = list(existing_keys)[0]
                for lookup_index in xrange(1, num_present_parents + 1):
                    # grab any one entry, use it to find the right path.
                    # TODO: optimise this to reduce memory use in highly
                    # fragmented situations by reusing the relocation
                    # records.
                    update_block_index, present = \
                        self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    update_entry_index, present = \
                        self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
                    if not present:
                        raise AssertionError('update_minimal: could not find entry for %s' % (other_key,))
                    update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
                    if update_details[0] in 'ar': # relocated, absent
                        # its a pointer or absent in lookup_index's tree, use
                        # it as is.
                        new_entry[1].append(update_details)
                    else:
                        # we have the right key, make a pointer to it.
                        pointer_path = osutils.pathjoin(*other_key[0:2])
                        new_entry[1].append(('r', pointer_path, 0, False, ''))
            block.insert(entry_index, new_entry)
            self._add_to_id_index(id_index, key)
        else:
            # Does the new state matter?
            block[entry_index][1][0] = new_details
            # parents cannot be affected by what we do.
            # other occurences of this id can be found
            # from the id index.

            # tree index consistency: All other paths for this id in this tree
            # index must point to the correct path. We have to loop here because
            # we may have passed entries in the state with this file id already
            # that were absent - where parent entries are - and they need to be
            # converted to relocated.
            if path_utf8 is None:
                raise AssertionError('no path')
            existing_keys = id_index.get(key[2], ())
            if key not in existing_keys:
                raise AssertionError('We found the entry in the blocks, but'
                    ' the key is not in the id_index.'
                    ' key: %s, existing_keys: %s' % (key, existing_keys))
            for entry_key in existing_keys:
                # TODO:PROFILING: It might be faster to just update
                # rather than checking if we need to, and then overwrite
                # the one we are located at.
                if entry_key != key:
                    # this file id is at a different path in one of the
                    # other trees, so put absent pointers there
                    # This is the vertical axis in the matrix, all pointing
                    # to the real path.
                    block_index, present = self._find_block_index_from_key(entry_key)
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    self._dirblocks[block_index][1][entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')
        # add a containing dirblock if needed.
        if new_details[0] == 'd':
            subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
            block_index, present = self._find_block_index_from_key(subdir_key)
3166
self._dirblocks.insert(block_index, (subdir_key[0], []))
3168
self._mark_modified()
3170
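    # Illustrative sketch (hypothetical values, not from the original source):
    # recording a new file 'dir/file' with id 'file-x' in tree 0:
    #
    #   state.update_minimal(('dir', 'file', 'file-x'), 'f',
    #       fingerprint=sha1, size=12, path_utf8='dir/file')
    #
    # leaves the 'dir' block holding an entry of the form
    #   (('dir', 'file', 'file-x'),
    #    [('f', sha1, 12, False, packed_stat), <parent columns>])
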
    def _maybe_remove_row(self, block, index, id_index):
        """Remove index if it is absent or relocated across the row.

        id_index is updated accordingly.
        :return: True if we removed the row, False otherwise
        """
        present_in_row = False
        entry = block[index]
        for column in entry[1]:
            if column[0] not in 'ar':
                present_in_row = True
                break
        if not present_in_row:
            block.pop(index)
            self._remove_from_id_index(id_index, entry[0])
            return True
        return False

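    # For example, a row whose columns are all absent or relocated, such as
    #   (('dir', 'file', 'file-x'),
    #    [('a', '', 0, False, ''), ('r', 'other/path', 0, False, '')])
    # carries no real data in any tree, so _maybe_remove_row() drops it and
    # returns True; a single 'f', 'd', 'l' or 't' column keeps the row alive.
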
    def _validate(self):
        """Check that invariants on the dirblock are correct.

        This can be useful in debugging; it shouldn't be necessary in
        normal code.

        This must be called with a lock held.
        """
        # NOTE: This must always raise AssertionError not just assert,
        # otherwise it may not behave properly under python -O
        #
        # TODO: All entries must have some content that's not 'a' or 'r',
        # otherwise it could just be removed.
        #
        # TODO: All relocations must point directly to a real entry.
        #
        # TODO: No repeated keys.
        from pprint import pformat
        self._read_dirblocks_if_needed()
        if len(self._dirblocks) > 0:
            if not self._dirblocks[0][0] == '':
                raise AssertionError(
                    "dirblocks don't start with root block:\n" + \
                    pformat(self._dirblocks))
        if len(self._dirblocks) > 1:
            if not self._dirblocks[1][0] == '':
                raise AssertionError(
                    "dirblocks missing root directory:\n" + \
                    pformat(self._dirblocks))
        # the dirblocks are sorted by their path components, name, and dir id
        dir_names = [d[0].split('/')
                for d in self._dirblocks[1:]]
        if dir_names != sorted(dir_names):
            raise AssertionError(
                "dir names are not in sorted order:\n" + \
                pformat(self._dirblocks) + \
                "\nkeys:\n" +
                pformat(dir_names))
        for dirblock in self._dirblocks:
            # within each dirblock, the entries are sorted by filename and
            # then by id.
            for entry in dirblock[1]:
                if dirblock[0] != entry[0][0]:
                    raise AssertionError(
                        "entry key for %r "
                        "doesn't match directory name in\n%r" %
                        (entry, pformat(dirblock)))
            if dirblock[1] != sorted(dirblock[1]):
                raise AssertionError(
                    "dirblock for %r is not sorted:\n%s" % \
                    (dirblock[0], pformat(dirblock)))

        def check_valid_parent():
            """Check that the current entry has a valid parent.

            This makes sure that the parent has a record,
            and that the parent isn't marked as "absent" in the
            current tree. (It is invalid to have a non-absent file in an absent
            directory.)
            """
            if entry[0][0:2] == ('', ''):
                # There should be no parent for the root row
                return
            parent_entry = self._get_entry(tree_index, path_utf8=entry[0][0])
            if parent_entry == (None, None):
                raise AssertionError(
                    "no parent entry for: %s in tree %s"
                    % (this_path, tree_index))
            if parent_entry[1][tree_index][0] != 'd':
                raise AssertionError(
                    "Parent entry for %s is not marked as a valid"
                    " directory. %s" % (this_path, parent_entry,))

        # For each file id, for each tree: either
        # the file id is not present at all; all rows with that id in the
        # key have it marked as 'absent'
        # OR the file id is present under exactly one name; any other entries
        # that mention that id point to the correct name.
        #
        # We check this with a dict per tree pointing either to the present
        # name, or None if absent.
        tree_count = self._num_present_parents() + 1
        id_path_maps = [dict() for i in range(tree_count)]
        # Make sure that all renamed entries point to the correct location.
        for entry in self._iter_entries():
            file_id = entry[0][2]
            this_path = osutils.pathjoin(entry[0][0], entry[0][1])
            if len(entry[1]) != tree_count:
                raise AssertionError(
                    "wrong number of entry details for row\n%s" \
                    ",\nexpected %d" % \
                    (pformat(entry), tree_count))
            absent_positions = 0
            for tree_index, tree_state in enumerate(entry[1]):
                this_tree_map = id_path_maps[tree_index]
                minikind = tree_state[0]
                if minikind in 'ar':
                    absent_positions += 1
                # have we seen this id before in this column?
                if file_id in this_tree_map:
                    previous_path, previous_loc = this_tree_map[file_id]
                    # any later mention of this file must be consistent with
                    # what was said before
                    if minikind == 'a':
                        if previous_path is not None:
                            raise AssertionError(
                                "file %s is absent in row %r but also present " \
                                "at %r" % \
                                (file_id, entry, previous_path))
                    elif minikind == 'r':
                        target_location = tree_state[1]
                        if previous_path != target_location:
                            raise AssertionError(
                                "file %s relocation in row %r but also at %r" \
                                % (file_id, entry, previous_path))
                    else:
                        # a file, directory, etc - may have been previously
                        # pointed to by a relocation, which must point here
                        if previous_path != this_path:
                            raise AssertionError(
                                "entry %r inconsistent with previous path %r "
                                "seen at %r" %
                                (entry, previous_path, previous_loc))
                        check_valid_parent()
                else:
                    if minikind == 'a':
                        # absent; should not occur anywhere else
                        this_tree_map[file_id] = None, this_path
                    elif minikind == 'r':
                        # relocation, must occur at expected location
                        this_tree_map[file_id] = tree_state[1], this_path
                    else:
                        this_tree_map[file_id] = this_path, this_path
                        check_valid_parent()
            if absent_positions == tree_count:
                raise AssertionError(
                    "entry %r has no data for any tree." % (entry,))
        if self._id_index is not None:
            for file_id, entry_keys in self._id_index.iteritems():
                for entry_key in entry_keys:
                    if entry_key[2] != file_id:
                        raise AssertionError(
                            'file_id %r did not match entry key %s'
                            % (file_id, entry_key))
                if len(entry_keys) != len(set(entry_keys)):
                    raise AssertionError(
                        'id_index contained non-unique data for %s'
                        % (entry_keys,))

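    # Usage sketch (assuming the lock the docstring requires):
    #
    #   state.lock_read()
    #   try:
    #       state._validate()   # raises AssertionError on a broken invariant
    #   finally:
    #       state.unlock()
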
    def _wipe_state(self):
        """Forget all state information about the dirstate."""
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._changes_aborted = False
        self._parents = []
        self._ghosts = []
        self._dirblocks = []
        self._id_index = None
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}

    def lock_read(self):
        """Acquire a read lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.ReadLock(self._filename)
        self._lock_state = 'r'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def lock_write(self):
        """Acquire a write lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.WriteLock(self._filename)
        self._lock_state = 'w'
        self._state_file = self._lock_token.f
        self._wipe_state()

"""Drop any locks held on the dirstate."""
3381
if self._lock_token is None:
3382
raise errors.LockNotHeld(self)
3383
# TODO: jam 20070301 Rather than wiping completely, if the blocks are
3384
# already in memory, we could read just the header and check for
3385
# any modification. If not modified, we can just leave things
3387
self._state_file = None
3388
self._lock_state = None
3389
self._lock_token.unlock()
3390
self._lock_token = None
3391
self._split_path_cache = {}
3393
    def _requires_lock(self):
        """Check that a lock is currently held by someone on the dirstate."""
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)

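# A minimal lock-lifecycle sketch (illustrative; `state` is an unlocked
# DirState and the mutation itself is elided):
#
#   state.lock_write()      # a second lock_*() here raises LockContention
#   try:
#       pass                # ... mutate the dirstate ...
#   finally:
#       state.unlock()      # calls without a lock raise ObjectNotLocked
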
def py_update_entry(state, entry, abspath, stat_value,
                    _stat_to_minikind=DirState._stat_to_minikind,
                    _pack_stat=pack_stat):
    """Update the entry based on what is actually on disk.

    This function only calculates the sha if it needs to - if the entry is
    uncachable, or clearly different to the first parent's entry, no sha
    is calculated, and None is returned.

    :param state: The dirstate this entry is in.
    :param entry: This is the dirblock entry for the file in question.
    :param abspath: The path on disk for this file.
    :param stat_value: The stat value done on the path.
    :return: None, or the sha1 hexdigest of the file (40 bytes) or link
        target of a symlink.
    """
    try:
        minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
    except KeyError:
        # Unhandled kind
        return None
    packed_stat = _pack_stat(stat_value)
    (saved_minikind, saved_link_or_sha1, saved_file_size,
     saved_executable, saved_packed_stat) = entry[1][0]

    if minikind == 'd' and saved_minikind == 't':
        minikind = 't'
    if (minikind == saved_minikind
        and packed_stat == saved_packed_stat):
        # The stat hasn't changed since we saved, so we can re-use the
        # saved sha hash.
        if minikind == 'd':
            return None

        # size should also be in packed_stat
        if saved_file_size == stat_value.st_size:
            return saved_link_or_sha1

    # If we have gotten this far, that means that we need to actually
    # process this entry.
    link_or_sha1 = None
    worth_saving = True
    if minikind == 'f':
        executable = state._is_executable(stat_value.st_mode,
                                          saved_executable)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time
            and len(entry[1]) > 1
            and entry[1][1][0] != 'a'):
            # Could check for size changes for further optimised
            # avoidance of sha1's. However the most prominent case of
            # over-sha'ing is during initial add, which this catches.
            # Besides, if content filtering happens, size and sha
            # are calculated at the same time, so checking just the size
            # gains nothing w.r.t. performance.
            link_or_sha1 = state._sha1_file(abspath)
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        else:
            entry[1][0] = ('f', '', stat_value.st_size,
                           executable, DirState.NULLSTAT)
            worth_saving = False
    elif minikind == 'd':
        link_or_sha1 = None
        entry[1][0] = ('d', '', 0, False, packed_stat)
        if saved_minikind != 'd':
            # This changed from something into a directory. Make sure we
            # have a directory block for it. This doesn't happen very
            # often, so this doesn't have to be super fast.
            block_index, entry_index, dir_present, file_present = \
                state._get_block_entry_index(entry[0][0], entry[0][1], 0)
            state._ensure_block(block_index, entry_index,
                                osutils.pathjoin(entry[0][0], entry[0][1]))
        else:
            worth_saving = False
    elif minikind == 'l':
        if saved_minikind == 'l':
            worth_saving = False
        link_or_sha1 = state._read_link(abspath, saved_link_or_sha1)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time):
            entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
                           False, packed_stat)
        else:
            entry[1][0] = ('l', '', stat_value.st_size,
                           False, DirState.NULLSTAT)
    if worth_saving:
        state._mark_modified([entry])
    return link_or_sha1

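# Illustrative sketch (hypothetical paths, not from the original source):
# refreshing a single entry from disk:
#
#   stat_value = os.lstat('/wt/dir/file')
#   entry = state._get_entry(0, path_utf8='dir/file')
#   link_or_sha1 = py_update_entry(state, entry, '/wt/dir/file', stat_value)
#
# The return value is None when no sha1 was (re)computed, as described in
# the docstring above.
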
class ProcessEntryPython(object):

    __slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id",
        "last_source_parent", "last_target_parent", "include_unchanged",
        "partial", "use_filesystem_for_exec", "utf8_decode",
        "searched_specific_files", "search_specific_files",
        "searched_exact_paths", "search_specific_file_parents", "seen_ids",
        "state", "source_index", "target_index", "want_unversioned", "tree"]

    def __init__(self, include_unchanged, use_filesystem_for_exec,
        search_specific_files, state, source_index, target_index,
        want_unversioned, tree):
        self.old_dirname_to_file_id = {}
        self.new_dirname_to_file_id = {}
        # Are we doing a partial iter_changes?
        self.partial = search_specific_files != set([''])
        # Using a list so that we can access the values and change them in
        # nested scope. Each one is [path, file_id, entry]
        self.last_source_parent = [None, None]
        self.last_target_parent = [None, None]
        self.include_unchanged = include_unchanged
        self.use_filesystem_for_exec = use_filesystem_for_exec
        self.utf8_decode = cache_utf8._utf8_decode
        # for all search indexes in each path at or under each element of
        # search_specific_files, if the detail is relocated: add the id, and
        # add the relocated path as one to search if it's not searched already.
        # If the detail is not relocated, add the id.
        self.searched_specific_files = set()
        # When we search exact paths without expanding downwards, we record
        # that here.
        self.searched_exact_paths = set()
        self.search_specific_files = search_specific_files
        # The parents up to the root of the paths we are searching.
        # After all normal paths are returned, these specific items are returned.
        self.search_specific_file_parents = set()
        # The ids we've sent out in the delta.
        self.seen_ids = set()
        self.state = state
        self.source_index = source_index
        self.target_index = target_index
        if target_index != 0:
            # A lot of code in here depends on target_index == 0
            raise errors.BzrError('unsupported target index')
        self.want_unversioned = want_unversioned
        self.tree = tree

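    # Construction sketch (hypothetical arguments): comparing a basis tree
    # (source_index=1) against the working tree (target_index=0, the only
    # supported value) across the whole tree:
    #
    #   pe = ProcessEntryPython(include_unchanged=False,
    #       use_filesystem_for_exec=True, search_specific_files=set(['']),
    #       state=state, source_index=1, target_index=0,
    #       want_unversioned=False, tree=tree)
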
    def _process_entry(self, entry, path_info, pathjoin=osutils.pathjoin):
        """Compare an entry and real disk to generate delta information.

        :param path_info: top_relpath, basename, kind, lstat, abspath for
            the path of entry. If None, then the path is considered absent in
            the target (Perhaps we should pass in a concrete entry for this ?)
            Basename is returned as a utf8 string because we expect this
            tuple will be ignored, and don't want to take the time to
            decode it.
        :return: (iter_changes_result, changed). If the entry has not been
            handled then changed is None. Otherwise it is False if no content
            or metadata changes have occurred, and True if any content or
            metadata change has occurred. If self.include_unchanged is True then
            if changed is not None, iter_changes_result will always be a result
            tuple. Otherwise, iter_changes_result is None unless changed is
            True.
        """
        if self.source_index is None:
            source_details = DirState.NULL_PARENT_DETAILS
        else:
            source_details = entry[1][self.source_index]
        target_details = entry[1][self.target_index]
        target_minikind = target_details[0]
        if path_info is not None and target_minikind in 'fdlt':
            if not (self.target_index == 0):
                raise AssertionError()
            link_or_sha1 = update_entry(self.state, entry,
                abspath=path_info[4], stat_value=path_info[3])
            # The entry may have been modified by update_entry
            target_details = entry[1][self.target_index]
            target_minikind = target_details[0]
        else:
            link_or_sha1 = None
        file_id = entry[0][2]
        source_minikind = source_details[0]
        if source_minikind in 'fdltr' and target_minikind in 'fdlt':
            # claimed content in both: diff
            #   r    | fdlt   |      | add source to search, add id path move and perform
            #        |        |      | diff check on source-target
            #   r    | fdlt   |  a   | dangling file that was present in the basis.
            #        |        |      | ???
            if source_minikind in 'r':
                # add the source to the search path to find any children it
                # has. TODO ? : only add if it is a container ?
                if not osutils.is_inside_any(self.searched_specific_files,
                                             source_details[1]):
                    self.search_specific_files.add(source_details[1])
                # generate the old path; this is needed for stating later
                # as well.
                old_path = source_details[1]
                old_dirname, old_basename = os.path.split(old_path)
                path = pathjoin(entry[0][0], entry[0][1])
                old_entry = self.state._get_entry(self.source_index,
                                                  path_utf8=old_path)
                # update the source details variable to be the real
                # location.
                if old_entry == (None, None):
                    raise errors.CorruptDirstate(self.state._filename,
                        "entry '%s/%s' is considered renamed from %r"
                        " but source does not exist\n"
                        "entry: %s" % (entry[0][0], entry[0][1], old_path, entry))
                source_details = old_entry[1][self.source_index]
                source_minikind = source_details[0]
            else:
                old_dirname = entry[0][0]
                old_basename = entry[0][1]
                old_path = path = None
            if path_info is None:
                # the file is missing on disk, show as removed.
                content_change = True
                target_kind = None
                target_exec = False
            else:
                # source and target are both versioned and disk file is present.
                target_kind = path_info[2]
                if target_kind == 'directory':
                    if path is None:
                        old_path = path = pathjoin(old_dirname, old_basename)
                    self.new_dirname_to_file_id[path] = file_id
                    if source_minikind != 'd':
                        content_change = True
                    else:
                        # directories have no fingerprint
                        content_change = False
                    target_exec = False
                elif target_kind == 'file':
                    if source_minikind != 'f':
                        content_change = True
                    else:
                        # Check the sha. We can't just rely on the size as
                        # content filtering may mean differing sizes actually
                        # map to the same content
                        if link_or_sha1 is None:
                            # Stat cache miss:
                            statvalue, link_or_sha1 = \
                                self.state._sha1_provider.stat_and_sha1(
                                    path_info[4])
                            self.state._observed_sha1(entry, link_or_sha1,
                                statvalue)
                        content_change = (link_or_sha1 != source_details[1])
                    # Target details is updated at update_entry time
                    if self.use_filesystem_for_exec:
                        # We don't need S_ISREG here, because we are sure
                        # we are dealing with a file.
                        target_exec = bool(stat.S_IEXEC & path_info[3].st_mode)
                    else:
                        target_exec = target_details[3]
                elif target_kind == 'symlink':
                    if source_minikind != 'l':
                        content_change = True
                    else:
                        content_change = (link_or_sha1 != source_details[1])
                    target_exec = False
                elif target_kind == 'tree-reference':
                    if source_minikind != 't':
                        content_change = True
                    else:
                        content_change = False
                    target_exec = False
                else:
                    if path is None:
                        path = pathjoin(old_dirname, old_basename)
                    raise errors.BadFileKindError(path, path_info[2])
            if source_minikind == 'd':
                if path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                self.old_dirname_to_file_id[old_path] = file_id
            # parent id is the entry for the path in the target tree
            if old_basename and old_dirname == self.last_source_parent[0]:
                source_parent_id = self.last_source_parent[1]
            else:
                try:
                    source_parent_id = self.old_dirname_to_file_id[old_dirname]
                except KeyError:
                    source_parent_entry = self.state._get_entry(self.source_index,
                                                                path_utf8=old_dirname)
                    source_parent_id = source_parent_entry[0][2]
                if source_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    source_parent_id = None
                else:
                    self.last_source_parent[0] = old_dirname
                    self.last_source_parent[1] = source_parent_id
            new_dirname = entry[0][0]
            if entry[0][1] and new_dirname == self.last_target_parent[0]:
                target_parent_id = self.last_target_parent[1]
            else:
                try:
                    target_parent_id = self.new_dirname_to_file_id[new_dirname]
                except KeyError:
                    # TODO: We don't always need to do the lookup, because the
                    #       parent entry will be the same as the source entry.
                    target_parent_entry = self.state._get_entry(self.target_index,
                                                                path_utf8=new_dirname)
                    if target_parent_entry == (None, None):
                        raise AssertionError(
                            "Could not find target parent in wt: %s\nparent of: %s"
                            % (new_dirname, entry))
                    target_parent_id = target_parent_entry[0][2]
                if target_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    target_parent_id = None
                else:
                    self.last_target_parent[0] = new_dirname
                    self.last_target_parent[1] = target_parent_id

            source_exec = source_details[3]
            changed = (content_change
                or source_parent_id != target_parent_id
                or old_basename != entry[0][1]
                or source_exec != target_exec
                )
            if not changed and not self.include_unchanged:
                return None, False
            else:
                if old_path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                    old_path_u = self.utf8_decode(old_path)[0]
                    path_u = old_path_u
                else:
                    old_path_u = self.utf8_decode(old_path)[0]
                    if old_path == path:
                        path_u = old_path_u
                    else:
                        path_u = self.utf8_decode(path)[0]
                source_kind = DirState._minikind_to_kind[source_minikind]
                return (entry[0][2],
                       (old_path_u, path_u),
                       content_change,
                       (True, True),
                       (source_parent_id, target_parent_id),
                       (self.utf8_decode(old_basename)[0], self.utf8_decode(entry[0][1])[0]),
                       (source_kind, target_kind),
                       (source_exec, target_exec)), changed
        elif source_minikind in 'a' and target_minikind in 'fdlt':
            # looks like a new file
            path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            # TODO: these are the same for an entire directory: cache em.
            parent_id = self.state._get_entry(self.target_index,
                                              path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            if path_info is not None:
                # Present on disk:
                if self.use_filesystem_for_exec:
                    # We need S_ISREG here, because we aren't sure if this
                    # is a file or not.
                    target_exec = bool(
                        stat.S_ISREG(path_info[3].st_mode)
                        and stat.S_IEXEC & path_info[3].st_mode)
                else:
                    target_exec = target_details[3]
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, path_info[2]),
                       (None, target_exec)), True
            else:
                # It's a missing file, report it as such.
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, None),
                       (None, False)), True
        elif source_minikind in 'fdlt' and target_minikind in 'a':
            # unversioned, possibly, or possibly not deleted: we don't care.
            # if it's still on disk, *and* there's no other entry at this
            # path (we don't know this in this routine at the moment -
            # perhaps we should change this) - then it would be an unknown.
            old_path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            parent_id = self.state._get_entry(self.source_index, path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            return (entry[0][2],
                   (self.utf8_decode(old_path)[0], None),
                   True,
                   (True, False),
                   (parent_id, None),
                   (self.utf8_decode(entry[0][1])[0], None),
                   (DirState._minikind_to_kind[source_minikind], None),
                   (source_details[3], None)), True
        elif source_minikind in 'fdlt' and target_minikind in 'r':
            # a rename; could be a true rename, or a rename inherited from
            # a renamed parent. TODO: handle this efficiently. It's not a
            # common case to rename dirs though, so a correct but slow
            # implementation will do.
            if not osutils.is_inside_any(self.searched_specific_files, target_details[1]):
                self.search_specific_files.add(target_details[1])
        elif source_minikind in 'ra' and target_minikind in 'ra':
            # neither of the selected trees contain this file,
            # so skip over it. This is not currently directly tested, but
            # is indirectly via test_too_much.TestCommands.test_conflicts.
            pass
        else:
            raise AssertionError("don't know how to compare "
                "source_minikind=%r, target_minikind=%r"
                % (source_minikind, target_minikind))
        return None, None

    def __iter__(self):
        return self

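    # The result tuple built above has a fixed shape; sketched with
    # descriptive field names (the names themselves are not in the source):
    #
    #   (file_id,
    #    (old_path, new_path),
    #    content_change,                        # bool
    #    (source_versioned, target_versioned),
    #    (source_parent_id, target_parent_id),
    #    (old_basename, new_basename),
    #    (source_kind, target_kind),
    #    (source_exec, target_exec))
    #
    # This is why _iter_specific_file_parents() can test result[6][0] to
    # spot a kind change away from 'directory'.
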
    def _gather_result_for_consistency(self, result):
        """Check a result we will yield to make sure we are consistent later.

        This gathers result's parents into a set to output later.

        :param result: A result tuple.
        """
        if not self.partial or not result[0]:
            return
        self.seen_ids.add(result[0])
        new_path = result[1][1]
        if new_path:
            # Not the root and not a delete: queue up the parents of the path.
            self.search_specific_file_parents.update(
                osutils.parent_directories(new_path.encode('utf8')))
            # Add the root directory which parent_directories does not
            # provide.
            self.search_specific_file_parents.add('')

    def iter_changes(self):
        """Iterate over the changes."""
        utf8_decode = cache_utf8._utf8_decode
        _cmp_by_dirs = cmp_by_dirs
        _process_entry = self._process_entry
        search_specific_files = self.search_specific_files
        searched_specific_files = self.searched_specific_files
        splitpath = osutils.splitpath

        # compare source_index and target_index at or under each element of search_specific_files.
        # follow the following comparison table. Note that we only want to do diff operations when
        # the target is fdl because that's when the walkdirs logic will have exposed the pathinfo
        # for the target.
        # cases:
        #
        # Source | Target | disk | action
        #   r    | fdlt   |      | add source to search, add id path move and perform
        #        |        |      | diff check on source-target
        #   r    | fdlt   |  a   | dangling file that was present in the basis.
        #        |        |      | ???
        #   r    |  a     |      | add source to search
        #   r    |  a     |  a   |
        #   r    |  r     |      | this path is present in a non-examined tree, skip.
        #   r    |  r     |  a   | this path is present in a non-examined tree, skip.
        #   a    | fdlt   |      | add new id
        #   a    | fdlt   |  a   | dangling locally added file, skip
        #   a    |  a     |      | not present in either tree, skip
        #   a    |  a     |  a   | not present in any tree, skip
        #   a    |  r     |      | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #   a    |  r     |  a   | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #  fdlt  | fdlt   |      | content in both: diff them
        #  fdlt  | fdlt   |  a   | deleted locally, but not unversioned - show as deleted ?
        #  fdlt  |  a     |      | unversioned: output deleted id for now
        #  fdlt  |  a     |  a   | unversioned and deleted: output deleted id
        #  fdlt  |  r     |      | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.
        #  fdlt  |  r     |  a   | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.

        # TODO: jam 20070516 - Avoid the _get_entry lookup overhead by
        #       keeping a cache of directories that we have seen.

        while search_specific_files:
            # TODO: the pending list should be lexically sorted?  the
            #       interface doesn't require it.
            current_root = search_specific_files.pop()
            current_root_unicode = current_root.decode('utf8')
            searched_specific_files.add(current_root)
            # process the entries for this containing directory: the rest will be
            # found by their parents recursively.
            root_entries = self.state._entries_for_path(current_root)
            root_abspath = self.tree.abspath(current_root_unicode)
            try:
                root_stat = os.lstat(root_abspath)
            except OSError, e:
                if e.errno == errno.ENOENT:
                    # the path does not exist: let _process_entry know that.
                    root_dir_info = None
                else:
                    # some other random error: hand it up.
                    raise
            else:
                root_dir_info = ('', current_root,
                    osutils.file_kind_from_stat_mode(root_stat.st_mode), root_stat,
                    root_abspath)
                if root_dir_info[2] == 'directory':
                    if self.tree._directory_is_tree_reference(
                        current_root.decode('utf8')):
                        root_dir_info = root_dir_info[:2] + \
                            ('tree-reference',) + root_dir_info[3:]

            if not root_entries and not root_dir_info:
                # this specified path is not present at all, skip it.
                continue
            path_handled = False
            for entry in root_entries:
                result, changed = _process_entry(entry, root_dir_info)
                if changed is not None:
                    path_handled = True
                    if changed:
                        self._gather_result_for_consistency(result)
                    if changed or self.include_unchanged:
                        yield result
            if self.want_unversioned and not path_handled and root_dir_info:
                new_executable = bool(
                    stat.S_ISREG(root_dir_info[3].st_mode)
                    and stat.S_IEXEC & root_dir_info[3].st_mode)
                yield (None,
                       (None, current_root_unicode),
                       True,
                       (False, False),
                       (None, None),
                       (None, splitpath(current_root_unicode)[-1]),
                       (None, root_dir_info[2]),
                       (None, new_executable)
                      )
            initial_key = (current_root, '', '')
            block_index, _ = self.state._find_block_index_from_key(initial_key)
            if block_index == 0:
                # we have processed the total root already, but because the
                # initial key matched it we should skip it here.
                block_index += 1
            if root_dir_info and root_dir_info[2] == 'tree-reference':
                current_dir_info = None
            else:
                dir_iterator = osutils._walkdirs_utf8(root_abspath, prefix=current_root)
                try:
                    current_dir_info = dir_iterator.next()
                except OSError, e:
                    # on win32, python2.4 has e.errno == ERROR_DIRECTORY, but
                    # python 2.5 has e.errno == EINVAL,
                    #            and e.winerror == ERROR_DIRECTORY
                    e_winerror = getattr(e, 'winerror', None)
                    win_errors = (ERROR_DIRECTORY, ERROR_PATH_NOT_FOUND)
                    # there may be directories in the inventory even though
                    # this path is not a file on disk: so mark it as end of
                    # iterator.
                    if e.errno in (errno.ENOENT, errno.ENOTDIR, errno.EINVAL):
                        current_dir_info = None
                    elif (sys.platform == 'win32'
                          and (e.errno in win_errors
                               or e_winerror in win_errors)):
                        current_dir_info = None
                    else:
                        raise
                else:
                    if current_dir_info[0][0] == '':
                        # remove .bzr from iteration
                        bzr_index = bisect.bisect_left(current_dir_info[1], ('.bzr',))
                        if current_dir_info[1][bzr_index][0] != '.bzr':
                            raise AssertionError()
                        del current_dir_info[1][bzr_index]
            # walk until both the directory listing and the versioned metadata
            # are exhausted.
            if (block_index < len(self.state._dirblocks) and
                osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                current_block = self.state._dirblocks[block_index]
            else:
                current_block = None
            while (current_dir_info is not None or
                   current_block is not None):
                if (current_dir_info and current_block
                    and current_dir_info[0][0] != current_block[0]):
                    if _cmp_by_dirs(current_dir_info[0][0], current_block[0]) < 0:
                        # filesystem data refers to paths not covered by the dirblock.
                        # this has two possibilities:
                        # A) it is versioned but empty, so there is no block for it
                        # B) it is not versioned.

                        # if (A) then we need to recurse into it to check for
                        # new unknown files or directories.
                        # if (B) then we should ignore it, because we don't
                        # recurse into unknown directories.
                        path_index = 0
                        while path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if self.want_unversioned:
                                if current_path_info[2] == 'directory':
                                    if self.tree._directory_is_tree_reference(
                                        current_path_info[0].decode('utf8')):
                                        current_path_info = current_path_info[:2] + \
                                            ('tree-reference',) + current_path_info[3:]
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                yield (None,
                                    (None, utf8_decode(current_path_info[0])[0]),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',
                                                        'tree-reference'):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                            path_index += 1

                        # This dir info has been handled, go to the next
                        try:
                            current_dir_info = dir_iterator.next()
                        except StopIteration:
                            current_dir_info = None
                    else:
                        # We have a dirblock entry for this location, but there
                        # is no filesystem path for this. This is most likely
                        # because a directory was removed from the disk.
                        # We don't have to report the missing directory,
                        # because that should have already been handled, but we
                        # need to handle all of the files that are contained
                        # within.
                        for current_entry in current_block[1]:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result, changed = _process_entry(current_entry, None)
                            if changed is not None:
                                if changed:
                                    self._gather_result_for_consistency(result)
                                if changed or self.include_unchanged:
                                    yield result
                        block_index += 1
                        if (block_index < len(self.state._dirblocks) and
                            osutils.is_inside(current_root,
                                              self.state._dirblocks[block_index][0])):
                            current_block = self.state._dirblocks[block_index]
                        else:
                            current_block = None
                    continue
                entry_index = 0
                if current_block and entry_index < len(current_block[1]):
                    current_entry = current_block[1][entry_index]
                else:
                    current_entry = None
                advance_entry = True
                path_index = 0
                if current_dir_info and path_index < len(current_dir_info[1]):
                    current_path_info = current_dir_info[1][path_index]
                    if current_path_info[2] == 'directory':
                        if self.tree._directory_is_tree_reference(
                            current_path_info[0].decode('utf8')):
                            current_path_info = current_path_info[:2] + \
                                ('tree-reference',) + current_path_info[3:]
                else:
                    current_path_info = None
                advance_path = True
                path_handled = False
                while (current_entry is not None or
                       current_path_info is not None):
                    if current_entry is None:
                        # the check for path_handled when the path is advanced
                        # will yield this path if needed.
                        pass
                    elif current_path_info is None:
                        # no path is fine: the per entry code will handle it.
                        result, changed = _process_entry(current_entry, current_path_info)
                        if changed is not None:
                            if changed:
                                self._gather_result_for_consistency(result)
                            if changed or self.include_unchanged:
                                yield result
                    elif (current_entry[0][1] != current_path_info[1]
                          or current_entry[1][self.target_index][0] in 'ar'):
                        # The current path on disk doesn't match the dirblock
                        # record. Either the dirblock is marked as absent, or
                        # the file on disk is not present at all in the
                        # dirblock. Either way, report about the dirblock
                        # entry, and let other code handle the filesystem one.

                        # Compare the basename for these files to determine
                        # which comes first
                        if current_path_info[1] < current_entry[0][1]:
                            # extra file on disk: pass for now, but only
                            # increment the path, not the entry
                            advance_entry = False
                        else:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result, changed = _process_entry(current_entry, None)
                            if changed is not None:
                                if changed:
                                    self._gather_result_for_consistency(result)
                                if changed or self.include_unchanged:
                                    yield result
                            advance_path = False
                    else:
                        result, changed = _process_entry(current_entry, current_path_info)
                        if changed is not None:
                            path_handled = True
                            if changed:
                                self._gather_result_for_consistency(result)
                            if changed or self.include_unchanged:
                                yield result
                    if advance_entry and current_entry is not None:
                        entry_index += 1
                        if entry_index < len(current_block[1]):
                            current_entry = current_block[1][entry_index]
                        else:
                            current_entry = None
                    else:
                        advance_entry = True # reset the advance flag
                    if advance_path and current_path_info is not None:
                        if not path_handled:
                            # unversioned in all regards
                            if self.want_unversioned:
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                try:
                                    relpath_unicode = utf8_decode(current_path_info[0])[0]
                                except UnicodeDecodeError:
                                    raise errors.BadFilenameEncoding(
                                        current_path_info[0], osutils._fs_enc)
                                yield (None,
                                    (None, relpath_unicode),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                        # don't descend the disk iterator into any tree
                        # paths.
                        if current_path_info[2] == 'tree-reference':
                            del current_dir_info[1][path_index]
                            path_index -= 1
                        path_index += 1
                        if path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if current_path_info[2] == 'directory':
                                if self.tree._directory_is_tree_reference(
                                    current_path_info[0].decode('utf8')):
                                    current_path_info = current_path_info[:2] + \
                                        ('tree-reference',) + current_path_info[3:]
                        else:
                            current_path_info = None
                        path_handled = False
                    else:
                        advance_path = True # reset the advance flag.
                if current_block is not None:
                    block_index += 1
                    if (block_index < len(self.state._dirblocks) and
                        osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                        current_block = self.state._dirblocks[block_index]
                    else:
                        current_block = None
                if current_dir_info is not None:
                    try:
                        current_dir_info = dir_iterator.next()
                    except StopIteration:
                        current_dir_info = None
        for result in self._iter_specific_file_parents():
            yield result

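    # Consumption sketch: iter_changes() is a generator, so a caller holding
    # a constructed ProcessEntryPython (see __init__ above) simply iterates:
    #
    #   for (file_id, paths, content_change, versioned, parents,
    #        names, kinds, executables) in pe.iter_changes():
    #       ...
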
    def _iter_specific_file_parents(self):
        """Iter over the specific file parents."""
        while self.search_specific_file_parents:
            # Process the parent directories for the paths we were iterating.
            # Even in extremely large trees this should be modest, so currently
            # no attempt is made to optimise.
            path_utf8 = self.search_specific_file_parents.pop()
            if osutils.is_inside_any(self.searched_specific_files, path_utf8):
                # We've examined this path.
                continue
            if path_utf8 in self.searched_exact_paths:
                # We've examined this path.
                continue
            path_entries = self.state._entries_for_path(path_utf8)
            # We need either one or two entries. If the path in
            # self.target_index has moved (so the entry in source_index is in
            # 'ar') then we need to also look for the entry for this path in
            # self.source_index, to output the appropriate delete-or-rename.
            selected_entries = []
            found_item = False
            for candidate_entry in path_entries:
                # Find entries present in target at this path:
                if candidate_entry[1][self.target_index][0] not in 'ar':
                    found_item = True
                    selected_entries.append(candidate_entry)
                # Find entries present in source at this path:
                elif (self.source_index is not None and
                      candidate_entry[1][self.source_index][0] not in 'ar'):
                    found_item = True
                    if candidate_entry[1][self.target_index][0] == 'a':
                        # Deleted, emit it here.
                        selected_entries.append(candidate_entry)
                    else:
                        # renamed, emit it when we process the directory it
                        # ended up at.
                        self.search_specific_file_parents.add(
                            candidate_entry[1][self.target_index][1])
            if not found_item:
                raise AssertionError(
                    "Missing entry for specific path parent %r, %r" % (
                        path_utf8, path_entries))
            path_info = self._path_info(path_utf8, path_utf8.decode('utf8'))
            for entry in selected_entries:
                if entry[0][2] in self.seen_ids:
                    continue
                result, changed = self._process_entry(entry, path_info)
                if changed is None:
                    raise AssertionError(
                        "Got entry<->path mismatch for specific path "
                        "%r entry %r path_info %r " % (
                            path_utf8, entry, path_info))
                # Only include changes - we're outside the users requested
                # expansion.
                if changed:
                    self._gather_result_for_consistency(result)
                    if (result[6][0] == 'directory' and
                        result[6][1] != 'directory'):
                        # This stopped being a directory, the old children have
                        # to be included.
                        if entry[1][self.source_index][0] == 'r':
                            # renamed, take the source path
                            entry_path_utf8 = entry[1][self.source_index][1]
                        else:
                            entry_path_utf8 = path_utf8
                        initial_key = (entry_path_utf8, '', '')
                        block_index, _ = self.state._find_block_index_from_key(
                            initial_key)
                        if block_index == 0:
                            # The children of the root are in block index 1.
                            block_index += 1
                        current_block = None
                        if block_index < len(self.state._dirblocks):
                            current_block = self.state._dirblocks[block_index]
                            if not osutils.is_inside(
                                entry_path_utf8, current_block[0]):
                                # No entries for this directory at all.
                                current_block = None
                        if current_block is not None:
                            for entry in current_block[1]:
                                if entry[1][self.source_index][0] in 'ar':
                                    # Not in the source tree, so doesn't have to be
                                    # included.
                                    continue
                                # Path of the entry itself.
                                self.search_specific_file_parents.add(
                                    osutils.pathjoin(*entry[0][:2]))
                if changed or self.include_unchanged:
                    yield result
            self.searched_exact_paths.add(path_utf8)

    def _path_info(self, utf8_path, unicode_path):
        """Generate path_info for unicode_path.

        :return: None if unicode_path does not exist, or a path_info tuple.
        """
        abspath = self.tree.abspath(unicode_path)
        try:
            stat = os.lstat(abspath)
        except OSError, e:
            if e.errno == errno.ENOENT:
                # the path does not exist.
                return None
            else:
                raise
        utf8_basename = utf8_path.rsplit('/', 1)[-1]
        dir_info = (utf8_path, utf8_basename,
            osutils.file_kind_from_stat_mode(stat.st_mode), stat,
            abspath)
        if dir_info[2] == 'directory':
            if self.tree._directory_is_tree_reference(
                unicode_path):
                dir_info = dir_info[:2] + \
                    ('tree-reference',) + dir_info[3:]
        return dir_info


# Try to load the compiled form if possible
try:
    from bzrlib._dirstate_helpers_pyx import (
        _read_dirblocks,
        bisect_dirblock,
        _bisect_path_left,
        _bisect_path_right,
        cmp_by_dirs,
        ProcessEntryC as _process_entry,
        update_entry as update_entry,
        )
except ImportError, e:
    osutils.failed_to_load_extension(e)
    from bzrlib._dirstate_helpers_py import (
        _read_dirblocks,
        bisect_dirblock,
        _bisect_path_left,
        _bisect_path_right,
        cmp_by_dirs,
        )
    # FIXME: It would be nice to be able to track moved lines so that the
    # corresponding python code can be moved to the _dirstate_helpers_py
    # module. I don't want to break the history for this important piece of
    # code so I left the code here -- vila 20090622
    update_entry = py_update_entry
    _process_entry = ProcessEntryPython
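
# Sketch of the effect of this fallback: after import, callers use the
# module-level names without caring which implementation was bound, e.g.
#
#   from bzrlib import dirstate
#   dirstate.update_entry     # compiled pyx version if it loaded,
#   dirstate._process_entry   # otherwise py_update_entry / ProcessEntryPython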