# Copyright (C) 2006-2011 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

"""DirState objects record the state of a directory and its bzr metadata.

Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.

::

    MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
    NL = "\n";
    NULL = "\0";
    WHOLE_NUMBER = {digit}, digit;
    BOOLEAN = "y" | "n";
    REVISION_ID = a non-empty utf8 string;

    dirstate format = header line, full checksum, row count, parent details,
     ghost_details, entries;
    header line = "#bazaar dirstate flat format 3", NL;
    full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
    row count = "num_entries: ", WHOLE_NUMBER, NL;
    parent_details = WHOLE NUMBER, {REVISION_ID}* NL;
    ghost_details = WHOLE NUMBER, {REVISION_ID}*, NL;
    entries = {entry};
    entry = entry_key, current_entry_details, {parent_entry_details};
    entry_key = dirname, basename, fileid;
    current_entry_details = common_entry_details, working_entry_details;
    parent_entry_details = common_entry_details, history_entry_details;
    common_entry_details = MINIKIND, fingerprint, size, executable
    working_entry_details = packed_stat
    history_entry_details = REVISION_ID;
    executable = BOOLEAN;
    size = WHOLE_NUMBER;
    fingerprint = a nonempty utf8 sequence with meaning defined by minikind.

Given this definition, the following is useful to know::

    entry (aka row) - all the data for a given key.
    entry[0]: The key (dirname, basename, fileid)
    entry[0][0]: dirname
    entry[0][1]: basename
    entry[0][2]: fileid
    entry[1]: The tree(s) data for this path and id combination.
    entry[1][0]: The current tree
    entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate::

    entry[1][0][0]: minikind
    entry[1][0][1]: fingerprint
    entry[1][0][2]: size
    entry[1][0][3]: executable
    entry[1][0][4]: packed_stat

OR (for non tree-0)::

    entry[1][1][4]: revision_id
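
As an illustrative sketch (the ids, hash and stat data below are invented,
not taken from a real tree), a clean file present in the working tree and in
one parent tree could therefore be held as::

    (('', 'README', 'README-id-1'),                     # entry[0]: the key
     [('f', 'sha1-of-text', 120, False, 'xxxx....'),    # tree 0 details
      ('f', 'sha1-of-text', 120, False, 'parent-rev'),  # tree 1: field 4 is
     ])                                                 # the revision_id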

There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now::

    self._dirblocks[0] -> ('', [entry ...]),

and the entries in there are::

    entries[0][0]: ''
    entries[0][1]: ''
    entries[0][2]: file_id
    entries[1][0]: The tree data for the current tree for this fileid at /
    etc.

Kinds::

    'r' is a relocated entry: This path is not present in this tree with this
        id, but the id can be found at another location. The fingerprint is
        used to point to the target location.
    'a' is an absent entry: In that tree the id is not present at this path.
    'd' is a directory entry: This path in this tree is a directory with the
        current file id. There is no fingerprint for directories.
    'f' is a file entry: As for directory, but it's a file. The fingerprint is
        the sha1 value of the file's canonical form, i.e. after any read
        filters have been applied to the convenience form stored in the working
        tree.
    'l' is a symlink entry: As for directory, but a symlink. The fingerprint is
        the link target.
    't' is a reference to a nested subtree; the fingerprint is the referenced
        revision.

The entries on disk and in memory are ordered according to the following keys::

    directory, as a list of components
    filename
    file-id
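
For instance (a sketch of the comparison only), splitting the directory into
components is what makes 'a/b' order before 'a-b', which a plain string sort
would reverse::

    sorted(['a-b', 'a/b', 'a'], key=lambda p: p.split('/'))
    # => ['a', 'a/b', 'a-b'], whereas sorted() alone gives ['a', 'a-b', 'a/b']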

--- Format 1 had the following different definition: ---

::

    rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
        WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
        {PARENT ROW}
    PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
        basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
        SHA1

PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.

In any tree, a kind of 'moved' indicates that the fingerprint field
(which we treat as opaque data specific to the 'kind' anyway) has the
details for the id of this row in that tree.

I'm strongly tempted to add an id->path index as well, but I think that
where we need id->path mapping we also usually read the whole file, so
I'm going to skip that for the moment, as we have the ability to locate
via bisect any path in any tree, and if we lookup things by path, we can
accumulate an id->path mapping as we go, which will tend to match what we
looked for.

I plan to implement this asap, so please speak up now to alter/tweak the
design - and once we stabilise on this, I'll update the wiki page for it.

The rationale for all this is that we want fast operations for the
common case (diff/status/commit/merge on all files) and extremely fast
operations for the less common but still frequent case (status/diff/commit
on specific files). Operations on specific files involve a scan for all
the children of a path, *in every involved tree*, which the current
format did not accommodate.

Design priorities:

 1. Fast end to end use for bzr's top 5 use cases. (commit/diff/status/merge/???)
 2. Fall back to the current object model as needed.
 3. Scale usably to the largest trees known today - say 50K entries. (mozilla
    is an example of this)

Locking:

 Eventually reuse dirstate objects across locks IFF the dirstate file has not
 been modified, but will require that we flush/ignore cached stat-hit data
 because we won't want to restat all files on disk just because a lock was
 acquired, yet we cannot trust the data after the previous lock was released.

Memory representation::

 vector of all directories, and vector of the children ?
   root_entries = (direntry for root, [parent_direntries_for_root]),
   dirblocks = [
   ('', ['data for achild', 'data for bchild', 'data for cchild'])
   ('dir', ['achild', 'cchild', 'echild'])
   ]
 - single bisect to find N subtrees from a path spec
 - in-order for serialisation - this is 'dirblock' grouping.
 - insertion of a file '/a' affects only the '/' child-vector, that is, to
   insert 10K elements from scratch does not generate O(N^2) memmoves of a
   single vector; rather each individual child-vector is touched, and those
   tend to stay a manageable size. Will scale badly on trees with 10K entries
   in a single directory. Compare with Inventory.InventoryDirectory which has
   a dictionary for the children. No bisect capability, can only probe for
   exact matches, or grab all elements and sort.
 - What's the risk of error here? Once we have the base format being processed
   we should have a net win regardless of optimality. So we are going to
   go with what seems reasonable.

Open questions:

Maybe we should do a test profile of the core structure - 10K simulated
searches/lookups/etc?

Objects for each row?
The lifetime of Dirstate objects is current per lock, but see above for
possible extensions. The lifetime of a row from a dirstate is expected to be
very short in the optimistic case: which we are optimising for. For instance,
subtree status will determine from analysis of the disk data what rows need to
be examined at all, and will be able to determine from a single row whether
that file has altered or not, so we are aiming to process tens of thousands of
entries each second within the dirstate context, before exposing anything to
the larger codebase. This suggests we want the time for a single file
comparison to be < 0.1 milliseconds. That would give us 10000 paths per second
processed, and to scale to 100 thousand we'll need another order of magnitude
to do that. Now, as the lifetime for all unchanged entries is the time to
parse, stat the file on disk, and then immediately discard, the overhead of
object creation becomes a significant cost.

Figures: Creating a tuple from 3 elements was profiled at 0.0625
microseconds, whereas creating an object which is subclassed from tuple was
0.500 microseconds, and creating an object with 3 elements and slots was 3
microseconds long. 0.1 milliseconds is 100 microseconds, and ideally we'll get
down to 10 microseconds for the total processing - having 33% of that be object
creation is a huge overhead. There is a potential cost in using tuples within
each row which is that the conditional code to do comparisons may be slower
than method invocation, but method invocation is known to be slow due to stack
frame creation, so avoiding methods in these tight inner loops is unfortunately
desirable. We can consider a pyrex version of this with objects in future if
the need arises.
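
A sketch of how such figures can be measured (the numbers vary by machine
and interpreter; this block is illustrative and not part of the design)::

    import timeit
    timeit.timeit("t = (1, 2, 3)")                      # plain tuple
    timeit.timeit("t = T((1, 2, 3))",
                  setup="class T(tuple): pass")         # tuple subclass
    slots_setup = (
        "class S(object):\n"
        "    __slots__ = ('a', 'b', 'c')\n"
        "    def __init__(self, a, b, c):\n"
        "        self.a, self.b, self.c = a, b, c\n")
    timeit.timeit("o = S(1, 2, 3)", setup=slots_setup)  # slotted object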
"""

import bisect
import binascii
import errno
import operator
import os
from stat import S_IEXEC
import stat
import struct
import sys
import time
import zlib

from bzrlib import (
    cache_utf8,
    debug,
    errors,
    inventory,
    lock,
    osutils,
    static_tuple,
    trace,
    )


# This is the Windows equivalent of ENOTDIR
# It is defined in pywin32.winerror, but we don't want a strong dependency for
# just an error code.
ERROR_PATH_NOT_FOUND = 3
ERROR_DIRECTORY = 267


if not getattr(struct, '_compile', None):
    # Cannot pre-compile the dirstate pack_stat
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=struct.pack):
        """Convert stat values into a packed representation."""
        return _encode(_pack('>LLLLLL', st.st_size, int(st.st_mtime),
            int(st.st_ctime), st.st_dev, st.st_ino & 0xFFFFFFFF,
            st.st_mode))[:-1]
else:
    # compile the struct compiler we need, so as to only do it once
    from _struct import Struct
    _compiled_pack = Struct('>LLLLLL').pack
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=_compiled_pack):
        """Convert stat values into a packed representation."""
        # jam 20060614 it isn't really worth removing more entries if we
        # are going to leave it in packed form.
        # With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
        # With all entries, filesize is 5.9M and read time is maybe 280ms
        # well within the noise margin

        # base64 encoding always adds a final newline, so strip it off
        # The current version
        return _encode(_pack(st.st_size, int(st.st_mtime), int(st.st_ctime),
            st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
        # This is 0.060s / 1.520s faster by not encoding as much information
        # return _encode(_pack('>LL', int(st.st_mtime), st.st_mode))[:-1]
        # This is not strictly faster than _encode(_pack())[:-1]
        # return '%X.%X.%X.%X.%X.%X' % (
        #      st.st_size, int(st.st_mtime), int(st.st_ctime),
        #      st.st_dev, st.st_ino, st.st_mode)
        # Similar to the _encode(_pack('>LL'))
        # return '%X.%X' % (int(st.st_mtime), st.st_mode)


def _unpack_stat(packed_stat):
    """Turn a packed_stat back into the stat fields.

    This is meant as a debugging tool, should not be used in real code.
    """
    (st_size, st_mtime, st_ctime, st_dev, st_ino,
     st_mode) = struct.unpack('>LLLLLL', binascii.a2b_base64(packed_stat))
    return dict(st_size=st_size, st_mtime=st_mtime, st_ctime=st_ctime,
                st_dev=st_dev, st_ino=st_ino, st_mode=st_mode)
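
# Example round-trip (an illustrative sketch; the packed value depends on the
# local filesystem, so the literals below are made up):
#
#   st = os.lstat('README')
#   packed = pack_stat(st)   # base64 text such as 'AAAAeEwZ...', newline stripped
#   _unpack_stat(packed)     # -> {'st_size': 120, 'st_mtime': ..., ...}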


class SHA1Provider(object):
    """An interface for getting sha1s of a file."""

    def sha1(self, abspath):
        """Return the sha1 of a file given its absolute path.

        :param abspath: May be a filesystem encoded absolute path
            or a unicode path.
        """
        raise NotImplementedError(self.sha1)

    def stat_and_sha1(self, abspath):
        """Return the stat and sha1 of a file given its absolute path.

        :param abspath: May be a filesystem encoded absolute path
            or a unicode path.

        Note: the stat should be the stat of the physical file
        while the sha may be the sha of its canonical content.
        """
        raise NotImplementedError(self.stat_and_sha1)


class DefaultSHA1Provider(SHA1Provider):
    """A SHA1Provider that reads directly from the filesystem."""

    def sha1(self, abspath):
        """Return the sha1 of a file given its absolute path."""
        return osutils.sha_file_by_name(abspath)

    def stat_and_sha1(self, abspath):
        """Return the stat and sha1 of a file given its absolute path."""
        file_obj = file(abspath, 'rb')
        try:
            statvalue = os.fstat(file_obj.fileno())
            sha1 = osutils.sha_file(file_obj)
        finally:
            file_obj.close()
        return statvalue, sha1


class DirState(object):
    """Record directory and metadata state for fast access.

    A dirstate is a specialised data structure for managing local working
    tree state information. It's not yet well defined whether it is platform
    specific, and if it is how we detect/parameterize that.

    Dirstates use the usual lock_write, lock_read and unlock mechanisms.
    Unlike most bzr disk formats, DirStates must be locked for reading, using
    lock_read. (This is an os file lock internally.) This is necessary
    because the file can be rewritten in place.

    DirStates must be explicitly written with save() to commit changes; just
    unlocking them does not write the changes to disk.
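
    A typical lifecycle, sketched (error handling elided; initialize() leaves
    the new state write-locked)::

        state = DirState.initialize('dirstate')
        try:
            # ... add entries, set parent trees, etc., then persist:
            state.save()
        finally:
            state.unlock()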
    """

    _kind_to_minikind = {
            'absent': 'a',
            'file': 'f',
            'directory': 'd',
            'relocated': 'r',
            'symlink': 'l',
            'tree-reference': 't',
        }
    _minikind_to_kind = {
            'a': 'absent',
            'f': 'file',
            'd': 'directory',
            'l': 'symlink',
            'r': 'relocated',
            't': 'tree-reference',
        }
    _stat_to_minikind = {
        stat.S_IFDIR: 'd',
        stat.S_IFREG: 'f',
        stat.S_IFLNK: 'l',
    }
    _to_yesno = {True: 'y', False: 'n'} # TODO profile the performance gain
        # of using int conversion rather than a dict here. AND BLAME ANDREW IF
        # THIS IS SLOWER THAN IT SHOULD BE

    # TODO: jam 20070221 Figure out what to do if we have a record that exceeds
    #       the BISECT_PAGE_SIZE. For now, we just have to make it large enough
    #       that we are sure a single record will always fit.
    BISECT_PAGE_SIZE = 4096

    NOT_IN_MEMORY = 0
    IN_MEMORY_UNMODIFIED = 1
    IN_MEMORY_MODIFIED = 2
    IN_MEMORY_HASH_MODIFIED = 3 # Only hash-cache updates

    # A pack_stat (the x's) that is just noise and will never match the output
    # of base64 encode.
    NULLSTAT = 'x' * 32
    NULL_PARENT_DETAILS = static_tuple.StaticTuple('a', '', 0, False, '')

    HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
    HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'

    def __init__(self, path, sha1_provider, worth_saving_limit=0):
        """Create a DirState object.

        :param path: The path at which the dirstate file on disk should live.
        :param sha1_provider: an object meeting the SHA1Provider interface.
        :param worth_saving_limit: when the exact number of hash changed
            entries is known, only bother saving the dirstate if more than
            this count of entries have changed.
            -1 means never save hash changes, 0 means always save hash changes.
        """
        # _header_state and _dirblock_state represent the current state
        # of the dirstate metadata and the per-row data respectively.
        # NOT_IN_MEMORY indicates that no data is in memory
        # IN_MEMORY_UNMODIFIED indicates that what we have in memory
        #   is the same as is on disk
        # IN_MEMORY_MODIFIED indicates that we have a modified version
        #   of what is on disk.
        # In future we will add more granularity, for instance _dirblock_state
        # will probably support partially-in-memory as a separate variable,
        # allowing for partially-in-memory unmodified and partially-in-memory
        # modified states.
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        # If true, an error has been detected while updating the dirstate, and
        # for safety we're not going to commit to disk.
        self._changes_aborted = False
        self._dirblocks = []
        self._ghosts = []
        self._parents = []
        self._state_file = None
        self._filename = path
        self._lock_token = None
        self._lock_state = None
        self._id_index = None
        # a map from packed_stat to sha's.
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
        self._bisect_page_size = DirState.BISECT_PAGE_SIZE
        self._sha1_provider = sha1_provider
        if 'hashcache' in debug.debug_flags:
            self._sha1_file = self._sha1_file_and_mutter
        else:
            self._sha1_file = self._sha1_provider.sha1
        # These two attributes provide a simple cache for lookups into the
        # dirstate in-memory vectors. By probing respectively for the last
        # block, and for the next entry, we save nearly 2 bisections per path
        # during commit.
        self._last_block_index = None
        self._last_entry_index = None
        # The set of known hash changes
        self._known_hash_changes = set()
        # How many hash changed entries can we have without saving
        self._worth_saving_limit = worth_saving_limit

    def __repr__(self):
        return "%s(%r)" % \
            (self.__class__.__name__, self._filename)

    def _mark_modified(self, hash_changed_entries=None, header_modified=False):
        """Mark this dirstate as modified.

        :param hash_changed_entries: if non-None, mark just these entries as
            having their hash modified.
        :param header_modified: mark the header modified as well, not just the
            dirblocks.
        """
        #trace.mutter_callsite(3, "modified hash entries: %s", hash_changed_entries)
        if hash_changed_entries:
            self._known_hash_changes.update([e[0] for e in hash_changed_entries])
            if self._dirblock_state in (DirState.NOT_IN_MEMORY,
                                        DirState.IN_MEMORY_UNMODIFIED):
                # If the dirstate is already marked as IN_MEMORY_MODIFIED, then
                # that takes precedence.
                self._dirblock_state = DirState.IN_MEMORY_HASH_MODIFIED
        else:
            # TODO: Since we now have a IN_MEMORY_HASH_MODIFIED state, we
            #       should fail noisily if someone tries to set
            #       IN_MEMORY_MODIFIED but we don't have a write-lock!
            # We don't know exactly what changed so disable smart saving
            self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if header_modified:
            self._header_state = DirState.IN_MEMORY_MODIFIED

    def _mark_unmodified(self):
        """Mark this dirstate as unmodified."""
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
        self._known_hash_changes = set()

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.

        :param path: The path within the dirstate - '' is the root, 'foo' is the
            path foo within the root, 'foo/bar' is the path bar within foo
            within the root.
        :param file_id: The file id of the path being added.
        :param kind: The kind of the path, as a string like 'file',
            'directory', etc.
        :param stat: The output of os.lstat for the path.
        :param fingerprint: The sha value of the file's canonical form (i.e.
            after any read filters have been applied),
            or the target of a symlink,
            or the referenced revision id for tree-references,
            or '' for directories.
        """
        # find the block it's in.
        # find the location in the block.
        # check it's not there
        # add it.
        #------- copied from inventory.ensure_normalized_name - keep synced.
        # --- normalized_filename wants a unicode basename only, so get one.
        dirname, basename = osutils.split(path)
        # we dont import normalized_filename directly because we want to be
        # able to change the implementation at runtime for tests.
        norm_name, can_access = osutils.normalized_filename(basename)
        if norm_name != basename:
            if can_access:
                basename = norm_name
            else:
                raise errors.InvalidNormalization(path)
        # you should never have files called . or ..; just add the directory
        # in the parent, or according to the special treatment for the root
        if basename == '.' or basename == '..':
            raise errors.InvalidEntryName(path)
        # now that we've normalised, we need the correct utf8 path and
        # dirname and basename elements. This single encode and split should be
        # faster than three separate encodes.
        utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
        dirname, basename = osutils.split(utf8path)
        # uses __class__ for speed; the check is needed for safety
        if file_id.__class__ is not str:
            raise AssertionError(
                "must be a utf8 file_id not %s" % (type(file_id), ))
        # Make sure the file_id does not exist in this tree
        rename_from = None
        file_id_entry = self._get_entry(0, fileid_utf8=file_id, include_deleted=True)
        if file_id_entry != (None, None):
            if file_id_entry[1][0][0] == 'a':
                if file_id_entry[0] != (dirname, basename, file_id):
                    # set the old name's current operation to rename
                    self.update_minimal(file_id_entry[0],
                        'r',
                        path_utf8='',
                        packed_stat='',
                        fingerprint=utf8path
                    )
                    rename_from = file_id_entry[0][0:2]
            else:
                path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
                kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
                info = '%s:%s' % (kind, path)
                raise errors.DuplicateFileId(file_id, info)
        first_key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(first_key)
        if present:
            # check the path is not in the tree
            block = self._dirblocks[block_index][1]
            entry_index, _ = self._find_entry_index(first_key, block)
            while (entry_index < len(block) and
                block[entry_index][0][0:2] == first_key[0:2]):
                if block[entry_index][1][0][0] not in 'ar':
                    # this path is in the dirstate in the current tree.
                    raise Exception("adding already added path!")
                entry_index += 1
        else:
            # The block where we want to put the file is not present. But it
            # might be because the directory was empty, or not loaded yet. Look
            # for a parent entry, if not found, raise NotVersionedError
            parent_dir, parent_base = osutils.split(dirname)
            parent_block_idx, parent_entry_idx, _, parent_present = \
                self._get_block_entry_index(parent_dir, parent_base, 0)
            if not parent_present:
                raise errors.NotVersionedError(path, str(self))
            self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
        block = self._dirblocks[block_index][1]
        entry_key = (dirname, basename, file_id)
        if stat is None:
            size = 0
            packed_stat = DirState.NULLSTAT
        else:
            size = stat.st_size
            packed_stat = pack_stat(stat)
        parent_info = self._empty_parent_info()
        minikind = DirState._kind_to_minikind[kind]
        if rename_from is not None:
            if rename_from[0]:
                old_path_utf8 = '%s/%s' % rename_from
            else:
                old_path_utf8 = rename_from[1]
            parent_info[0] = ('r', old_path_utf8, 0, False, '')
        if kind == 'file':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'directory':
            entry_data = entry_key, [
                (minikind, '', 0, False, packed_stat),
                ] + parent_info
        elif kind == 'symlink':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'tree-reference':
            entry_data = entry_key, [
                (minikind, fingerprint, 0, False, packed_stat),
                ] + parent_info
        else:
            raise errors.BzrError('unknown kind %r' % kind)
        entry_index, present = self._find_entry_index(entry_key, block)
        if not present:
            block.insert(entry_index, entry_data)
        else:
            if block[entry_index][1][0][0] != 'a':
                raise AssertionError(" %r(%r) already added" % (basename, file_id))
            block[entry_index][1][0] = entry_data[1][0]

        if kind == 'directory':
            # insert a new dirblock
            self._ensure_block(block_index, entry_index, utf8path)
        self._mark_modified()
        if self._id_index:
            self._add_to_id_index(self._id_index, entry_key)
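
    # Illustrative add() call (a sketch: assumes a write-locked DirState
    # 'state' whose tree already has 'foo's parent directory versioned):
    #   st = os.lstat('foo')
    #   state.add('foo', 'foo-file-id', 'file', st,
    #             osutils.sha_file_by_name('foo'))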

    def _bisect(self, paths):
        """Bisect through the disk structure for specific rows.

        :param paths: A list of paths to find
        :return: A dict mapping path => entries for found entries. Missing
            entries will not be in the map.
            The list is not sorted, and entries will be populated
            based on when they were read.
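
        For example (an illustrative sketch with an invented file id)::

            state._bisect(['a/b', 'no-such-path'])
            => {'a/b': [(('a', 'b', 'b-id'), [tree details...])]}

        'no-such-path' is simply absent from the returned dict.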
        """
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # from disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from (dir, name) => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(paths)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dir_names -> The list of (dir, name) pairs that should be found in
        #                the [low, high] range
        pending = [(low, high, paths)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_files = pending.pop()

            if not cur_files or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 2:
                # We didn't even get a filename here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_files))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                if first_fields[1]:
                    first_path = first_fields[1] + '/' + first_fields[2]
                else:
                    first_path = first_fields[2]
                first_loc = _bisect_path_left(cur_files, first_path)

                # These exist before the current location
                pre = cur_files[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_files[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have files after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                if last_fields[1]:
                    last_path = last_fields[1] + '/' + last_fields[2]
                else:
                    last_path = last_fields[2]
                last_loc = _bisect_path_right(post, last_path)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_path:
                        # We might need to go before this location
                        pre.append(first_path)
                    if middle_files[-1] == last_path:
                        post.insert(0, last_path)

                    # Find out what paths we have
                    paths = {first_path:[first_fields]}
                    # last_path might == first_path so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_path, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        if fields[1]:
                            path = fields[1] + '/' + fields[2]
                        else:
                            path = fields[2]
                        paths.setdefault(path, []).append(fields)

                    for path in middle_files:
                        for fields in paths.get(path, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(path, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        # Consider that we may want to return the directory entries in sorted
        # order. For now, we just return them in whatever order we found them,
        # and leave it up to the caller if they care if it is ordered or not.
        return found

    def _bisect_dirblocks(self, dir_list):
        """Bisect through the disk structure to find entries in given dirs.

        _bisect_dirblocks is meant to find the contents of directories, which
        differs from _bisect, which only finds individual entries.

        :param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
        :return: A map from dir => entries_for_dir
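
        For example (an illustrative sketch)::

            state._bisect_dirblocks(['', 'dir'])
            => {'': [entries directly at the root...],
                'dir': [entries whose dirname is 'dir'...]}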
        """
        # TODO: jam 20070223 A lot of the bisecting logic could be shared
        #       between this and _bisect. It would require parameterizing the
        #       inner loop with a function, though. We should evaluate the
        #       performance difference.
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # from disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)
        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from dir => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dirs -> The list of directories that should be found in
        #           the [low, high] range
        pending = [(low, high, dir_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_dirs = pending.pop()

            if not cur_dirs or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 1:
                # We didn't even get a dirname here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                first_dir = first_fields[1]
                first_loc = bisect.bisect_left(cur_dirs, first_dir)

                # These exist before the current location
                pre = cur_dirs[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_dirs[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have records to look at after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir = last_fields[1]
                last_loc = bisect.bisect_right(post, last_dir)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir:
                        # We might need to go before this location
                        pre.append(first_dir)
                    if middle_files[-1] == last_dir:
                        post.insert(0, last_dir)

                    # Find out what paths we have
                    paths = {first_dir:[first_fields]}
                    # last_dir might == first_dir so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        paths.setdefault(fields[1], []).append(fields)

                    for cur_dir in middle_files:
                        for fields in paths.get(cur_dir, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(cur_dir, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        return found

    def _bisect_recursive(self, paths):
        """Bisect for entries for all paths and their children.

        This will use bisect to find all records for the supplied paths. It
        will then continue to bisect for any records which are marked as
        directories. (and renames?)

        :param paths: A sorted list of (dir, name) pairs
            eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
        :return: A dictionary mapping (dir, name, file_id) => [tree_info]
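
        For example (an illustrative sketch with invented ids), bisecting for
        a directory also pulls in the records of its children::

            state._bisect_recursive([('', 'dir')])
            => {('', 'dir', 'dir-id'): [tree_info...],
                ('dir', 'child', 'child-id'): [tree_info...]}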
        """
        # Map from (dir, name, file_id) => [tree_info]
        found = {}

        found_dir_names = set()

        # Directories that have been read
        processed_dirs = set()
        # Get the ball rolling with the first bisect for all entries.
        newly_found = self._bisect(paths)

        while newly_found:
            # Directories that need to be read
            pending_dirs = set()
            paths_to_search = set()
            for entry_list in newly_found.itervalues():
                for dir_name_id, trees_info in entry_list:
                    found[dir_name_id] = trees_info
                    found_dir_names.add(dir_name_id[:2])
                    is_dir = False
                    for tree_info in trees_info:
                        minikind = tree_info[0]
                        if minikind == 'd':
                            if is_dir:
                                # We already processed this one as a directory,
                                # we don't need to do the extra work again.
                                continue
                            subdir, name, file_id = dir_name_id
                            path = osutils.pathjoin(subdir, name)
                            is_dir = True
                            if path not in processed_dirs:
                                pending_dirs.add(path)
                        elif minikind == 'r':
                            # Rename, we need to directly search the target
                            # which is contained in the fingerprint column
                            dir_name = osutils.split(tree_info[1])
                            if dir_name[0] in pending_dirs:
                                # This entry will be found in the dir search
                                continue
                            if dir_name not in found_dir_names:
                                paths_to_search.add(tree_info[1])
            # Now we have a list of paths to look for directly, and
            # directory blocks that need to be read.
            # newly_found is mixing the keys between (dir, name) and path
            # entries, but that is okay, because we only really care about the
            # file_id anyway.
            newly_found = self._bisect(sorted(paths_to_search))
            newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
            processed_dirs.update(pending_dirs)
        return found

    def _discard_merge_parents(self):
        """Discard any parents trees beyond the first.

        Note that if this fails the dirstate is corrupted.

        After this function returns the dirstate contains 2 trees, neither of
        which are ghosted.
        """
        self._read_header_if_needed()
        parents = self.get_parent_ids()
        if len(parents) < 1:
            return
        # only require all dirblocks if we are doing a full-pass removal.
        self._read_dirblocks_if_needed()
        dead_patterns = set([('a', 'r'), ('a', 'a'), ('r', 'r'), ('r', 'a')])
        def iter_entries_removable():
            for block in self._dirblocks:
                deleted_positions = []
                for pos, entry in enumerate(block[1]):
                    yield entry
                    if (entry[1][0][0], entry[1][1][0]) in dead_patterns:
                        deleted_positions.append(pos)
                if deleted_positions:
                    if len(deleted_positions) == len(block[1]):
                        del block[1][:]
                    else:
                        for pos in reversed(deleted_positions):
                            del block[1][pos]
        # if the first parent is a ghost:
        if parents[0] in self.get_ghosts():
            empty_parent = [DirState.NULL_PARENT_DETAILS]
            for entry in iter_entries_removable():
                entry[1][1:] = empty_parent
        else:
            for entry in iter_entries_removable():
                del entry[1][2:]

        self._ghosts = []
        self._parents = [parents[0]]
        self._mark_modified(header_modified=True)

    def _empty_parent_info(self):
        return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
                                                 len(self._ghosts))

    def _ensure_block(self, parent_block_index, parent_row_index, dirname):
        """Ensure a block for dirname exists.

        This function exists to let callers which know that there is a
        directory dirname ensure that the block for it exists. This block can
        fail to exist because of demand loading, or because a directory had no
        children. In either case it is not an error. It is however an error to
        call this if there is no parent entry for the directory, and thus the
        function requires the coordinates of such an entry to be provided.

        The root row is special cased and can be indicated with a parent block
        and row of -1.

        :param parent_block_index: The index of the block in which dirname's row
            exists.
        :param parent_row_index: The index in the parent block where the row
            exists.
        :param dirname: The utf8 dirname to ensure there is a block for.
        :return: The index for the block.
        """
        if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
            # This is the signature of the root row, and the
            # contents-of-root row is always index 1
            return 1
        # the basename of the directory must be the end of its full name.
        if not (parent_block_index == -1 and
            parent_row_index == -1 and dirname == ''):
            if not dirname.endswith(
                    self._dirblocks[parent_block_index][1][parent_row_index][0][1]):
                raise AssertionError("bad dirname %r" % dirname)
        block_index, present = self._find_block_index_from_key((dirname, '', ''))
        if not present:
            ## In future, when doing partial parsing, this should load and
            # populate the entire block.
            self._dirblocks.insert(block_index, (dirname, []))
        return block_index

    def _entries_to_current_state(self, new_entries):
        """Load new_entries into self.dirblocks.

        Process new_entries into the current state object, making them the active
        state.  The entries are grouped together by directory to form dirblocks.

        :param new_entries: A sorted list of entries. This function does not sort
            to prevent unneeded overhead when callers have a sorted list already.
        :return: Nothing.
        """
        if new_entries[0][0][0:2] != ('', ''):
            raise AssertionError(
                "Missing root row %r" % (new_entries[0][0],))
        # The two blocks here are deliberate: the root block and the
        # contents-of-root block.
        self._dirblocks = [('', []), ('', [])]
        current_block = self._dirblocks[0][1]
        current_dirname = ''

        append_entry = current_block.append
        for entry in new_entries:
            if entry[0][0] != current_dirname:
                # new block - different dirname
                current_block = []
                current_dirname = entry[0][0]
                self._dirblocks.append((current_dirname, current_block))
                append_entry = current_block.append
            # append the entry to the current block
            append_entry(entry)
        self._split_root_dirblock_into_contents()

    def _split_root_dirblock_into_contents(self):
        """Split the root dirblocks into root and contents-of-root.

        After parsing by path, we end up with root entries and contents-of-root
        entries in the same block. This loop splits them out again.
        """
        # The above loop leaves the "root block" entries mixed with the
        # "contents-of-root block". But we don't want an if check on
        # all entries, so instead we just fix it up here.
        if self._dirblocks[1] != ('', []):
            raise ValueError("bad dirblock start %r" % (self._dirblocks[1],))
        root_block = []
        contents_of_root_block = []
        for entry in self._dirblocks[0][1]:
            if not entry[0][1]: # This is a root entry
                root_block.append(entry)
            else:
                contents_of_root_block.append(entry)
        self._dirblocks[0] = ('', root_block)
        self._dirblocks[1] = ('', contents_of_root_block)

    def _entries_for_path(self, path):
        """Return a list with all the entries that match path for all ids."""
        dirname, basename = os.path.split(path)
        key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # the block which should contain path is absent.
            return []
        result = []
        block = self._dirblocks[block_index][1]
        entry_index, _ = self._find_entry_index(key, block)
        # we may need to look at multiple entries at this path: walk while the
        # specific_files match.
        while (entry_index < len(block) and
            block[entry_index][0][0:2] == key[0:2]):
            result.append(block[entry_index])
            entry_index += 1
        return result

    def _entry_to_line(self, entry):
        """Serialize entry to a NULL delimited line ready for _get_output_lines.

        :param entry: An entry_tuple as defined in the module docstring.
        """
        entire_entry = list(entry[0])
        for tree_number, tree_data in enumerate(entry[1]):
            # (minikind, fingerprint, size, executable, tree_specific_string)
            entire_entry.extend(tree_data)
            # 3 for the key, 5 for the fields per tree.
            tree_offset = 3 + tree_number * 5
            # minikind
            entire_entry[tree_offset + 0] = tree_data[0]
            # size
            entire_entry[tree_offset + 2] = str(tree_data[2])
            # executable
            entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
        return '\0'.join(entire_entry)

    def _fields_per_entry(self):
        """How many null separated fields should be in each entry row.

        Each line now has an extra '\\n' field which is not used
        so we just skip over it.

        entry size::
            3 fields for the key
            + number of fields per tree_data (5) * tree count
            + 1 for the newline.
        """
        tree_count = 1 + self._num_present_parents()
        return 3 + 5 * tree_count + 1
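
    # For instance (a worked example): with exactly one parent tree present,
    # tree_count is 2, so each serialised row has 3 + 5*2 + 1 = 14 fields.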

    def _find_block(self, key, add_if_missing=False):
        """Return the block that key should be present in.

        :param key: A dirstate entry key.
        :param add_if_missing: If True, add the block to the dirstate when it
            is not found, rather than checking that its parent is versioned.
        :return: The block tuple.
        """
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            if not add_if_missing:
                # check to see if key is versioned itself - we might want to
                # add it anyway, because dirs with no entries dont get a
                # dirblock at parse time.
                # This is an uncommon branch to take: most dirs have children,
                # and most code works with versioned paths.
                parent_base, parent_name = osutils.split(key[0])
                if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
                    # some parent path has not been added - its an error to add
                    # this child
                    raise errors.NotVersionedError(key[0:2], str(self))
            self._dirblocks.insert(block_index, (key[0], []))
        return self._dirblocks[block_index]

    def _find_block_index_from_key(self, key):
        """Find the dirblock index for a key.

        :return: The block index, True if the block for the key is present.
        """
        if key[0:2] == ('', ''):
            return 0, True
        try:
            if (self._last_block_index is not None and
                self._dirblocks[self._last_block_index][0] == key[0]):
                return self._last_block_index, True
        except IndexError:
            pass
        block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                      cache=self._split_path_cache)
        # _right returns one-past-where-key is so we have to subtract
        # one to use it. we use _right here because there are two
        # '' blocks - the root, and the contents of root
        # we always have a minimum of 2 in self._dirblocks: root and
        # root-contents, and for '', we get 2 back, so this is
        # simple and correct:
        present = (block_index < len(self._dirblocks) and
            self._dirblocks[block_index][0] == key[0])
        self._last_block_index = block_index
        # Reset the entry index cache to the beginning of the block.
        self._last_entry_index = -1
        return block_index, present

    def _find_entry_index(self, key, block):
        """Find the entry index for a key in a block.

        :return: The entry index, True if the entry for the key is present.
        """
        len_block = len(block)
        try:
            if self._last_entry_index is not None:
                # mini-bisect here.
                entry_index = self._last_entry_index + 1
                # A hit is when the key is after the last slot, and before or
                # equal to the next slot.
                if ((entry_index > 0 and block[entry_index - 1][0] < key) and
                    key <= block[entry_index][0]):
                    self._last_entry_index = entry_index
                    present = (block[entry_index][0] == key)
                    return entry_index, present
        except IndexError:
            pass
        entry_index = bisect.bisect_left(block, (key, []))
        present = (entry_index < len_block and
            block[entry_index][0] == key)
        self._last_entry_index = entry_index
        return entry_index, present

    @staticmethod
    def from_tree(tree, dir_state_filename, sha1_provider=None):
        """Create a dirstate from a bzr Tree.

        :param tree: The tree which should provide parent information and
            inventory ids.
        :param sha1_provider: an object meeting the SHA1Provider interface.
            If None, a DefaultSHA1Provider is used.
        :return: a DirState object which is currently locked for writing.
            (it was locked by DirState.initialize)
        """
        result = DirState.initialize(dir_state_filename,
            sha1_provider=sha1_provider)
        try:
            try:
                parent_ids = tree.get_parent_ids()
                num_parents = len(parent_ids)
                parent_trees = []
                for parent_id in parent_ids:
                    parent_tree = tree.branch.repository.revision_tree(parent_id)
                    parent_trees.append((parent_id, parent_tree))
                    parent_tree.lock_read()
                result.set_parent_trees(parent_trees, [])
                result.set_state_from_inventory(tree.inventory)
            finally:
                for revid, parent_tree in parent_trees:
                    parent_tree.unlock()
        except:
            # The caller won't have a chance to unlock this, so make sure we
            # cleanup ourselves
            result.unlock()
            raise
        return result

    def _check_delta_is_valid(self, delta):
        return list(inventory._check_delta_unique_ids(
                    inventory._check_delta_unique_old_paths(
                    inventory._check_delta_unique_new_paths(
                    inventory._check_delta_ids_match_entry(
                    inventory._check_delta_ids_are_valid(
                    inventory._check_delta_new_path_entry_both_or_None(delta)))))))

    def update_by_delta(self, delta):
        """Apply an inventory delta to the dirstate for tree 0

        This is the workhorse for apply_inventory_delta in dirstate based
        trees.

        :param delta: An inventory delta. See Inventory.apply_delta for
            details.
        """
        self._read_dirblocks_if_needed()
        encode = cache_utf8.encode
        insertions = {}
        removals = {}
        # Accumulate parent references (path_utf8, id), to check for parentless
        # items or items placed under files/links/tree-references. We get
        # references from every item in the delta that is not a deletion and
        # is not itself the root.
        parents = set()
        # Added ids must not be in the dirstate already. This set holds those
        # ids.
        new_ids = set()
        # This loop transforms the delta to single atomic operations that can
        # be executed and validated.
        delta = sorted(self._check_delta_is_valid(delta), reverse=True)
        for old_path, new_path, file_id, inv_entry in delta:
            if (file_id in insertions) or (file_id in removals):
                self._raise_invalid(old_path or new_path, file_id,
                    "repeated file_id")
            if old_path is not None:
                old_path = old_path.encode('utf-8')
                removals[file_id] = old_path
            else:
                new_ids.add(file_id)
            if new_path is not None:
                if inv_entry is None:
                    self._raise_invalid(new_path, file_id,
                        "new_path with no entry")
                new_path = new_path.encode('utf-8')
                dirname_utf8, basename = osutils.split(new_path)
                if basename:
                    parents.add((dirname_utf8, inv_entry.parent_id))
                key = (dirname_utf8, basename, file_id)
                minikind = DirState._kind_to_minikind[inv_entry.kind]
                if minikind == 't':
                    fingerprint = inv_entry.reference_revision or ''
                else:
                    fingerprint = ''
                insertions[file_id] = (key, minikind, inv_entry.executable,
                                       fingerprint, new_path)
            # Transform moves into delete+add pairs
            if None not in (old_path, new_path):
                for child in self._iter_child_entries(0, old_path):
                    if child[0][2] in insertions or child[0][2] in removals:
                        continue
                    child_dirname = child[0][0]
                    child_basename = child[0][1]
                    minikind = child[1][0][0]
                    fingerprint = child[1][0][4]
                    executable = child[1][0][3]
                    old_child_path = osutils.pathjoin(child_dirname,
                                                      child_basename)
                    removals[child[0][2]] = old_child_path
                    child_suffix = child_dirname[len(old_path):]
                    new_child_dirname = (new_path + child_suffix)
                    key = (new_child_dirname, child_basename, child[0][2])
                    new_child_path = osutils.pathjoin(new_child_dirname,
                                                      child_basename)
                    insertions[child[0][2]] = (key, minikind, executable,
                                               fingerprint, new_child_path)
        self._check_delta_ids_absent(new_ids, delta, 0)
        try:
            self._apply_removals(removals.iteritems())
            self._apply_insertions(insertions.values())
            # Validate parents
            self._after_delta_check_parents(parents, 0)
        except errors.BzrError, e:
            self._changes_aborted = True
            if 'integrity error' not in str(e):
                raise
            # _get_entry raises BzrError when a request is inconsistent; we
            # want such errors to be shown as InconsistentDelta - and that
            # fits the behaviour we trigger.
            raise errors.InconsistentDeltaDelta(delta,
                "error from _get_entry. %s" % (e,))

    def _apply_removals(self, removals):
        for file_id, path in sorted(removals, reverse=True,
            key=operator.itemgetter(1)):
            dirname, basename = osutils.split(path)
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(dirname, basename, 0)
            try:
                entry = self._dirblocks[block_i][1][entry_i]
            except IndexError:
                self._raise_invalid(path, file_id,
                    "Wrong path for old path.")
            if not f_present or entry[1][0][0] in 'ar':
                self._raise_invalid(path, file_id,
                    "Wrong path for old path.")
            if file_id != entry[0][2]:
                self._raise_invalid(path, file_id,
                    "Attempt to remove path has wrong id - found %r."
                    % entry[0][2])
            self._make_absent(entry)
            # See if we have a malformed delta: deleting a directory must not
            # leave crud behind. This increases the number of bisects needed
            # substantially, but deletion or renames of large numbers of paths
            # is rare enough it shouldn't be an issue (famous last words?) RBC
            # 20080730.
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(path, '', 0)
            if d_present:
                # The dir block is still present in the dirstate; this could
                # be due to it being in a parent tree, or a corrupt delta.
                for child_entry in self._dirblocks[block_i][1]:
                    if child_entry[1][0][0] not in ('r', 'a'):
                        self._raise_invalid(path, entry[0][2],
                            "The file id was deleted but its children were "
                            "not deleted.")

    def _apply_insertions(self, adds):
        try:
            for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
                self.update_minimal(key, minikind, executable, fingerprint,
                                    path_utf8=path_utf8)
        except errors.NotVersionedError:
            self._raise_invalid(path_utf8.decode('utf8'), key[2],
                "Missing parent")

    def update_basis_by_delta(self, delta, new_revid):
        """Update the parents of this tree after a commit.

        This gives the tree one parent, with revision id new_revid. The
        inventory delta is applied to the current basis tree to generate the
        inventory for the parent new_revid, and all other parent trees are
        discarded.

        Note that an exception during the operation of this method will leave
        the dirstate in a corrupt state where it should not be saved.

        :param new_revid: The new revision id for the tree's parent.
        :param delta: An inventory delta (see apply_inventory_delta) describing
            the changes from the current left most parent revision to new_revid.
        """
        self._read_dirblocks_if_needed()
        self._discard_merge_parents()
        if self._ghosts != []:
            raise NotImplementedError(self.update_basis_by_delta)
        if len(self._parents) == 0:
            # setup a blank tree, the most simple way.
            empty_parent = DirState.NULL_PARENT_DETAILS
            for entry in self._iter_entries():
                entry[1].append(empty_parent)
            self._parents.append(new_revid)
        else:
            self._parents[0] = new_revid

        delta = sorted(self._check_delta_is_valid(delta), reverse=True)
        adds = []
        changes = []
        deletes = []
        # The paths this function accepts are unicode and must be encoded as we
        # go.
        encode = cache_utf8.encode
        inv_to_entry = self._inv_entry_to_details
        # delta is now (deletes, changes), (adds) in reverse lexicographical
        # order.
        # deletes in reverse lexicographic order are safe to process in situ.
        # renames are not, as a rename from any path could go to a path
        # lexicographically lower, so we transform renames into delete, add pairs,
        # expanding them recursively as needed.
        # At the same time, to reduce interface friction we convert the input
        # inventory entries to dirstate.
        root_only = ('', '')
        # Accumulate parent references (path_utf8, id), to check for parentless
        # items or items placed under files/links/tree-references. We get
        # references from every item in the delta that is not a deletion and
        # is not itself the root.
        parents = set()
        # Added ids must not be in the dirstate already. This set holds those
        # ids.
        new_ids = set()
        for old_path, new_path, file_id, inv_entry in delta:
            if inv_entry is not None and file_id != inv_entry.file_id:
                self._raise_invalid(new_path, file_id,
                    "mismatched entry file_id %r" % inv_entry)
            if new_path is None:
                new_path_utf8 = None
            else:
                if inv_entry is None:
                    self._raise_invalid(new_path, file_id,
                        "new_path with no entry")
                new_path_utf8 = encode(new_path)
                # note the parent for validation
                dirname_utf8, basename_utf8 = osutils.split(new_path_utf8)
                if basename_utf8:
                    parents.add((dirname_utf8, inv_entry.parent_id))
            if old_path is None:
                old_path_utf8 = None
            else:
                old_path_utf8 = encode(old_path)
            if old_path is None:
                adds.append((None, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), True))
                new_ids.add(file_id)
            elif new_path is None:
                deletes.append((old_path_utf8, None, file_id, None, True))
            elif (old_path, new_path) == root_only:
                # change things in-place
                # Note: the case of a parent directory changing its file_id
                #       tends to break optimizations here, because officially
                #       the file has actually been moved, it just happens to
                #       end up at the same path. If we can figure out how to
                #       handle that case, we can avoid a lot of add+delete
                #       pairs for objects that stay put.
                # elif old_path == new_path:
                changes.append((old_path_utf8, new_path_utf8, file_id,
                                inv_to_entry(inv_entry)))
            else:
                # Renames:
                # Because renames must preserve their children we must have
                # processed all relocations and removes before hand. The sort
                # order ensures we've examined the child paths, but we also
                # have to execute the removals, or the split to an add/delete
                # pair will result in the deleted item being reinserted, or
                # renamed items being reinserted twice - and possibly at the
                # wrong place. Splitting into a delete/add pair also simplifies
                # the handling of entries with ('f', ...), ('r' ...) because
                # the target of the 'r' is old_path here, and we add that to
                # deletes, meaning that the add handler does not need to check
                # for 'r' items on every pass.
                self._update_basis_apply_deletes(deletes)
                deletes = []
                # Split into an add/delete pair recursively.
                adds.append((old_path_utf8, new_path_utf8, file_id,
                             inv_to_entry(inv_entry), False))
                # Expunge deletes that we've seen so that deleted/renamed
                # children of a rename directory are handled correctly.
                new_deletes = reversed(list(
                    self._iter_child_entries(1, old_path_utf8)))
                # Remove the current contents of the tree at orig_path, and
                # reinsert at the correct new path.
                for entry in new_deletes:
                    child_dirname, child_basename, child_file_id = entry[0]
                    if child_dirname:
                        source_path = child_dirname + '/' + child_basename
                    else:
                        source_path = child_basename
                    if new_path_utf8:
                        target_path = new_path_utf8 + source_path[len(old_path):]
                    else:
                        if old_path == '':
                            raise AssertionError("cannot rename directory to"
                                                 " itself")
                        target_path = source_path[len(old_path) + 1:]
                    adds.append((None, target_path, entry[0][2], entry[1][1], False))
                    deletes.append(
                        (source_path, target_path, entry[0][2], None, False))
                deletes.append((old_path_utf8, new_path, file_id, None, False))
1614
self._check_delta_ids_absent(new_ids, delta, 1)
1616
# Finish expunging deletes/first half of renames.
1617
self._update_basis_apply_deletes(deletes)
1618
# Reinstate second half of renames and new paths.
1619
self._update_basis_apply_adds(adds)
1620
# Apply in-situ changes.
1621
self._update_basis_apply_changes(changes)
1623
self._after_delta_check_parents(parents, 1)
1624
except errors.BzrError, e:
1625
self._changes_aborted = True
1626
if 'integrity error' not in str(e):
1628
# _get_entry raises BzrError when a request is inconsistent; we
1629
# want such errors to be shown as InconsistentDelta - and that
1630
# fits the behaviour we trigger.
1631
raise errors.InconsistentDeltaDelta(delta,
1632
"error from _get_entry. %s" % (e,))
1634
self._mark_modified(header_modified=True)
1635
self._id_index = None
1638
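    # Illustrative sketch, not part of the original code (paths and ids here
    # are hypothetical): given a basis tree containing 'dir/' and 'dir/child',
    # a delta entry renaming u'dir' to u'new' is decomposed roughly into
    #   adds    = [('dir', 'new', 'dir-id', <details>, False),
    #              (None, 'new/child', 'child-id', <basis details>, False)]
    #   deletes = [('dir/child', 'new/child', 'child-id', None, False),
    #              ('dir', u'new', 'dir-id', None, False)]
    # so every child row is re-homed before the directory row itself moves.
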
    def _check_delta_ids_absent(self, new_ids, delta, tree_index):
        """Check that none of the file_ids in new_ids are present in a tree."""
        if not new_ids:
            return
        id_index = self._get_id_index()
        for file_id in new_ids:
            for key in id_index.get(file_id, ()):
                block_i, entry_i, d_present, f_present = \
                    self._get_block_entry_index(key[0], key[1], tree_index)
                if not f_present:
                    # In a different tree
                    continue
                entry = self._dirblocks[block_i][1][entry_i]
                if entry[0][2] != file_id:
                    # Different file_id, so not what we want.
                    continue
                self._raise_invalid(("%s/%s" % key[0:2]).decode('utf8'), file_id,
                    "This file_id is new in the delta but already present in "
                    "the target")

    def _raise_invalid(self, path, file_id, reason):
        self._changes_aborted = True
        raise errors.InconsistentDelta(path, file_id, reason)

    def _update_basis_apply_adds(self, adds):
        """Apply a sequence of adds to tree 1 during update_basis_by_delta.

        They may be adds, or renames that have been split into add/delete
        pairs.

        :param adds: A sequence of adds. Each add is a tuple:
            (None, new_path_utf8, file_id, (entry_details), real_add). real_add
            is False when the add is the second half of a remove-and-reinsert
            pair created to handle renames and deletes.
        """
        # Adds are accumulated partly from renames, so can be in any input
        # order - sort it.
        # TODO: we may want to sort in dirblocks order. That way each entry
        #       will end up in the same directory, allowing the _get_entry
        #       fast-path for looking up 2 items in the same dir to work.
        adds.sort(key=lambda x: x[1])
        # adds is now in lexicographic order, which places all parents before
        # their children, so we can process it linearly.
        absent = 'ar'
        st = static_tuple.StaticTuple
        for old_path, new_path, file_id, new_details, real_add in adds:
            dirname, basename = osutils.split(new_path)
            entry_key = st(dirname, basename, file_id)
            block_index, present = self._find_block_index_from_key(entry_key)
            if not present:
                self._raise_invalid(new_path, file_id,
                    "Unable to find block for this record."
                    " Was the parent added?")
            block = self._dirblocks[block_index][1]
            entry_index, present = self._find_entry_index(entry_key, block)
            if real_add:
                if old_path is not None:
                    self._raise_invalid(new_path, file_id,
                        'considered a real add but still had old_path at %s'
                        % (old_path,))
            if present:
                entry = block[entry_index]
                basis_kind = entry[1][1][0]
                if basis_kind == 'a':
                    entry[1][1] = new_details
                elif basis_kind == 'r':
                    raise NotImplementedError()
                else:
                    self._raise_invalid(new_path, file_id,
                        "An entry was marked as a new add"
                        " but the basis target already existed")
            else:
                # The exact key was not found in the block. However, we need to
                # check if there is a key next to us that would have matched.
                # We only need to check 2 locations, because there are only 2
                # unique ids.
                for maybe_index in range(entry_index-1, entry_index+1):
                    if maybe_index < 0 or maybe_index >= len(block):
                        continue
                    maybe_entry = block[maybe_index]
                    if maybe_entry[0][:2] != (dirname, basename):
                        # Just a random neighbor
                        continue
                    if maybe_entry[0][2] == file_id:
                        raise AssertionError(
                            '_find_entry_index didnt find a key match'
                            ' but walking the data did, for %s'
                            % (entry_key,))
                    basis_kind = maybe_entry[1][1][0]
                    if basis_kind not in 'ar':
                        self._raise_invalid(new_path, file_id,
                            "we have an add record for path, but the path"
                            " is already present with another file_id %s"
                            % (maybe_entry[0][2],))
                entry = (entry_key, [DirState.NULL_PARENT_DETAILS,
                                     new_details])
                block.insert(entry_index, entry)

            active_kind = entry[1][0][0]
            if active_kind == 'a':
                # The active record shows up as absent, this could be genuine,
                # or it could be present at some other location. We need to
                # verify.
                id_index = self._get_id_index()
                # The id_index may not be perfectly accurate for tree1, because
                # we haven't been keeping it updated. However, it should be
                # fine for tree0, and that gives us enough info for what we
                # need.
                keys = id_index.get(file_id, ())
                for key in keys:
                    block_i, entry_i, d_present, f_present = \
                        self._get_block_entry_index(key[0], key[1], 0)
                    if not f_present:
                        continue
                    active_entry = self._dirblocks[block_i][1][entry_i]
                    if (active_entry[0][2] != file_id):
                        # Some other file is at this path, we don't need to
                        # link it.
                        continue
                    real_active_kind = active_entry[1][0][0]
                    if real_active_kind in 'ar':
                        # We found a record, which was not *this* record,
                        # which matches the file_id, but is not actually
                        # present. Something seems *really* wrong.
                        self._raise_invalid(new_path, file_id,
                            "We found a tree0 entry that doesn't make sense")
                    # Now, we've found a tree0 entry which matches the file_id
                    # but is at a different location. So update them to be
                    # rename records.
                    active_dir, active_name = active_entry[0][:2]
                    if active_dir:
                        active_path = active_dir + '/' + active_name
                    else:
                        active_path = active_name
                    active_entry[1][1] = st('r', new_path, 0, False, '')
                    entry[1][0] = st('r', active_path, 0, False, '')
            elif active_kind == 'r':
                raise NotImplementedError()

            new_kind = new_details[0]
            if new_kind == 'd':
                self._ensure_block(block_index, entry_index, new_path)

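    # Illustrative sketch, not part of the original code: when an added key's
    # active (tree 0) column is 'a' but the same file_id is live at another
    # path, the pair of rows is rewritten into cross-pointing rename records:
    #   entry[1][0]        -> ('r', <active path>, 0, False, '')
    #   active_entry[1][1] -> ('r', <new path>, 0, False, '')
    # i.e. each column's fingerprint points to where the id really lives.
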
    def _update_basis_apply_changes(self, changes):
        """Apply a sequence of changes to tree 1 during update_basis_by_delta.

        :param changes: A sequence of changes. Each change is a tuple:
            (path_utf8, path_utf8, file_id, (entry_details))
        """
        absent = 'ar'
        for old_path, new_path, file_id, new_details in changes:
            # the entry for this file_id must be in tree 1.
            entry = self._get_entry(1, file_id, new_path)
            if entry[0] is None or entry[1][1][0] in 'ar':
                self._raise_invalid(new_path, file_id,
                    'changed entry considered not present')
            entry[1][1] = new_details

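    # Illustrative sketch, not part of the original code (ids hypothetical):
    # an in-place change for the root produced by update_basis_by_delta looks
    # like ('', '', 'root-id', ('d', '', 0, False, 'rev-id')) and simply
    # overwrites the basis (tree 1) column of the existing row.
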
    def _update_basis_apply_deletes(self, deletes):
        """Apply a sequence of deletes to tree 1 during update_basis_by_delta.

        They may be deletes, or renames that have been split into add/delete
        pairs.

        :param deletes: A sequence of deletes. Each delete is a tuple:
            (old_path_utf8, new_path_utf8, file_id, None, real_delete).
            real_delete is True when the desired outcome is an actual deletion
            rather than the rename handling logic temporarily deleting a path
            during the replacement of a parent.
        """
        null = DirState.NULL_PARENT_DETAILS
        for old_path, new_path, file_id, _, real_delete in deletes:
            if real_delete != (new_path is None):
                self._raise_invalid(old_path, file_id, "bad delete delta")
            # the entry for this file_id must be in tree 1.
            dirname, basename = osutils.split(old_path)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, 1)
            if not file_present:
                self._raise_invalid(old_path, file_id,
                    'basis tree does not contain removed entry')
            entry = self._dirblocks[block_index][1][entry_index]
            # The state of the entry in the 'active' WT
            active_kind = entry[1][0][0]
            if entry[0][2] != file_id:
                self._raise_invalid(old_path, file_id,
                    'mismatched file_id in tree 1')
            dir_block = ()
            old_kind = entry[1][1][0]
            if active_kind in 'ar':
                # The active tree doesn't have this file_id.
                # The basis tree is changing this record. If this is a
                # rename, then we don't want the record here at all
                # anymore. If it is just an in-place change, we want the
                # record here, but we'll add it if we need to. So we just
                # delete it
                if active_kind == 'r':
                    active_path = entry[1][0][1]
                    active_entry = self._get_entry(0, file_id, active_path)
                    if active_entry[1][1][0] != 'r':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate did not have matching rename entries")
                    elif active_entry[1][0][0] in 'ar':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate had a rename pointing at an inactive"
                            " tree0")
                    active_entry[1][1] = null
                del self._dirblocks[block_index][1][entry_index]
                if old_kind == 'd':
                    # This was a directory, and the active tree says it
                    # doesn't exist, and now the basis tree says it doesn't
                    # exist. Remove its dirblock if present
                    (dir_block_index,
                     present) = self._find_block_index_from_key(
                        (old_path, '', ''))
                    if present:
                        dir_block = self._dirblocks[dir_block_index][1]
                        if not dir_block:
                            # This entry is empty, go ahead and just remove it
                            del self._dirblocks[dir_block_index]
            else:
                # There is still an active record, so just mark this
                # removed.
                entry[1][1] = null
                block_i, entry_i, d_present, f_present = \
                    self._get_block_entry_index(old_path, '', 1)
                if d_present:
                    dir_block = self._dirblocks[block_i][1]
            for child_entry in dir_block:
                child_basis_kind = child_entry[1][1][0]
                if child_basis_kind not in 'ar':
                    self._raise_invalid(old_path, file_id,
                        "The file id was deleted but its children were "
                        "not deleted.")

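    # Illustrative sketch, not part of the original code (paths hypothetical):
    # a real deletion arrives as ('old/path', None, 'an-id', None, True),
    # while the delete half of a rename keeps its target, e.g.
    # ('old/path', 'new/path', 'an-id', None, False); the first check in the
    # loop above enforces exactly that correspondence.
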
    def _after_delta_check_parents(self, parents, index):
        """Check that parents required by the delta are all intact.

        :param parents: An iterable of (path_utf8, file_id) tuples which are
            required to be present in tree 'index' at path_utf8 with id
            file_id.
        :param index: The column in the dirstate to check for parents in.
        """
        for dirname_utf8, file_id in parents:
            # Get the entry - this ensures that file_id, dirname_utf8 exists
            # and has the right file id.
            entry = self._get_entry(index, file_id, dirname_utf8)
            if entry[1] is None:
                self._raise_invalid(dirname_utf8.decode('utf8'),
                    file_id, "This parent is not present.")
            # Parents of things must be directories
            if entry[1][index][0] != 'd':
                self._raise_invalid(dirname_utf8.decode('utf8'),
                    file_id, "This parent is not a directory.")

    def _observed_sha1(self, entry, sha1, stat_value,
        _stat_to_minikind=_stat_to_minikind, _pack_stat=pack_stat):
        """Note the sha1 of a file.

        :param entry: The entry the sha1 is for.
        :param sha1: The observed sha1.
        :param stat_value: The os.lstat for the file.
        """
        try:
            minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
        except KeyError:
            # Unhandled kind
            return None
        packed_stat = _pack_stat(stat_value)
        if minikind == 'f':
            if self._cutoff_time is None:
                self._sha_cutoff_time()
            if (stat_value.st_mtime < self._cutoff_time
                and stat_value.st_ctime < self._cutoff_time):
                entry[1][0] = ('f', sha1, stat_value.st_size, entry[1][0][3],
                               packed_stat)
                self._mark_modified([entry])

    def _sha_cutoff_time(self):
        """Return cutoff time.

        Files modified more recently than this time are at risk of being
        undetectably modified and so can't be cached.
        """
        # Cache the cutoff time as long as we hold a lock.
        # time.time() isn't super expensive (approx 3.38us), but
        # when you call it 50,000 times it adds up.
        # For comparison, os.lstat() costs 7.2us if it is hot.
        self._cutoff_time = int(time.time()) - 3
        return self._cutoff_time

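    # Illustrative sketch, not part of the original code: if time.time()
    # returns 1300000003.7, the cutoff is int(1300000003.7) - 3 == 1300000000,
    # so a file whose st_mtime or st_ctime is 1300000002 is considered too
    # fresh for its sha1 to be cached safely by _observed_sha1 above.
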
    def _lstat(self, abspath, entry):
        """Return the os.lstat value for this path."""
        return os.lstat(abspath)

    def _sha1_file_and_mutter(self, abspath):
        # when -Dhashcache is turned on, this is monkey-patched in to log
        # file reads
        trace.mutter("dirstate sha1 " + abspath)
        return self._sha1_provider.sha1(abspath)

    def _is_executable(self, mode, old_executable):
        """Is this file executable?"""
        return bool(S_IEXEC & mode)

    def _is_executable_win32(self, mode, old_executable):
        """On win32 the executable bit is stored in the dirstate."""
        return old_executable

    if sys.platform == 'win32':
        _is_executable = _is_executable_win32

    def _read_link(self, abspath, old_link):
        """Read the target of a symlink"""
        # TODO: jam 20070301 On Win32, this could just return the value
        #       already in memory. However, this really needs to be done at a
        #       higher level, because there either won't be anything on disk,
        #       or the thing on disk will be a file.
        fs_encoding = osutils._fs_enc
        if isinstance(abspath, unicode):
            # abspath is defined as the path to pass to lstat. readlink is
            # buggy in python < 2.6 (it doesn't encode unicode path into FS
            # encoding), so we need to encode ourselves knowing that unicode
            # paths are produced by UnicodeDirReader on purpose.
            abspath = abspath.encode(fs_encoding)
        target = os.readlink(abspath)
        if fs_encoding not in ('UTF-8', 'US-ASCII', 'ANSI_X3.4-1968'):
            # Change encoding if needed
            target = target.decode(fs_encoding).encode('UTF-8')
        return target

    def get_ghosts(self):
        """Return a list of the parent tree revision ids that are ghosts."""
        self._read_header_if_needed()
        return self._ghosts

    def get_lines(self):
        """Serialise the entire dirstate to a sequence of lines."""
        if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
            self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
            # read what's on disk.
            self._state_file.seek(0)
            return self._state_file.readlines()
        lines = []
        lines.append(self._get_parents_line(self.get_parent_ids()))
        lines.append(self._get_ghosts_line(self._ghosts))
        lines.extend(self._get_entry_lines())
        return self._get_output_lines(lines)

    def _get_ghosts_line(self, ghost_ids):
        """Create a line for the state file for ghost information."""
        return '\0'.join([str(len(ghost_ids))] + ghost_ids)

    def _get_parents_line(self, parent_ids):
        """Create a line for the state file for parents information."""
        return '\0'.join([str(len(parent_ids))] + parent_ids)

    def _get_entry_lines(self):
        """Create lines for entries."""
        return map(self._entry_to_line, self._iter_entries())

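    # Illustrative sketch, not part of the original code (revision ids are
    # hypothetical): for two parents of which one is a ghost,
    #   _get_parents_line(['rev-a', 'rev-b']) == '2\x00rev-a\x00rev-b'
    #   _get_ghosts_line(['rev-b'])           == '1\x00rev-b'
    # matching the NULL-separated fields described in the module docstring.
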
    def _get_fields_to_entry(self):
        """Get a function which converts entry fields into an entry record.

        This handles size and executable, as well as parent records.

        :return: A function which takes a list of fields, and returns an
            appropriate record for storing in memory.
        """
        # This is intentionally unrolled for performance
        num_present_parents = self._num_present_parents()
        if num_present_parents == 0:
            def fields_to_entry_0_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                    fields[3],                # minikind
                    fields[4],                # fingerprint
                    _int(fields[5]),          # size
                    fields[6] == 'y',         # executable
                    fields[7],                # packed_stat or revision_id
                    )])
            return fields_to_entry_0_parents
        elif num_present_parents == 1:
            def fields_to_entry_1_parent(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                    fields[3],                # minikind
                    fields[4],                # fingerprint
                    _int(fields[5]),          # size
                    fields[6] == 'y',         # executable
                    fields[7],                # packed_stat or revision_id
                    ),
                    ( # Parent 1
                    fields[8],                # minikind
                    fields[9],                # fingerprint
                    _int(fields[10]),         # size
                    fields[11] == 'y',        # executable
                    fields[12],               # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_1_parent
        elif num_present_parents == 2:
            def fields_to_entry_2_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                    fields[3],                # minikind
                    fields[4],                # fingerprint
                    _int(fields[5]),          # size
                    fields[6] == 'y',         # executable
                    fields[7],                # packed_stat or revision_id
                    ),
                    ( # Parent 1
                    fields[8],                # minikind
                    fields[9],                # fingerprint
                    _int(fields[10]),         # size
                    fields[11] == 'y',        # executable
                    fields[12],               # packed_stat or revision_id
                    ),
                    ( # Parent 2
                    fields[13],               # minikind
                    fields[14],               # fingerprint
                    _int(fields[15]),         # size
                    fields[16] == 'y',        # executable
                    fields[17],               # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_2_parents
        else:
            def fields_to_entry_n_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                trees = [(fields[cur],                # minikind
                          fields[cur+1],              # fingerprint
                          _int(fields[cur+2]),        # size
                          fields[cur+3] == 'y',       # executable
                          fields[cur+4],              # stat or revision_id
                         ) for cur in xrange(3, len(fields)-1, 5)]
                return path_name_file_id_key, trees
            return fields_to_entry_n_parents

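    # Illustrative sketch, not part of the original code (values are
    # hypothetical): with no parents, the 8 fields
    #   ['dir', 'name', 'an-id', 'f', sha, '30', 'y', packed_stat]
    # are converted by fields_to_entry_0_parents into
    #   (('dir', 'name', 'an-id'), [('f', sha, 30, True, packed_stat)])
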
    def get_parent_ids(self):
        """Return a list of the parent tree ids for the directory state."""
        self._read_header_if_needed()
        return list(self._parents)

    def _get_block_entry_index(self, dirname, basename, tree_index):
        """Get the coordinates for a path in the state structure.

        :param dirname: The utf8 dirname to lookup.
        :param basename: The utf8 basename to lookup.
        :param tree_index: The index of the tree for which this lookup should
            be attempted.
        :return: A tuple describing where the path is located, or should be
            inserted. The tuple contains four fields: the block index, the row
            index, the directory is present (boolean), the entire path is
            present (boolean). There is no guarantee that either
            coordinate is currently reachable unless the found field for it is
            True. For instance, a directory not present in the searched tree
            may be returned with a value one greater than the current highest
            block offset. The directory present field will always be True when
            the path present field is True. The directory present field does
            NOT indicate that the directory is present in the searched tree,
            rather it indicates that there are at least some files in some
            parent tree present there.
        """
        self._read_dirblocks_if_needed()
        key = dirname, basename, ''
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # no such directory - return the dir index and 0 for the row.
            return block_index, 0, False, False
        block = self._dirblocks[block_index][1] # access the entries only
        entry_index, present = self._find_entry_index(key, block)
        # linear search through entries at this path to find the one
        # requested.
        while entry_index < len(block) and block[entry_index][0][1] == basename:
            if block[entry_index][1][tree_index][0] not in 'ar':
                # neither absent nor relocated
                return block_index, entry_index, True, True
            entry_index += 1
        return block_index, entry_index, True, False

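    # Illustrative sketch, not part of the original code (coordinates are
    # hypothetical): a lookup can return e.g. (4, 2, True, True) when found,
    # (4, 2, True, False) when only sibling entries exist in block 4, and an
    # insertion coordinate such as (len(self._dirblocks), 0, False, False)
    # when even the directory's block is missing - callers must honour the
    # two boolean flags before dereferencing.
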
    def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None,
                   include_deleted=False):
        """Get the dirstate entry for path in tree tree_index.

        If either file_id or path is supplied, it is used as the key to lookup.
        If both are supplied, the fastest lookup is used, and an error is
        raised if they do not both point at the same row.

        :param tree_index: The index of the tree we wish to locate this path
            in. If the path is present in that tree, the entry containing its
            details is returned, otherwise (None, None) is returned.
            0 is the working tree, higher indexes are successive parent
            trees.
        :param fileid_utf8: A utf8 file_id to look up.
        :param path_utf8: A utf8 path to be looked up.
        :param include_deleted: If True, and performing a lookup via
            fileid_utf8 rather than path_utf8, return an entry for deleted
            (absent) paths.
        :return: The dirstate entry tuple for path, or (None, None)
        """
        self._read_dirblocks_if_needed()
        if path_utf8 is not None:
            if type(path_utf8) is not str:
                raise errors.BzrError('path_utf8 is not a str: %s %r'
                    % (type(path_utf8), path_utf8))
            # path lookups are faster
            dirname, basename = osutils.split(path_utf8)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, tree_index)
            if not file_present:
                return None, None
            entry = self._dirblocks[block_index][1][entry_index]
            if not (entry[0][2] and entry[1][tree_index][0] not in ('a', 'r')):
                raise AssertionError('unversioned entry?')
            if fileid_utf8:
                if entry[0][2] != fileid_utf8:
                    self._changes_aborted = True
                    raise errors.BzrError('integrity error ? : mismatching'
                                          ' tree_index, file_id and path')
            return entry
        else:
            possible_keys = self._get_id_index().get(fileid_utf8, ())
            if not possible_keys:
                return None, None
            for key in possible_keys:
                block_index, present = \
                    self._find_block_index_from_key(key)
                # strange, probably indicates an out of date
                # id index - for now, allow this.
                if not present:
                    continue
                # WARNING: DO not change this code to use _get_block_entry_index
                # as that function is not suitable: it does not use the key
                # to lookup, and thus the wrong coordinates are returned.
                block = self._dirblocks[block_index][1]
                entry_index, present = self._find_entry_index(key, block)
                if present:
                    entry = self._dirblocks[block_index][1][entry_index]
                    # TODO: We might want to assert that entry[0][2] ==
                    #       fileid_utf8.
                    if entry[1][tree_index][0] in 'fdlt':
                        # this is the result we are looking for: the
                        # real home of this file_id in this tree.
                        return entry
                    if entry[1][tree_index][0] == 'a':
                        # there is no home for this entry in this tree
                        if include_deleted:
                            return entry
                        return None, None
                    if entry[1][tree_index][0] != 'r':
                        raise AssertionError(
                            "entry %r has invalid minikind %r for tree %r" \
                            % (entry,
                               entry[1][tree_index][0],
                               tree_index))
                    real_path = entry[1][tree_index][1]
                    return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
                        path_utf8=real_path)
            return None, None

    @classmethod
    def initialize(cls, path, sha1_provider=None):
        """Create a new dirstate on path.

        The new dirstate will be an empty tree - that is it has no parents,
        and only a root node - which has id ROOT_ID.

        :param path: The name of the file for the dirstate.
        :param sha1_provider: an object meeting the SHA1Provider interface.
            If None, a DefaultSHA1Provider is used.
        :return: A write-locked DirState object.
        """
        # This constructs a new DirState object on a path, sets the _state_file
        # to a new empty file for that path. It then calls _set_data() with our
        # stock empty dirstate information - a root with ROOT_ID, no children,
        # and no parents. Finally it calls save() to ensure that this data will
        # persist.
        if sha1_provider is None:
            sha1_provider = DefaultSHA1Provider()
        result = cls(path, sha1_provider)
        # root dir and root dir contents with no children.
        empty_tree_dirblocks = [('', []), ('', [])]
        # a new root directory, with a NULLSTAT.
        empty_tree_dirblocks[0][1].append(
            (('', '', inventory.ROOT_ID), [
                ('d', '', 0, False, DirState.NULLSTAT),
            ]))
        result.lock_write()
        try:
            result._set_data([], empty_tree_dirblocks)
            result.save()
        except:
            result.unlock()
            raise
        return result

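    # Illustrative usage sketch, not part of the original code (the file name
    # is hypothetical):
    #   state = DirState.initialize('dirstate')
    #   try:
    #       pass # the new, write-locked, empty state is ready for use
    #   finally:
    #       state.unlock()
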
    @staticmethod
    def _inv_entry_to_details(inv_entry):
        """Convert an inventory entry (from a revision tree) to state details.

        :param inv_entry: An inventory entry whose sha1 and link targets can be
            relied upon, and which has a revision set.
        :return: A details tuple - the details for a single tree at a path +
            id.
        """
        kind = inv_entry.kind
        minikind = DirState._kind_to_minikind[kind]
        tree_data = inv_entry.revision
        if kind == 'directory':
            fingerprint = ''
            size = 0
            executable = False
        elif kind == 'symlink':
            if inv_entry.symlink_target is None:
                fingerprint = ''
            else:
                fingerprint = inv_entry.symlink_target.encode('utf8')
            size = 0
            executable = False
        elif kind == 'file':
            fingerprint = inv_entry.text_sha1 or ''
            size = inv_entry.text_size or 0
            executable = inv_entry.executable
        elif kind == 'tree-reference':
            fingerprint = inv_entry.reference_revision or ''
            size = 0
            executable = False
        else:
            raise Exception("can't pack %s" % inv_entry)
        return static_tuple.StaticTuple(minikind, fingerprint, size,
                                        executable, tree_data)

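    # Illustrative sketch, not part of the original code (values are
    # hypothetical): a versioned file with sha1 'abc...', length 30 and the
    # executable bit set maps to ('f', 'abc...', 30, True, 'rev-id'), while a
    # directory always maps to ('d', '', 0, False, <revision-id>).
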
    def _iter_child_entries(self, tree_index, path_utf8):
        """Iterate over all the entries that are children of path_utf8.

        This only returns entries that are present (not in 'a', 'r') in
        tree_index. tree_index data is not refreshed, so if tree 0 is used,
        results may differ from that obtained if paths were statted to
        determine which ones were directories.

        Asking for the children of a non-directory will return an empty
        iterator.
        """
        pending_dirs = []
        next_pending_dirs = [path_utf8]
        absent = 'ar'
        while next_pending_dirs:
            pending_dirs = next_pending_dirs
            next_pending_dirs = []
            for path in pending_dirs:
                block_index, present = self._find_block_index_from_key(
                    (path, '', ''))
                if block_index == 0:
                    block_index = 1
                    if len(self._dirblocks) == 1:
                        # asked for the children of the root with no other
                        # contents.
                        return
                if not present:
                    # children of a non-directory asked for.
                    continue
                block = self._dirblocks[block_index]
                for entry in block[1]:
                    kind = entry[1][tree_index][0]
                    if kind not in absent:
                        yield entry
                    if kind == 'd':
                        if entry[0][0]:
                            path = entry[0][0] + '/' + entry[0][1]
                        else:
                            path = entry[0][1]
                        next_pending_dirs.append(path)

    def _iter_entries(self):
        """Iterate over all the entries in the dirstate.

        Each yielded item is an entry in the standard format described in the
        docstring of bzrlib.dirstate.
        """
        self._read_dirblocks_if_needed()
        for directory in self._dirblocks:
            for entry in directory[1]:
                yield entry

    def _get_id_index(self):
        """Get an id index of self._dirblocks.

        This maps from file_id => [(directory, name, file_id)] entries where
        that file_id appears in one of the trees.
        """
        if self._id_index is None:
            id_index = {}
            for key, tree_details in self._iter_entries():
                self._add_to_id_index(id_index, key)
            self._id_index = id_index
        return self._id_index

    def _add_to_id_index(self, id_index, entry_key):
        """Add this entry to the _id_index mapping."""
        # This code used to use a set for every entry in the id_index. However,
        # it is *rare* to have more than one entry. So a set is a large
        # overkill. And even when we do, we won't ever have more than the
        # number of parent trees. Which is still a small number (rarely >2). As
        # such, we use a simple tuple, and do our own uniqueness checks. While
        # the 'in' check is O(N) since N is nicely bounded it shouldn't ever
        # cause quadratic failure.
        file_id = entry_key[2]
        entry_key = static_tuple.StaticTuple.from_sequence(entry_key)
        if file_id not in id_index:
            id_index[file_id] = static_tuple.StaticTuple(entry_key,)
        else:
            entry_keys = id_index[file_id]
            if entry_key not in entry_keys:
                id_index[file_id] = entry_keys + (entry_key,)

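    # Illustrative sketch, not part of the original code (ids hypothetical):
    # after registering the same id at a second path,
    #   id_index['an-id'] == (('', 'old-name', 'an-id'),
    #                         ('', 'new-name', 'an-id'))
    # - a plain tuple that only grows when a genuinely new key appears.
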
    def _remove_from_id_index(self, id_index, entry_key):
        """Remove this entry from the _id_index mapping.

        It is a programming error to call this when the entry_key is not
        already present.
        """
        file_id = entry_key[2]
        entry_keys = list(id_index[file_id])
        entry_keys.remove(entry_key)
        id_index[file_id] = static_tuple.StaticTuple.from_sequence(entry_keys)

    def _get_output_lines(self, lines):
        """Format lines for final output.

        :param lines: A sequence of lines containing the parents list and the
            path lines.
        """
        output_lines = [DirState.HEADER_FORMAT_3]
        lines.append('') # a final newline
        inventory_text = '\0\n\0'.join(lines)
        output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
        # -3, 1 for num parents, 1 for ghosts, 1 for final newline
        num_entries = len(lines)-3
        output_lines.append('num_entries: %s\n' % (num_entries,))
        output_lines.append(inventory_text)
        return output_lines

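    # Illustrative sketch, not part of the original code: the serialised file
    # therefore starts
    #   #bazaar dirstate flat format 3
    #   crc32: <possibly negative integer>
    #   num_entries: <count>
    # followed by the NULL-delimited parent, ghost and entry records.
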
    def _make_deleted_row(self, fileid_utf8, parents):
        """Return a deleted row for fileid_utf8."""
        return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
            ''), parents

    def _num_present_parents(self):
        """The number of parent entries in each record row."""
        return len(self._parents) - len(self._ghosts)

    @classmethod
    def on_file(cls, path, sha1_provider=None, worth_saving_limit=0):
        """Construct a DirState on the file at path "path".

        :param path: The path at which the dirstate file on disk should live.
        :param sha1_provider: an object meeting the SHA1Provider interface.
            If None, a DefaultSHA1Provider is used.
        :param worth_saving_limit: when the exact number of hash changed
            entries is known, only bother saving the dirstate if more than
            this count of entries have changed. -1 means never save.
        :return: An unlocked DirState object, associated with the given path.
        """
        if sha1_provider is None:
            sha1_provider = DefaultSHA1Provider()
        result = cls(path, sha1_provider,
                     worth_saving_limit=worth_saving_limit)
        return result

    def _read_dirblocks_if_needed(self):
        """Read in all the dirblocks from the file if they are not in memory.

        This populates self._dirblocks, and sets self._dirblock_state to
        IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
        loading.
        """
        self._read_header_if_needed()
        if self._dirblock_state == DirState.NOT_IN_MEMORY:
            _read_dirblocks(self)

    def _read_header(self):
        """This reads in the metadata header, and the parent ids.

        After reading in, the file should be positioned at the null
        just before the start of the first record in the file.

        :return: (expected crc checksum, number of entries, parent list)
        """
        self._read_prelude()
        parent_line = self._state_file.readline()
        info = parent_line.split('\0')
        num_parents = int(info[0])
        self._parents = info[1:-1]
        ghost_line = self._state_file.readline()
        info = ghost_line.split('\0')
        num_ghosts = int(info[1])
        self._ghosts = info[2:-1]
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._end_of_header = self._state_file.tell()

    def _read_header_if_needed(self):
        """Read the header of the dirstate file if needed."""
        # inline this as it will be called a lot
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
        if self._header_state == DirState.NOT_IN_MEMORY:
            self._read_header()

    def _read_prelude(self):
        """Read in the prelude header of the dirstate file.

        This only reads in the stuff that is not connected to the crc
        checksum. The position will be correct to read in the rest of
        the file and check the checksum after this point.
        The next entry in the file should be the number of parents,
        and their ids. Followed by a newline.
        """
        header = self._state_file.readline()
        if header != DirState.HEADER_FORMAT_3:
            raise errors.BzrError(
                'invalid header line: %r' % (header,))
        crc_line = self._state_file.readline()
        if not crc_line.startswith('crc32: '):
            raise errors.BzrError('missing crc32 checksum: %r' % crc_line)
        self.crc_expected = int(crc_line[len('crc32: '):-1])
        num_entries_line = self._state_file.readline()
        if not num_entries_line.startswith('num_entries: '):
            raise errors.BzrError('missing num_entries line')
        self._num_entries = int(num_entries_line[len('num_entries: '):-1])

    def sha1_from_stat(self, path, stat_result, _pack_stat=pack_stat):
        """Find a sha1 given a stat lookup."""
        return self._get_packed_stat_index().get(_pack_stat(stat_result), None)

    def _get_packed_stat_index(self):
        """Get a packed_stat index of self._dirblocks."""
        if self._packed_stat_index is None:
            index = {}
            for key, tree_details in self._iter_entries():
                if tree_details[0][0] == 'f':
                    index[tree_details[0][4]] = tree_details[0][1]
            self._packed_stat_index = index
        return self._packed_stat_index

"""Save any pending changes created during this session.
2490
We reuse the existing file, because that prevents race conditions with
2491
file creation, and use oslocks on it to prevent concurrent modification
2492
and reads - because dirstate's incremental data aggregation is not
2493
compatible with reading a modified file, and replacing a file in use by
2494
another process is impossible on Windows.
2496
A dirstate in read only mode should be smart enough though to validate
2497
that the file has not changed, and otherwise discard its cache and
2498
start over, to allow for fine grained read lock duration, so 'status'
2499
wont block 'commit' - for example.
2501
if self._changes_aborted:
2502
# Should this be a warning? For now, I'm expecting that places that
2503
# mark it inconsistent will warn, making a warning here redundant.
2504
trace.mutter('Not saving DirState because '
2505
'_changes_aborted is set.')
2507
# TODO: Since we now distinguish IN_MEMORY_MODIFIED from
2508
# IN_MEMORY_HASH_MODIFIED, we should only fail quietly if we fail
2509
# to save an IN_MEMORY_HASH_MODIFIED, and fail *noisily* if we
2510
# fail to save IN_MEMORY_MODIFIED
2511
if self._worth_saving():
2512
grabbed_write_lock = False
2513
if self._lock_state != 'w':
2514
grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
2515
# Switch over to the new lock, as the old one may be closed.
2516
# TODO: jam 20070315 We should validate the disk file has
2517
# not changed contents. Since temporary_write_lock may
2518
# not be an atomic operation.
2519
self._lock_token = new_lock
2520
self._state_file = new_lock.f
2521
if not grabbed_write_lock:
2522
# We couldn't grab a write lock, so we switch back to a read one
2525
lines = self.get_lines()
2526
self._state_file.seek(0)
2527
self._state_file.writelines(lines)
2528
self._state_file.truncate()
2529
self._state_file.flush()
2530
self._mark_unmodified()
2532
if grabbed_write_lock:
2533
self._lock_token = self._lock_token.restore_read_lock()
2534
self._state_file = self._lock_token.f
2535
# TODO: jam 20070315 We should validate the disk file has
2536
# not changed contents. Since restore_read_lock may
2537
# not be an atomic operation.
2539
    def _worth_saving(self):
        """Is it worth saving the dirstate or not?"""
        if (self._header_state == DirState.IN_MEMORY_MODIFIED
            or self._dirblock_state == DirState.IN_MEMORY_MODIFIED):
            return True
        if self._dirblock_state == DirState.IN_MEMORY_HASH_MODIFIED:
            if self._worth_saving_limit == -1:
                # We never save hash changes when the limit is -1
                return False
            # If we're using smart saving and only a small number of
            # entries have changed their hash, don't bother saving. John has
            # suggested using a heuristic here based on the size of the
            # changed files and/or tree. For now, we go with a configurable
            # number of changes, keeping the calculation time
            # as low overhead as possible. (This also keeps all existing
            # tests passing as the default is 0, i.e. always save.)
            if len(self._known_hash_changes) >= self._worth_saving_limit:
                return True
        return False

    def _set_data(self, parent_ids, dirblocks):
        """Set the full dirstate data in memory.

        This is an internal function used to completely replace the objects
        in memory state. It puts the dirstate into state 'full-dirty'.

        :param parent_ids: A list of parent tree revision ids.
        :param dirblocks: A list containing one tuple for each directory in the
            tree. Each tuple contains the directory path and a list of entries
            found in that directory.
        """
        # our memory copy is now authoritative.
        self._dirblocks = dirblocks
        self._mark_modified(header_modified=True)
        self._parents = list(parent_ids)
        self._id_index = None
        self._packed_stat_index = None

    def set_path_id(self, path, new_id):
        """Change the id of path to new_id in the current working tree.

        :param path: The path inside the tree to set - '' is the root, 'foo'
            is the path foo in the root.
        :param new_id: The new id to assign to the path. This must be a utf8
            file id (not unicode, and not None).
        """
        self._read_dirblocks_if_needed()
        if len(path):
            # TODO: logic not written
            raise NotImplementedError(self.set_path_id)
        # TODO: check new id is unique
        entry = self._get_entry(0, path_utf8=path)
        if entry[0][2] == new_id:
            # Nothing to change.
            return
        # mark the old path absent, and insert a new root path
        self._make_absent(entry)
        self.update_minimal(('', '', new_id), 'd',
            path_utf8='', packed_stat=entry[1][0][4])
        self._mark_modified()
        # XXX: This was added by Ian, we need to make sure there
        #      are tests for it, because it isn't in bzr.dev TRUNK
        #      It looks like the only place it is called is in setting the root
        #      id of the tree. So probably we never had an _id_index when we
        #      don't even have a root yet.
        if self._id_index is not None:
            self._add_to_id_index(self._id_index, entry[0])

    def set_parent_trees(self, trees, ghosts):
        """Set the parent trees for the dirstate.

        :param trees: A list of revision_id, tree tuples. tree must be provided
            even if the revision_id refers to a ghost: supply an empty tree in
            that case.
        :param ghosts: A list of the revision_ids that are ghosts at the time
            of setting.
        """
        # TODO: generate a list of parent indexes to preserve to save
        # processing specific parent trees. In the common case one tree will
        # be preserved - the left most parent.
        # TODO: if the parent tree is a dirstate, we might want to walk them
        # all by path in parallel for 'optimal' common-case performance.
        # generate new root row.
        self._read_dirblocks_if_needed()
        # TODO future sketch: Examine the existing parents to generate a change
        # map and then walk the new parent trees only, mapping them into the
        # dirstate. Walk the dirstate at the same time to remove unreferenced
        # entries.
        # for now:
        # sketch: loop over all entries in the dirstate, cherry picking
        # entries from the parent trees, if they are not ghost trees.
        # after we finish walking the dirstate, all entries not in the dirstate
        # are deletes, so we want to append them to the end as per the design
        # discussions. So do a set difference on ids with the parents to
        # get deletes, and add them to the end.
        # During the update process we need to answer the following questions:
        # - find other keys containing a fileid in order to create cross-path
        #   links. We don't trivially use the inventory from other trees
        #   because this leads to either double touching, or to accessing
        #   missing keys,
        # - find other keys containing a path
        # We accumulate each entry via this dictionary, including the root
        by_path = {}
        id_index = {}
        # we could do parallel iterators, but because file id data may be
        # scattered throughout, we don't save on index overhead: we have to
        # look at everything anyway. We can probably save cycles by reusing
        # parent data and doing an incremental update when adding an additional
        # parent, but for now the common cases are adding a new parent (merge),
        # and replacing completely (commit), and commit is more common: so
        # optimise merge later.

        # ---- start generation of full tree mapping data
        # what trees should we use?
        parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
        # how many trees do we end up with
        parent_count = len(parent_trees)
        st = static_tuple.StaticTuple

        # one: the current tree
        for entry in self._iter_entries():
            # skip entries not in the current tree
            if entry[1][0][0] in 'ar': # absent, relocated
                continue
            by_path[entry[0]] = [entry[1][0]] + \
                [DirState.NULL_PARENT_DETAILS] * parent_count
            # TODO: Possibly inline this, since we know it isn't present yet
            #       id_index[entry[0][2]] = (entry[0],)
            self._add_to_id_index(id_index, entry[0])

        # now the parent trees:
        for tree_index, tree in enumerate(parent_trees):
            # the index is off by one, adjust it.
            tree_index = tree_index + 1
            # when we add new locations for a fileid we need these ranges for
            # any fileid in this tree as we set the by_path[id] to:
            # already_processed_tree_details + new_details + new_location_suffix
            # the suffix is from tree_index+1:parent_count+1.
            new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
            # now stitch in all the entries from this tree
            last_dirname = None
            for path, entry in tree.iter_entries_by_dir():
                # here we process each trees details for each item in the tree.
                # we first update any existing entries for the id at other paths,
                # then we either create or update the entry for the id at the
                # right path, and finally we add (if needed) a mapping from
                # file_id to this path. We do it in this order to allow us to
                # avoid checking all known paths for the id when generating a
                # new entry at this path: by adding the id->path mapping last,
                # all the mappings are valid and have correct relocation
                # records where needed.
                file_id = entry.file_id
                path_utf8 = path.encode('utf8')
                dirname, basename = osutils.split(path_utf8)
                if dirname == last_dirname:
                    # Try to re-use objects as much as possible
                    dirname = last_dirname
                else:
                    last_dirname = dirname
                new_entry_key = st(dirname, basename, file_id)
                # tree index consistency: All other paths for this id in this tree
                # index must point to the correct path.
                entry_keys = id_index.get(file_id, ())
                for entry_key in entry_keys:
                    # TODO:PROFILING: It might be faster to just update
                    # rather than checking if we need to, and then overwrite
                    # the one we are located at.
                    if entry_key != new_entry_key:
                        # this file id is at a different path in one of the
                        # other trees, so put absent pointers there
                        # This is the vertical axis in the matrix, all pointing
                        # to the real path.
                        by_path[entry_key][tree_index] = st('r', path_utf8, 0,
                                                            False, '')
                # by path consistency: Insert into an existing path record
                # (trivial), or add a new one with relocation pointers for the
                # other tree indexes.
                if new_entry_key in entry_keys:
                    # there is already an entry where this data belongs, just
                    # insert it.
                    by_path[new_entry_key][tree_index] = \
                        self._inv_entry_to_details(entry)
                else:
                    # add relocated entries to the horizontal axis - this row
                    # mapping from path,id. We need to look up the correct path
                    # for the indexes from 0 to tree_index -1
                    new_details = []
                    for lookup_index in xrange(tree_index):
                        # boundary case: this is the first occurrence of file_id
                        # so there are no id_indexes, possibly take this out of
                        # the loop?
                        if not len(entry_keys):
                            new_details.append(DirState.NULL_PARENT_DETAILS)
                        else:
                            # grab any one entry, use it to find the right path.
                            a_key = iter(entry_keys).next()
                            if by_path[a_key][lookup_index][0] in ('r', 'a'):
                                # it's a pointer or missing statement, use it as
                                # is.
                                new_details.append(by_path[a_key][lookup_index])
                            else:
                                # we have the right key, make a pointer to it.
                                real_path = ('/'.join(a_key[0:2])).strip('/')
                                new_details.append(st('r', real_path, 0, False,
                                                      ''))
                    new_details.append(self._inv_entry_to_details(entry))
                    new_details.extend(new_location_suffix)
                    by_path[new_entry_key] = new_details
                    self._add_to_id_index(id_index, new_entry_key)
        # --- end generation of full tree mappings

        # sort and output all the entries
        new_entries = self._sort_entries(by_path.items())
        self._entries_to_current_state(new_entries)
        self._parents = [rev_id for rev_id, tree in trees]
        self._ghosts = list(ghosts)
        self._mark_modified(header_modified=True)
        self._id_index = id_index

    def _sort_entries(self, entry_list):
        """Given a list of entries, sort them into the right order.

        This is done when constructing a new dirstate from trees - normally we
        try to keep everything in sorted blocks all the time, but sometimes
        it's easier to sort after the fact.
        """
        # When sorting, we usually have 10x more entries than directories. (69k
        # total entries, 4k directories). So cache the results of splitting.
        # Saving time and objects. Also, use StaticTuple to avoid putting all
        # of these objects into python's garbage collector.
        split_dirs = {}
        def _key(entry, _split_dirs=split_dirs, _st=static_tuple.StaticTuple):
            # sort by: directory parts, file name, file id
            dirpath, fname, file_id = entry[0]
            try:
                split = _split_dirs[dirpath]
            except KeyError:
                split = _st.from_sequence(dirpath.split('/'))
                _split_dirs[dirpath] = split
            return _st(split, fname, file_id)
        return sorted(entry_list, key=_key)

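    # Illustrative sketch, not part of the original code: splitting dirnames
    # makes 'a/z' sort before 'a-b/x', because ('a',) < ('a-b',), whereas a
    # plain string comparison would order 'a-b/x' first ('-' sorts before '/'
    # in ASCII) - which is why the key splits on '/' rather than comparing
    # whole paths.
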
    def set_state_from_inventory(self, new_inv):
        """Set new_inv as the current state.

        This API is called by tree transform, and will usually occur with
        existing parent trees.

        :param new_inv: The inventory object to set current state from.
        """
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(1,
                "set_state_from_inventory called; please mutate the tree instead")
        tracing = 'dirstate' in debug.debug_flags
        if tracing:
            trace.mutter("set_state_from_inventory trace:")
        self._read_dirblocks_if_needed()
        # sketch:
        # Two iterators: current data and new data, both in dirblock order.
        # We zip them together, which tells about entries that are new in the
        # inventory, or removed in the inventory, or present in both and
        # possibly changed.
        #
        # You might think we could just synthesize a new dirstate directly
        # since we're processing it in the right order. However, we need to
        # also consider there may be any number of parent trees and relocation
        # pointers, and we don't want to duplicate that here.
        new_iterator = new_inv.iter_entries_by_dir()
        # we will be modifying the dirstate, so we need a stable iterator. In
        # future we might write one, for now we just clone the state into a
        # list using a copy so that we see every original item and don't have
        # to adjust the position when items are inserted or deleted in the
        # underlying dirstate.
        old_iterator = iter(list(self._iter_entries()))
        # both must have roots so this is safe:
        current_new = new_iterator.next()
        current_old = old_iterator.next()
        def advance(iterator):
            try:
                return iterator.next()
            except StopIteration:
                return None
        while current_new or current_old:
            # skip entries in old that are not really there
            if current_old and current_old[1][0][0] in 'ar':
                # relocated or absent
                current_old = advance(old_iterator)
                continue
            if current_new:
                # convert new into dirblock style
                new_path_utf8 = current_new[0].encode('utf8')
                new_dirname, new_basename = osutils.split(new_path_utf8)
                new_id = current_new[1].file_id
                new_entry_key = (new_dirname, new_basename, new_id)
                current_new_minikind = \
                    DirState._kind_to_minikind[current_new[1].kind]
                if current_new_minikind == 't':
                    fingerprint = current_new[1].reference_revision or ''
                else:
                    # We normally only insert or remove records, or update
                    # them when it has significantly changed. Then we want to
                    # erase its fingerprint. Unaffected records should
                    # normally not be updated at all.
                    fingerprint = ''
            else:
                # for safety disable variables
                new_path_utf8 = new_dirname = new_basename = new_id = \
                    new_entry_key = None
            # 5 cases, we dont have a value that is strictly greater than everything, so
            # we make both end conditions explicit
            if not current_old:
                # old is finished: insert current_new into the state.
                if tracing:
                    trace.mutter("Appending from new '%s'.",
                        new_path_utf8.decode('utf8'))
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint,
                    fullscan=True)
                current_new = advance(new_iterator)
            elif not current_new:
                # new is finished
                if tracing:
                    trace.mutter("Truncating from old '%s/%s'.",
                        current_old[0][0].decode('utf8'),
                        current_old[0][1].decode('utf8'))
                self._make_absent(current_old)
                current_old = advance(old_iterator)
            elif new_entry_key == current_old[0]:
                # same - common case
                # We're looking at the same path and id in both the dirstate
                # and inventory, so just need to update the fields in the
                # dirstate from the one in the inventory.
                # TODO: update the record if anything significant has changed.
                # the minimal required trigger is if the execute bit or cached
                # kind has changed.
                if (current_old[1][0][3] != current_new[1].executable or
                    current_old[1][0][0] != current_new_minikind):
                    if tracing:
                        trace.mutter("Updating in-place change '%s'.",
                            new_path_utf8.decode('utf8'))
                    self.update_minimal(current_old[0], current_new_minikind,
                        executable=current_new[1].executable,
                        path_utf8=new_path_utf8, fingerprint=fingerprint,
                        fullscan=True)
                # both sides are dealt with, move on
                current_old = advance(old_iterator)
                current_new = advance(new_iterator)
            elif (cmp_by_dirs(new_dirname, current_old[0][0]) < 0
                  or (new_dirname == current_old[0][0]
                      and new_entry_key[1:] < current_old[0][1:])):
                # new comes before:
                # add an entry for this and advance new
                if tracing:
                    trace.mutter("Inserting from new '%s'.",
                        new_path_utf8.decode('utf8'))
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint,
                    fullscan=True)
                current_new = advance(new_iterator)
            else:
                # we've advanced past the place where the old key would be,
                # without seeing it in the new list. so it must be gone.
                if tracing:
                    trace.mutter("Deleting from old '%s/%s'.",
                        current_old[0][0].decode('utf8'),
                        current_old[0][1].decode('utf8'))
                self._make_absent(current_old)
                current_old = advance(old_iterator)
        self._mark_modified()
        self._id_index = None
        self._packed_stat_index = None
        if tracing:
            trace.mutter("set_state_from_inventory complete.")

    def set_state_from_scratch(self, working_inv, parent_trees, parent_ghosts):
        """Wipe the currently stored state and set it to something new.

        This is a hard-reset for the data we are working with.
        """
        # Technically, we really want a write lock, but until we write, we
        # don't really need it.
        self._requires_lock()
        # root dir and root dir contents with no children. We have to have a
        # root for set_state_from_inventory to work correctly.
        empty_root = (('', '', inventory.ROOT_ID),
                      [('d', '', 0, False, DirState.NULLSTAT)])
        empty_tree_dirblocks = [('', [empty_root]), ('', [])]
        self._set_data([], empty_tree_dirblocks)
        self.set_state_from_inventory(working_inv)
        self.set_parent_trees(parent_trees, parent_ghosts)

    def _make_absent(self, current_old):
        """Mark current_old - an entry - as absent for tree 0.

        :return: True if this was the last details entry for the entry key:
            that is, if the underlying block has had the entry removed, thus
            shrinking in length.
        """
        # build up paths that this id will be left at after the change is made,
        # so we can update their cross references in tree 0
        all_remaining_keys = set()
        # Dont check the working tree, because it's going.
        for details in current_old[1][1:]:
            if details[0] not in 'ar': # absent, relocated
                all_remaining_keys.add(current_old[0])
            elif details[0] == 'r': # relocated
                # record the key for the real path.
                all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
            # absent rows are not present at any path.
        last_reference = current_old[0] not in all_remaining_keys
        if last_reference:
            # the current row consists entirely of the current item (being
            # marked absent), and relocated or absent entries for the other
            # trees: remove it, it's meaningless.
            block = self._find_block(current_old[0])
            entry_index, present = self._find_entry_index(current_old[0], block[1])
            if not present:
                raise AssertionError('could not find entry for %s' % (current_old,))
            block[1].pop(entry_index)
            # if we have an id_index in use, remove this key from it for this id.
            if self._id_index is not None:
                self._remove_from_id_index(self._id_index, current_old[0])
        # update all remaining keys for this id to record it as absent. The
        # existing details may either be the record we are marking as deleted
        # (if there were other trees with the id present at this path), or may
        # be relocations.
        for update_key in all_remaining_keys:
            update_block_index, present = \
                self._find_block_index_from_key(update_key)
            if not present:
                raise AssertionError('could not find block for %s' % (update_key,))
            update_entry_index, present = \
                self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
            if not present:
                raise AssertionError('could not find entry for %s' % (update_key,))
            update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
            # it must not be absent at the moment
            if update_tree_details[0][0] == 'a': # absent
                raise AssertionError('bad row %r' % (update_tree_details,))
            update_tree_details[0] = DirState.NULL_PARENT_DETAILS
        self._mark_modified()
        return last_reference

def update_minimal(self, key, minikind, executable=False, fingerprint='',
2985
packed_stat=None, size=0, path_utf8=None, fullscan=False):
2986
"""Update an entry to the state in tree 0.
2988
This will either create a new entry at 'key' or update an existing one.
2989
It also makes sure that any other records which might mention this are
2992
        :param key: (dir, name, file_id) for the new entry
        :param minikind: The type for the entry ('f' == 'file', 'd' ==
            'directory'), etc.
        :param executable: Should the executable bit be set?
        :param fingerprint: Simple fingerprint for new entry: canonical-form
            sha1 for files, referenced revision id for subtrees, etc.
        :param packed_stat: Packed stat value for new entry.
        :param size: Size information for new entry
        :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
            the path join again.
        :param fullscan: If True then a complete scan of the dirstate is being
            done and checking for duplicate rows should not be done. This
            should only be set by set_state_from_inventory and similar methods.

        If packed_stat and fingerprint are not given, they're invalidated in
        the entry.
        """
        block = self._find_block(key)[1]
        if packed_stat is None:
            packed_stat = DirState.NULLSTAT
        # XXX: Some callers pass '' as the packed_stat, and it seems to be
        # sometimes present in the dirstate - this seems oddly inconsistent.
        entry_index, present = self._find_entry_index(key, block)
        new_details = (minikind, fingerprint, size, executable, packed_stat)
        id_index = self._get_id_index()
        if not present:
            # New record. Check there isn't an entry at this path already.
            if not fullscan:
                low_index, _ = self._find_entry_index(key[0:2] + ('',), block)
                while low_index < len(block):
                    entry = block[low_index]
                    if entry[0][0:2] == key[0:2]:
                        if entry[1][0][0] not in 'ar':
                            # This entry has the same path (but a different
                            # id) as the new entry we're adding, and is
                            # present in this tree.
                            self._raise_invalid(
                                ("%s/%s" % key[0:2]).decode('utf8'), key[2],
                                "Attempt to add item at path already occupied by "
                                "id %r" % entry[0][2])
                        low_index += 1
                    else:
                        break
            # new entry, synthesis cross reference here,
            existing_keys = id_index.get(key[2], ())
            if not existing_keys:
                # not currently in the state, simplest case
                new_entry = key, [new_details] + self._empty_parent_info()
            else:
                # present at one or more existing other paths.
                # grab one of them and use it to generate parent
                # relocation/absent entries.
                new_entry = key, [new_details]
                # existing_keys can be changed as we iterate.
                for other_key in tuple(existing_keys):
                    # change the record at other to be a pointer to this new
                    # record. The loop looks similar to the change to
                    # relocations when updating an existing record, but it's
                    # not: the test for existing kinds is different. This
                    # could be factored out to a helper though.
                    other_block_index, present = self._find_block_index_from_key(
                        other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (
                            other_key,))
                    other_block = self._dirblocks[other_block_index][1]
                    other_entry_index, present = self._find_entry_index(
                        other_key, other_block)
                    if not present:
                        raise AssertionError(
                            'update_minimal: could not find other entry for %s'
                            % (other_key,))
                    if path_utf8 is None:
                        raise AssertionError('no path')
                    # Turn this other location into a reference to the new
                    # location. This also updates the aliased iterator
                    # (current_old in set_state_from_inventory) so that the old
                    # entry, if not already examined, is skipped over by that
                    # loop.
                    other_entry = other_block[other_entry_index]
                    other_entry[1][0] = ('r', path_utf8, 0, False, '')
                    if self._maybe_remove_row(other_block, other_entry_index,
                                              id_index):
                        # If the row holding this was removed, we need to
                        # recompute where this entry goes
                        entry_index, _ = self._find_entry_index(key, block)

            # This loop:
            # adds a tuple to the new details for each column
            #  - either by copying an existing relocation pointer inside that column
            #  - or by creating a new pointer to the right row inside that column
            num_present_parents = self._num_present_parents()
            if num_present_parents:
                # TODO: This re-evaluates the existing_keys set, do we need
                #       to do that ourselves?
                other_key = list(existing_keys)[0]
            for lookup_index in xrange(1, num_present_parents + 1):
                # grab any one entry, use it to find the right path.
                # TODO: optimise this to reduce memory use in highly
                #       fragmented situations by reusing the relocation
                #       records.
                update_block_index, present = \
                    self._find_block_index_from_key(other_key)
                if not present:
                    raise AssertionError('could not find block for %s' % (other_key,))
                update_entry_index, present = \
                    self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
                if not present:
                    raise AssertionError('update_minimal: could not find entry for %s' % (other_key,))
                update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
                if update_details[0] in 'ar': # relocated, absent
                    # it's a pointer or absent in lookup_index's tree, use
                    # it as-is.
                    new_entry[1].append(update_details)
                else:
                    # we have the right key, make a pointer to it.
                    pointer_path = osutils.pathjoin(*other_key[0:2])
                    new_entry[1].append(('r', pointer_path, 0, False, ''))
            block.insert(entry_index, new_entry)
            self._add_to_id_index(id_index, key)
        else:
            # Does the new state matter?
            block[entry_index][1][0] = new_details
            # parents cannot be affected by what we do.
            # other occurrences of this id can be found
            # from the id index.
            #
            # tree index consistency: All other paths for this id in this tree
            # index must point to the correct path. We have to loop here because
            # we may have passed entries in the state with this file id already
            # that were absent - where parent entries are - and they need to be
            # converted to relocated.
            if path_utf8 is None:
                raise AssertionError('no path')
            existing_keys = id_index.get(key[2], ())
            if key not in existing_keys:
                raise AssertionError('We found the entry in the blocks, but'
                    ' the key is not in the id_index.'
                    ' key: %s, existing_keys: %s' % (key, existing_keys))
            for entry_key in existing_keys:
                # TODO:PROFILING: It might be faster to just update
                # rather than checking if we need to, and then overwrite
                # the one we are located at.
                if entry_key != key:
                    # this file id is at a different path in one of the
                    # other trees, so put absent pointers there
                    # This is the vertical axis in the matrix, all pointing
                    # to the real path.
                    block_index, present = self._find_block_index_from_key(entry_key)
                    if not present:
                        raise AssertionError('not present: %r' % (entry_key,))
                    entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
                    if not present:
                        raise AssertionError('not present: %r' % (entry_key,))
                    self._dirblocks[block_index][1][entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')
        # add a containing dirblock if needed.
        if new_details[0] == 'd':
            subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
            block_index, present = self._find_block_index_from_key(subdir_key)
            if not present:
                self._dirblocks.insert(block_index, (subdir_key[0], []))

        self._mark_modified()
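
    # A usage sketch for update_minimal (illustrative values, not taken from
    # a test):
    #
    #   state.update_minimal(('dir', 'file.txt', 'file-id-1'), 'f',
    #       executable=False, fingerprint=sha1_of_content, size=12,
    #       path_utf8='dir/file.txt')
    #
    # records 'dir/file.txt' as a file in tree 0; any other rows mentioning
    # 'file-id-1' are rewritten as 'r' records pointing at the new path,
    # which keeps the id index consistent.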

    def _maybe_remove_row(self, block, index, id_index):
        """Remove index if it is absent or relocated across the row.

        id_index is updated accordingly.

        :return: True if we removed the row, False otherwise
        """
        present_in_row = False
        entry = block[index]
        for column in entry[1]:
            if column[0] not in 'ar':
                present_in_row = True
                break
        if not present_in_row:
            block.pop(index)
            self._remove_from_id_index(id_index, entry[0])
            return True
        return False
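
    # For example (illustrative), a row such as
    #   (('dir', 'old', 'an-id'), [('r', 'dir/new', 0, False, ''),
    #                              ('a', '', 0, False, '')])
    # carries no real data in any column, so _maybe_remove_row pops it from
    # its block and removes its key from the id index.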
3176
def _validate(self):
3177
"""Check that invariants on the dirblock are correct.
3179
This can be useful in debugging; it shouldn't be necessary in
3182
This must be called with a lock held.
3184
# NOTE: This must always raise AssertionError not just assert,
3185
# otherwise it may not behave properly under python -O
3187
# TODO: All entries must have some content that's not 'a' or 'r',
3188
# otherwise it could just be removed.
3190
# TODO: All relocations must point directly to a real entry.
3192
# TODO: No repeated keys.
3195
from pprint import pformat
3196
self._read_dirblocks_if_needed()
3197
if len(self._dirblocks) > 0:
3198
if not self._dirblocks[0][0] == '':
3199
raise AssertionError(
3200
"dirblocks don't start with root block:\n" + \
3201
pformat(self._dirblocks))
3202
if len(self._dirblocks) > 1:
3203
if not self._dirblocks[1][0] == '':
3204
raise AssertionError(
3205
"dirblocks missing root directory:\n" + \
3206
pformat(self._dirblocks))
3207
# the dirblocks are sorted by their path components, name, and dir id
3208
dir_names = [d[0].split('/')
3209
for d in self._dirblocks[1:]]
3210
if dir_names != sorted(dir_names):
3211
raise AssertionError(
3212
"dir names are not in sorted order:\n" + \
3213
pformat(self._dirblocks) + \
3216
for dirblock in self._dirblocks:
3217
# within each dirblock, the entries are sorted by filename and
3219
for entry in dirblock[1]:
3220
if dirblock[0] != entry[0][0]:
3221
raise AssertionError(
3223
"doesn't match directory name in\n%r" %
3224
(entry, pformat(dirblock)))
3225
if dirblock[1] != sorted(dirblock[1]):
3226
raise AssertionError(
3227
"dirblock for %r is not sorted:\n%s" % \
3228
(dirblock[0], pformat(dirblock)))

        def check_valid_parent():
            """Check that the current entry has a valid parent.

            This makes sure that the parent has a record,
            and that the parent isn't marked as "absent" in the
            current tree. (It is invalid to have a non-absent file in an
            absent directory.)
            """
            if entry[0][0:2] == ('', ''):
                # There should be no parent for the root row
                return
            parent_entry = self._get_entry(tree_index, path_utf8=entry[0][0])
            if parent_entry == (None, None):
                raise AssertionError(
                    "no parent entry for: %s in tree %s"
                    % (this_path, tree_index))
            if parent_entry[1][tree_index][0] != 'd':
                raise AssertionError(
                    "Parent entry for %s is not marked as a valid"
                    " directory. %s" % (this_path, parent_entry,))

        # For each file id, for each tree: either
        # the file id is not present at all; all rows with that id in the
        # key have it marked as 'absent'
        # OR the file id is present under exactly one name; any other entries
        # that mention that id point to the correct name.
        #
        # We check this with a dict per tree pointing either to the present
        # name, or None if absent.
        tree_count = self._num_present_parents() + 1
        id_path_maps = [dict() for i in range(tree_count)]
        # Make sure that all renamed entries point to the correct location.
        for entry in self._iter_entries():
            file_id = entry[0][2]
            this_path = osutils.pathjoin(entry[0][0], entry[0][1])
            if len(entry[1]) != tree_count:
                raise AssertionError(
                    "wrong number of entry details for row\n%s"
                    ",\nexpected %d" %
                    (pformat(entry), tree_count))
            absent_positions = 0
            for tree_index, tree_state in enumerate(entry[1]):
                this_tree_map = id_path_maps[tree_index]
                minikind = tree_state[0]
                if minikind in 'ar':
                    absent_positions += 1
                # have we seen this id before in this column?
                if file_id in this_tree_map:
                    previous_path, previous_loc = this_tree_map[file_id]
                    # any later mention of this file must be consistent with
                    # what was said before
                    if minikind == 'a':
                        if previous_path is not None:
                            raise AssertionError(
                                "file %s is absent in row %r but also present "
                                "at %r" %
                                (file_id, entry, previous_path))
                    elif minikind == 'r':
                        target_location = tree_state[1]
                        if previous_path != target_location:
                            raise AssertionError(
                                "file %s relocation in row %r but also at %r"
                                % (file_id, entry, previous_path))
                    else:
                        # a file, directory, etc - may have been previously
                        # pointed to by a relocation, which must point here
                        if previous_path != this_path:
                            raise AssertionError(
                                "entry %r inconsistent with previous path %r "
                                "seen at %r" %
                                (entry, previous_path, previous_loc))
                        check_valid_parent()
                else:
                    if minikind == 'a':
                        # absent; should not occur anywhere else
                        this_tree_map[file_id] = None, this_path
                    elif minikind == 'r':
                        # relocation, must occur at expected location
                        this_tree_map[file_id] = tree_state[1], this_path
                    else:
                        this_tree_map[file_id] = this_path, this_path
                        check_valid_parent()
            if absent_positions == tree_count:
                raise AssertionError(
                    "entry %r has no data for any tree." % (entry,))
        if self._id_index is not None:
            for file_id, entry_keys in self._id_index.iteritems():
                for entry_key in entry_keys:
                    if entry_key[2] != file_id:
                        raise AssertionError(
                            'file_id %r did not match entry key %s'
                            % (file_id, entry_key))
                if len(entry_keys) != len(set(entry_keys)):
                    raise AssertionError(
                        'id_index contained non-unique data for %s'
                        % (entry_keys,))
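
    # A sketch of the shape _validate accepts (all values made up): a rename
    # of 'old' to 'new' in the working tree yields two mutually-pointing rows:
    #
    #   ('', 'new', 'an-id') -> [('f', sha1, 10, False, packed_stat),
    #                            ('r', 'old', 0, False, '')]
    #   ('', 'old', 'an-id') -> [('r', 'new', 0, False, ''),
    #                            ('f', sha1, 10, False, revision_id)]
    #
    # Each relocation points at the path where the id is really present in
    # that tree, which is exactly what the loop above enforces.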

    def _wipe_state(self):
        """Forget all state information about the dirstate."""
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._changes_aborted = False
        self._parents = []
        self._ghosts = []
        self._dirblocks = []
        self._id_index = None
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}

    def lock_read(self):
        """Acquire a read lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.ReadLock(self._filename)
        self._lock_state = 'r'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def lock_write(self):
        """Acquire a write lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.WriteLock(self._filename)
        self._lock_state = 'w'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def unlock(self):
        """Drop any locks held on the dirstate."""
        if self._lock_token is None:
            raise errors.LockNotHeld(self)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._state_file = None
        self._lock_state = None
        self._lock_token.unlock()
        self._lock_token = None
        self._split_path_cache = {}

    def _requires_lock(self):
        """Check that a lock is currently held by someone on the dirstate."""
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
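
    # The expected lock discipline, sketched (illustrative, not from a test):
    #
    #   state = DirState.on_file('dirstate')
    #   state.lock_read()
    #   try:
    #       entry = state._get_entry(0, path_utf8='a/file')
    #   finally:
    #       state.unlock()
    #
    # Both lock methods wipe any cached state, so a reader always starts from
    # what is on disk at lock time.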

def py_update_entry(state, entry, abspath, stat_value,
                    _stat_to_minikind=DirState._stat_to_minikind,
                    _pack_stat=pack_stat):
    """Update the entry based on what is actually on disk.

    This function only calculates the sha if it needs to - if the entry is
    uncachable, or clearly different to the first parent's entry, no sha
    is calculated, and None is returned.

    :param state: The dirstate this entry is in.
    :param entry: This is the dirblock entry for the file in question.
    :param abspath: The path on disk for this file.
    :param stat_value: The stat value done on the path.
    :return: None, or the sha1 hexdigest of the file (40 bytes) or link
        target of a symlink.
    """
    try:
        minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
    except KeyError:
        # Unhandled kind
        return None
    packed_stat = _pack_stat(stat_value)
    (saved_minikind, saved_link_or_sha1, saved_file_size,
     saved_executable, saved_packed_stat) = entry[1][0]

    if minikind == 'd' and saved_minikind == 't':
        minikind = 't'
    if (minikind == saved_minikind
        and packed_stat == saved_packed_stat):
        # The stat hasn't changed since we saved, so we can re-use the
        # saved sha hash.
        if minikind == 'd':
            return None

        # size should also be in packed_stat
        if saved_file_size == stat_value.st_size:
            return saved_link_or_sha1

    # If we have gotten this far, that means that we need to actually
    # process this entry.
    link_or_sha1 = None
    worth_saving = True
    if minikind == 'f':
        executable = state._is_executable(stat_value.st_mode,
                                          saved_executable)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time
            and len(entry[1]) > 1
            and entry[1][1][0] != 'a'):
            # Could check for size changes for further optimised
            # avoidance of sha1's. However the most prominent case of
            # over-shaing is during initial add, which this catches.
            # Besides, if content filtering happens, size and sha
            # are calculated at the same time, so checking just the size
            # gains nothing w.r.t. performance.
            link_or_sha1 = state._sha1_file(abspath)
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        else:
            entry[1][0] = ('f', '', stat_value.st_size,
                           executable, DirState.NULLSTAT)
            worth_saving = False
    elif minikind == 'd':
        link_or_sha1 = None
        entry[1][0] = ('d', '', 0, False, packed_stat)
        if saved_minikind != 'd':
            # This changed from something into a directory. Make sure we
            # have a directory block for it. This doesn't happen very
            # often, so this doesn't have to be super fast.
            block_index, entry_index, dir_present, file_present = \
                state._get_block_entry_index(entry[0][0], entry[0][1], 0)
            state._ensure_block(block_index, entry_index,
                                osutils.pathjoin(entry[0][0], entry[0][1]))
        else:
            worth_saving = False
    elif minikind == 'l':
        if saved_minikind == 'l':
            worth_saving = False
        link_or_sha1 = state._read_link(abspath, saved_link_or_sha1)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time):
            entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
                           False, packed_stat)
        else:
            entry[1][0] = ('l', '', stat_value.st_size,
                           False, DirState.NULLSTAT)
    if worth_saving:
        state._mark_modified([entry])
    return link_or_sha1
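
# A sketch of how this function is typically driven (the entry and path are
# illustrative):
#
#   stat_value = os.lstat(abspath)
#   link_or_sha1 = py_update_entry(state, entry, abspath, stat_value)
#
# None comes back when the kind is unhandled or the cached value is still
# valid; otherwise the fresh sha1 (or symlink target) is returned and cached
# in entry[1][0].
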

class ProcessEntryPython(object):

    __slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id",
        "last_source_parent", "last_target_parent", "include_unchanged",
        "partial", "use_filesystem_for_exec", "utf8_decode",
        "searched_specific_files", "search_specific_files",
        "searched_exact_paths", "search_specific_file_parents", "seen_ids",
        "state", "source_index", "target_index", "want_unversioned", "tree"]

    def __init__(self, include_unchanged, use_filesystem_for_exec,
        search_specific_files, state, source_index, target_index,
        want_unversioned, tree):
        self.old_dirname_to_file_id = {}
        self.new_dirname_to_file_id = {}
        # Are we doing a partial iter_changes?
        self.partial = search_specific_files != set([''])
        # Using a list so that we can access the values and change them in
        # nested scope. Each one is [path, file_id, entry]
        self.last_source_parent = [None, None]
        self.last_target_parent = [None, None]
        self.include_unchanged = include_unchanged
        self.use_filesystem_for_exec = use_filesystem_for_exec
        self.utf8_decode = cache_utf8._utf8_decode
        # for all search_indexs in each path at or under each element of
        # search_specific_files, if the detail is relocated: add the id, and
        # add the relocated path as one to search if it's not searched already.
        # If the detail is not relocated, add the id.
        self.searched_specific_files = set()
        # When we search exact paths without expanding downwards, we record
        # that here.
        self.searched_exact_paths = set()
        self.search_specific_files = search_specific_files
        # The parents up to the root of the paths we are searching.
        # After all normal paths are returned, these specific items are returned.
        self.search_specific_file_parents = set()
        # The ids we've sent out in the delta.
        self.seen_ids = set()
        self.state = state
        self.source_index = source_index
        self.target_index = target_index
        if target_index != 0:
            # A lot of code in here depends on target_index == 0
            raise errors.BzrError('unsupported target index')
        self.want_unversioned = want_unversioned
        self.tree = tree

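    # An illustrative construction, comparing the first parent tree (column 1
    # in the dirstate) against the working tree (column 0, the only supported
    # target):
    #
    #   pe = ProcessEntryPython(include_unchanged=False,
    #       use_filesystem_for_exec=True, search_specific_files=set(['']),
    #       state=state, source_index=1, target_index=0,
    #       want_unversioned=False, tree=tree)
    #   for change in pe.iter_changes():
    #       ...
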
    def _process_entry(self, entry, path_info, pathjoin=osutils.pathjoin):
        """Compare an entry and real disk to generate delta information.

        :param path_info: top_relpath, basename, kind, lstat, abspath for
            the path of entry. If None, then the path is considered absent in
            the target (Perhaps we should pass in a concrete entry for this ?)
            Basename is returned as a utf8 string because we expect this
            tuple will be ignored, and don't want to take the time to
            decode it.
        :return: (iter_changes_result, changed). If the entry has not been
            handled then changed is None. Otherwise it is False if no content
            or metadata changes have occurred, and True if any content or
            metadata change has occurred. If self.include_unchanged is True
            then if changed is not None, iter_changes_result will always be a
            result tuple. Otherwise, iter_changes_result is None unless
            changed is True.
        """
        if self.source_index is None:
            source_details = DirState.NULL_PARENT_DETAILS
        else:
            source_details = entry[1][self.source_index]
        target_details = entry[1][self.target_index]
        target_minikind = target_details[0]
        if path_info is not None and target_minikind in 'fdlt':
            if not (self.target_index == 0):
                raise AssertionError()
            link_or_sha1 = update_entry(self.state, entry,
                abspath=path_info[4], stat_value=path_info[3])
            # The entry may have been modified by update_entry
            target_details = entry[1][self.target_index]
            target_minikind = target_details[0]
        else:
            link_or_sha1 = None
        file_id = entry[0][2]
        source_minikind = source_details[0]
        if source_minikind in 'fdltr' and target_minikind in 'fdlt':
            # claimed content in both: diff
            #   r    | fdlt   |      | add source to search, add id path move and perform
            #        |        |      | diff check on source-target
            #   r    | fdlt   |  a   | dangling file that was present in the basis.
            #        |        |      | ???
            if source_minikind in 'r':
                # add the source to the search path to find any children it
                # has.  TODO ? : only add if it is a container ?
                if not osutils.is_inside_any(self.searched_specific_files,
                                             source_details[1]):
                    self.search_specific_files.add(source_details[1])
                # generate the old path; this is needed for stating later
                # as well.
                old_path = source_details[1]
                old_dirname, old_basename = os.path.split(old_path)
                path = pathjoin(entry[0][0], entry[0][1])
                old_entry = self.state._get_entry(self.source_index,
                                                  path_utf8=old_path)
                # update the source details variable to be the real
                # location.
                if old_entry == (None, None):
                    raise errors.CorruptDirstate(self.state._filename,
                        "entry '%s/%s' is considered renamed from %r"
                        " but source does not exist\n"
                        "entry: %s" % (entry[0][0], entry[0][1], old_path, entry))
                source_details = old_entry[1][self.source_index]
                source_minikind = source_details[0]
            else:
                old_dirname = entry[0][0]
                old_basename = entry[0][1]
                old_path = path = None
            if path_info is None:
                # the file is missing on disk, show as removed.
                content_change = True
                target_kind = None
                target_exec = False
            else:
                # source and target are both versioned and disk file is present.
                target_kind = path_info[2]
                if target_kind == 'directory':
                    if path is None:
                        old_path = path = pathjoin(old_dirname, old_basename)
                    self.new_dirname_to_file_id[path] = file_id
                    if source_minikind != 'd':
                        content_change = True
                    else:
                        # directories have no fingerprint
                        content_change = False
                    target_exec = False
                elif target_kind == 'file':
                    if source_minikind != 'f':
                        content_change = True
                    else:
                        # Check the sha. We can't just rely on the size as
                        # content filtering may mean different sizes actually
                        # map to the same content
                        if link_or_sha1 is None:
                            # Stat cache miss:
                            statvalue, link_or_sha1 = \
                                self.state._sha1_provider.stat_and_sha1(
                                    path_info[4])
                            self.state._observed_sha1(entry, link_or_sha1,
                                statvalue)
                        content_change = (link_or_sha1 != source_details[1])
                    # Target details is updated at update_entry time
                    if self.use_filesystem_for_exec:
                        # We don't need S_ISREG here, because we are sure
                        # we are dealing with a file.
                        target_exec = bool(stat.S_IEXEC & path_info[3].st_mode)
                    else:
                        target_exec = target_details[3]
                elif target_kind == 'symlink':
                    if source_minikind != 'l':
                        content_change = True
                    else:
                        content_change = (link_or_sha1 != source_details[1])
                    target_exec = False
                elif target_kind == 'tree-reference':
                    if source_minikind != 't':
                        content_change = True
                    else:
                        content_change = False
                    target_exec = False
                else:
                    if path is None:
                        path = pathjoin(old_dirname, old_basename)
                    raise errors.BadFileKindError(path, path_info[2])
            if source_minikind == 'd':
                if path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                self.old_dirname_to_file_id[old_path] = file_id
            # parent id is the entry for the path in the target tree
            if old_basename and old_dirname == self.last_source_parent[0]:
                source_parent_id = self.last_source_parent[1]
            else:
                try:
                    source_parent_id = self.old_dirname_to_file_id[old_dirname]
                except KeyError:
                    source_parent_entry = self.state._get_entry(self.source_index,
                                                                path_utf8=old_dirname)
                    source_parent_id = source_parent_entry[0][2]
                if source_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    source_parent_id = None
                else:
                    self.last_source_parent[0] = old_dirname
                    self.last_source_parent[1] = source_parent_id
            new_dirname = entry[0][0]
            if entry[0][1] and new_dirname == self.last_target_parent[0]:
                target_parent_id = self.last_target_parent[1]
            else:
                try:
                    target_parent_id = self.new_dirname_to_file_id[new_dirname]
                except KeyError:
                    # TODO: We don't always need to do the lookup, because the
                    #       parent entry will be the same as the source entry.
                    target_parent_entry = self.state._get_entry(self.target_index,
                                                                path_utf8=new_dirname)
                    if target_parent_entry == (None, None):
                        raise AssertionError(
                            "Could not find target parent in wt: %s\nparent of: %s"
                            % (new_dirname, entry))
                    target_parent_id = target_parent_entry[0][2]
                if target_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    target_parent_id = None
                else:
                    self.last_target_parent[0] = new_dirname
                    self.last_target_parent[1] = target_parent_id

            source_exec = source_details[3]
            changed = (content_change
                or source_parent_id != target_parent_id
                or old_basename != entry[0][1]
                or source_exec != target_exec
                )
            if not changed and not self.include_unchanged:
                return None, False
            else:
                if old_path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                    old_path_u = self.utf8_decode(old_path)[0]
                    path_u = old_path_u
                else:
                    old_path_u = self.utf8_decode(old_path)[0]
                    if old_path == path:
                        path_u = old_path_u
                    else:
                        path_u = self.utf8_decode(path)[0]
                source_kind = DirState._minikind_to_kind[source_minikind]
                return (entry[0][2],
                       (old_path_u, path_u),
                       content_change,
                       (True, True),
                       (source_parent_id, target_parent_id),
                       (self.utf8_decode(old_basename)[0], self.utf8_decode(entry[0][1])[0]),
                       (source_kind, target_kind),
                       (source_exec, target_exec)), changed
        elif source_minikind in 'a' and target_minikind in 'fdlt':
            # looks like a new file
            path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            # TODO: these are the same for an entire directory: cache em.
            parent_id = self.state._get_entry(self.target_index,
                                              path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            if path_info is not None:
                # Present on disk:
                if self.use_filesystem_for_exec:
                    # We need S_ISREG here, because we aren't sure if this
                    # is a file or not.
                    target_exec = bool(
                        stat.S_ISREG(path_info[3].st_mode)
                        and stat.S_IEXEC & path_info[3].st_mode)
                else:
                    target_exec = target_details[3]
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, path_info[2]),
                       (None, target_exec)), True
            else:
                # It's a missing file, report it as such.
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, None),
                       (None, False)), True
        elif source_minikind in 'fdlt' and target_minikind in 'a':
            # unversioned, possibly, or possibly not deleted: we don't care.
            # if it's still on disk, *and* there's no other entry at this
            # path [we don't know this in this routine at the moment -
            # perhaps we should change this] - then it would be an unknown.
            old_path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            parent_id = self.state._get_entry(self.source_index, path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            return (entry[0][2],
                   (self.utf8_decode(old_path)[0], None),
                   True,
                   (True, False),
                   (parent_id, None),
                   (self.utf8_decode(entry[0][1])[0], None),
                   (DirState._minikind_to_kind[source_minikind], None),
                   (source_details[3], None)), True
        elif source_minikind in 'fdlt' and target_minikind in 'r':
            # a rename; could be a true rename, or a rename inherited from
            # a renamed parent. TODO: handle this efficiently. It's not a
            # common case to rename dirs though, so a correct but slow
            # implementation will do.
            if not osutils.is_inside_any(self.searched_specific_files, target_details[1]):
                self.search_specific_files.add(target_details[1])
        elif source_minikind in 'ra' and target_minikind in 'ra':
            # neither of the selected trees contain this file,
            # so skip over it. This is not currently directly tested, but
            # is indirectly via test_too_much.TestCommands.test_conflicts.
            pass
        else:
            raise AssertionError("don't know how to compare "
                "source_minikind=%r, target_minikind=%r"
                % (source_minikind, target_minikind))
        return None, None

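    # Shape of the iter_changes result tuple assembled above (an illustrative
    # unchanged file; every value is made up):
    #
    #   (file_id,
    #    ('a/b', 'a/b'),          # (old path, new path)
    #    False,                   # content changed?
    #    (True, True),            # versioned in (source, target)
    #    (parent_id, parent_id),  # parent ids
    #    ('b', 'b'),              # (old name, new name)
    #    ('file', 'file'),        # (source kind, target kind)
    #    (False, False))          # (source exec, target exec)
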
    def _gather_result_for_consistency(self, result):
        """Check a result we will yield to make sure we are consistent later.

        This gathers result's parents into a set to output later.

        :param result: A result tuple.
        """
        if not self.partial or not result[0]:
            return
        self.seen_ids.add(result[0])
        new_path = result[1][1]
        if new_path:
            # Not the root and not a delete: queue up the parents of the path.
            self.search_specific_file_parents.update(
                osutils.parent_directories(new_path.encode('utf8')))
            # Add the root directory which parent_directories does not
            # provide.
            self.search_specific_file_parents.add('')
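
    # For instance (illustrative), a result whose new path is 'a/b/c' queues
    # 'a/b' and 'a' via osutils.parent_directories, plus '' for the root, so
    # those directories are emitted later by _iter_specific_file_parents.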

    def iter_changes(self):
        """Iterate over the changes."""
        utf8_decode = cache_utf8._utf8_decode
        _cmp_by_dirs = cmp_by_dirs
        _process_entry = self._process_entry
        search_specific_files = self.search_specific_files
        searched_specific_files = self.searched_specific_files
        splitpath = osutils.splitpath

        # compare source_index and target_index at or under each element of search_specific_files.
        # follow the following comparison table. Note that we only want to do diff operations when
        # the target is fdl because that's when the walkdirs logic will have exposed the pathinfo
        # for the target.
        # cases:
        #
        # Source | Target | disk | action
        #   r    | fdlt   |      | add source to search, add id path move and perform
        #        |        |      | diff check on source-target
        #   r    | fdlt   |  a   | dangling file that was present in the basis.
        #        |        |      | ???
        #   r    |  a     |      | add source to search
        #   r    |  a     |  a   |
        #   r    |  r     |      | this path is present in a non-examined tree, skip.
        #   r    |  r     |  a   | this path is present in a non-examined tree, skip.
        #   a    | fdlt   |      | add new id
        #   a    | fdlt   |  a   | dangling locally added file, skip
        #   a    |  a     |      | not present in either tree, skip
        #   a    |  a     |  a   | not present in any tree, skip
        #   a    |  r     |      | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #   a    |  r     |  a   | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #  fdlt  | fdlt   |      | content in both: diff them
        #  fdlt  | fdlt   |  a   | deleted locally, but not unversioned - show as deleted ?
        #  fdlt  |  a     |      | unversioned: output deleted id for now
        #  fdlt  |  a     |  a   | unversioned and deleted: output deleted id
        #  fdlt  |  r     |      | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.
        #  fdlt  |  r     |  a   | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.
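        #
        # Reading one row of the table (illustrative): for source 'r' and
        # target 'fdlt' the entry was relocated, so the real source path is
        # added to the search list and the normal source-to-target diff is
        # performed when that path is reached.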
        # TODO: jam 20070516 - Avoid the _get_entry lookup overhead by
        #       keeping a cache of directories that we have seen.

        while search_specific_files:
            # TODO: the pending list should be lexically sorted?  the
            #       interface doesn't require it.
            current_root = search_specific_files.pop()
            current_root_unicode = current_root.decode('utf8')
            searched_specific_files.add(current_root)
            # process the entries for this containing directory: the rest will be
            # found by their parents recursively.
            root_entries = self.state._entries_for_path(current_root)
            root_abspath = self.tree.abspath(current_root_unicode)
            try:
                root_stat = os.lstat(root_abspath)
            except OSError, e:
                if e.errno == errno.ENOENT:
                    # the path does not exist: let _process_entry know that.
                    root_dir_info = None
                else:
                    # some other random error: hand it up.
                    raise
            else:
                root_dir_info = ('', current_root,
                    osutils.file_kind_from_stat_mode(root_stat.st_mode), root_stat,
                    root_abspath)
                if root_dir_info[2] == 'directory':
                    if self.tree._directory_is_tree_reference(
                        current_root.decode('utf8')):
                        root_dir_info = root_dir_info[:2] + \
                            ('tree-reference',) + root_dir_info[3:]

            if not root_entries and not root_dir_info:
                # this specified path is not present at all, skip it.
                continue
            path_handled = False
            for entry in root_entries:
                result, changed = _process_entry(entry, root_dir_info)
                if changed is not None:
                    path_handled = True
                    if changed:
                        self._gather_result_for_consistency(result)
                    if changed or self.include_unchanged:
                        yield result
            if self.want_unversioned and not path_handled and root_dir_info:
                new_executable = bool(
                    stat.S_ISREG(root_dir_info[3].st_mode)
                    and stat.S_IEXEC & root_dir_info[3].st_mode)
                yield (None,
                       (None, current_root_unicode),
                       True,
                       (False, False),
                       (None, None),
                       (None, splitpath(current_root_unicode)[-1]),
                       (None, root_dir_info[2]),
                       (None, new_executable)
                      )
            initial_key = (current_root, '', '')
            block_index, _ = self.state._find_block_index_from_key(initial_key)
            if block_index == 0:
                # we have processed the total root already, but because the
                # initial key matched it we should skip it here.
                block_index += 1
            if root_dir_info and root_dir_info[2] == 'tree-reference':
                current_dir_info = None
            else:
                dir_iterator = osutils._walkdirs_utf8(root_abspath, prefix=current_root)
                try:
                    current_dir_info = dir_iterator.next()
                except OSError, e:
                    # on win32, python2.4 has e.errno == ERROR_DIRECTORY, but
                    # python 2.5 has e.errno == EINVAL,
                    #            and e.winerror == ERROR_DIRECTORY
                    e_winerror = getattr(e, 'winerror', None)
                    win_errors = (ERROR_DIRECTORY, ERROR_PATH_NOT_FOUND)
                    # there may be directories in the inventory even though
                    # this path is not a file on disk: so mark it as end of
                    # iterator
                    if e.errno in (errno.ENOENT, errno.ENOTDIR, errno.EINVAL):
                        current_dir_info = None
                    elif (sys.platform == 'win32'
                          and (e.errno in win_errors
                               or e_winerror in win_errors)):
                        current_dir_info = None
                    else:
                        raise
                else:
                    if current_dir_info[0][0] == '':
                        # remove .bzr from iteration
                        bzr_index = bisect.bisect_left(current_dir_info[1], ('.bzr',))
                        if current_dir_info[1][bzr_index][0] != '.bzr':
                            raise AssertionError()
                        del current_dir_info[1][bzr_index]
            # walk until both the directory listing and the versioned metadata
            # are exhausted.
            if (block_index < len(self.state._dirblocks) and
                osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                current_block = self.state._dirblocks[block_index]
            else:
                current_block = None
            while (current_dir_info is not None or
                   current_block is not None):
                if (current_dir_info and current_block
                    and current_dir_info[0][0] != current_block[0]):
                    if _cmp_by_dirs(current_dir_info[0][0], current_block[0]) < 0:
                        # filesystem data refers to paths not covered by the dirblock.
                        # this has two possibilities:
                        # A) it is versioned but empty, so there is no block for it
                        # B) it is not versioned.

                        # if (A) then we need to recurse into it to check for
                        # new unknown files or directories.
                        # if (B) then we should ignore it, because we don't
                        # recurse into unknown directories.
                        path_index = 0
                        while path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if self.want_unversioned:
                                if current_path_info[2] == 'directory':
                                    if self.tree._directory_is_tree_reference(
                                        current_path_info[0].decode('utf8')):
                                        current_path_info = current_path_info[:2] + \
                                            ('tree-reference',) + current_path_info[3:]
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                yield (None,
                                    (None, utf8_decode(current_path_info[0])[0]),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',
                                                        'tree-reference'):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                            path_index += 1

                        # This dir info has been handled, go to the next
                        try:
                            current_dir_info = dir_iterator.next()
                        except StopIteration:
                            current_dir_info = None
                    else:
                        # We have a dirblock entry for this location, but there
                        # is no filesystem path for this. This is most likely
                        # because a directory was removed from the disk.
                        # We don't have to report the missing directory,
                        # because that should have already been handled, but we
                        # need to handle all of the files that are contained
                        # within.
                        for current_entry in current_block[1]:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result, changed = _process_entry(current_entry, None)
                            if changed is not None:
                                if changed:
                                    self._gather_result_for_consistency(result)
                                if changed or self.include_unchanged:
                                    yield result
                        block_index += 1
                        if (block_index < len(self.state._dirblocks) and
                            osutils.is_inside(current_root,
                                              self.state._dirblocks[block_index][0])):
                            current_block = self.state._dirblocks[block_index]
                        else:
                            current_block = None
                    continue
                entry_index = 0
                if current_block and entry_index < len(current_block[1]):
                    current_entry = current_block[1][entry_index]
                else:
                    current_entry = None
                advance_entry = True
                path_index = 0
                if current_dir_info and path_index < len(current_dir_info[1]):
                    current_path_info = current_dir_info[1][path_index]
                    if current_path_info[2] == 'directory':
                        if self.tree._directory_is_tree_reference(
                            current_path_info[0].decode('utf8')):
                            current_path_info = current_path_info[:2] + \
                                ('tree-reference',) + current_path_info[3:]
                else:
                    current_path_info = None
                advance_path = True
                path_handled = False
                while (current_entry is not None or
                       current_path_info is not None):
                    if current_entry is None:
                        # the check for path_handled when the path is advanced
                        # will yield this path if needed.
                        pass
                    elif current_path_info is None:
                        # no path is fine: the per entry code will handle it.
                        result, changed = _process_entry(current_entry, current_path_info)
                        if changed is not None:
                            if changed:
                                self._gather_result_for_consistency(result)
                            if changed or self.include_unchanged:
                                yield result
                    elif (current_entry[0][1] != current_path_info[1]
                          or current_entry[1][self.target_index][0] in 'ar'):
                        # The current path on disk doesn't match the dirblock
                        # record. Either the dirblock is marked as absent, or
                        # the file on disk is not present at all in the
                        # dirblock. Either way, report about the dirblock
                        # entry, and let other code handle the filesystem one.

                        # Compare the basename for these files to determine
                        # which comes first
                        if current_path_info[1] < current_entry[0][1]:
                            # extra file on disk: pass for now, but only
                            # increment the path, not the entry
                            advance_entry = False
                        else:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result, changed = _process_entry(current_entry, None)
                            if changed is not None:
                                if changed:
                                    self._gather_result_for_consistency(result)
                                if changed or self.include_unchanged:
                                    yield result
                            advance_path = False
                    else:
                        result, changed = _process_entry(current_entry, current_path_info)
                        if changed is not None:
                            path_handled = True
                            if changed:
                                self._gather_result_for_consistency(result)
                            if changed or self.include_unchanged:
                                yield result
                    if advance_entry and current_entry is not None:
                        entry_index += 1
                        if entry_index < len(current_block[1]):
                            current_entry = current_block[1][entry_index]
                        else:
                            current_entry = None
                    else:
                        advance_entry = True # reset the advance flag
                    if advance_path and current_path_info is not None:
                        if not path_handled:
                            # unversioned in all regards
                            if self.want_unversioned:
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                try:
                                    relpath_unicode = utf8_decode(current_path_info[0])[0]
                                except UnicodeDecodeError:
                                    raise errors.BadFilenameEncoding(
                                        current_path_info[0], osutils._fs_enc)
                                yield (None,
                                    (None, relpath_unicode),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                        # don't descend the disk iterator into any tree
                        # paths.
                        if current_path_info[2] == 'tree-reference':
                            del current_dir_info[1][path_index]
                            path_index -= 1
                        path_index += 1
                        if path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if current_path_info[2] == 'directory':
                                if self.tree._directory_is_tree_reference(
                                    current_path_info[0].decode('utf8')):
                                    current_path_info = current_path_info[:2] + \
                                        ('tree-reference',) + current_path_info[3:]
                        else:
                            current_path_info = None
                        path_handled = False
                    else:
                        advance_path = True # reset the advance flag.
                if current_block is not None:
                    block_index += 1
                    if (block_index < len(self.state._dirblocks) and
                        osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                        current_block = self.state._dirblocks[block_index]
                    else:
                        current_block = None
                if current_dir_info is not None:
                    try:
                        current_dir_info = dir_iterator.next()
                    except StopIteration:
                        current_dir_info = None
        for result in self._iter_specific_file_parents():
            yield result

    def _iter_specific_file_parents(self):
        """Iter over the specific file parents."""
        while self.search_specific_file_parents:
            # Process the parent directories for the paths we were iterating.
            # Even in extremely large trees this should be modest, so currently
            # no attempt is made to optimise.
            path_utf8 = self.search_specific_file_parents.pop()
            if osutils.is_inside_any(self.searched_specific_files, path_utf8):
                # We've examined this path.
                continue
            if path_utf8 in self.searched_exact_paths:
                # We've examined this path.
                continue
            path_entries = self.state._entries_for_path(path_utf8)
            # We need either one or two entries. If the path in
            # self.target_index has moved (so the entry in source_index is in
            # 'ar') then we need to also look for the entry for this path in
            # self.source_index, to output the appropriate delete-or-rename.
            selected_entries = []
            found_item = False
            for candidate_entry in path_entries:
                # Find entries present in target at this path:
                if candidate_entry[1][self.target_index][0] not in 'ar':
                    found_item = True
                    selected_entries.append(candidate_entry)
                # Find entries present in source at this path:
                elif (self.source_index is not None and
                      candidate_entry[1][self.source_index][0] not in 'ar'):
                    found_item = True
                    if candidate_entry[1][self.target_index][0] == 'a':
                        # Deleted, emit it here.
                        selected_entries.append(candidate_entry)
                    else:
                        # renamed, emit it when we process the directory it
                        # ended up at.
                        self.search_specific_file_parents.add(
                            candidate_entry[1][self.target_index][1])
            if not found_item:
                raise AssertionError(
                    "Missing entry for specific path parent %r, %r" % (
                    path_utf8, path_entries))
            path_info = self._path_info(path_utf8, path_utf8.decode('utf8'))
            for entry in selected_entries:
                if entry[0][2] in self.seen_ids:
                    continue
                result, changed = self._process_entry(entry, path_info)
                if changed is None:
                    raise AssertionError(
                        "Got entry<->path mismatch for specific path "
                        "%r entry %r path_info %r " % (
                        path_utf8, entry, path_info))
                # Only include changes - we're outside the user's requested
                # expansion.
                if changed:
                    self._gather_result_for_consistency(result)
                    if (result[6][0] == 'directory' and
                        result[6][1] != 'directory'):
                        # This stopped being a directory, the old children have
                        # to be included.
                        if entry[1][self.source_index][0] == 'r':
                            # renamed, take the source path
                            entry_path_utf8 = entry[1][self.source_index][1]
                        else:
                            entry_path_utf8 = path_utf8
                        initial_key = (entry_path_utf8, '', '')
                        block_index, _ = self.state._find_block_index_from_key(
                            initial_key)
                        if block_index == 0:
                            # The children of the root are in block index 1.
                            block_index += 1
                        current_block = None
                        if block_index < len(self.state._dirblocks):
                            current_block = self.state._dirblocks[block_index]
                            if not osutils.is_inside(
                                entry_path_utf8, current_block[0]):
                                # No entries for this directory at all.
                                current_block = None
                        if current_block is not None:
                            for entry in current_block[1]:
                                if entry[1][self.source_index][0] in 'ar':
                                    # Not in the source tree, so doesn't have
                                    # to be included.
                                    continue
                                # Path of the entry itself.
                                self.search_specific_file_parents.add(
                                    osutils.pathjoin(*entry[0][:2]))
                if changed or self.include_unchanged:
                    yield result
            self.searched_exact_paths.add(path_utf8)

    def _path_info(self, utf8_path, unicode_path):
        """Generate path_info for unicode_path.

        :return: None if unicode_path does not exist, or a path_info tuple.
        """
        abspath = self.tree.abspath(unicode_path)
        try:
            stat = os.lstat(abspath)
        except OSError, e:
            if e.errno == errno.ENOENT:
                # the path does not exist.
                return None
            else:
                raise
        utf8_basename = utf8_path.rsplit('/', 1)[-1]
        dir_info = (utf8_path, utf8_basename,
            osutils.file_kind_from_stat_mode(stat.st_mode), stat,
            abspath)
        if dir_info[2] == 'directory':
            if self.tree._directory_is_tree_reference(
                unicode_path):
                dir_info = dir_info[:2] + \
                    ('tree-reference',) + dir_info[3:]
        return dir_info
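
    # The returned path_info tuple has the shape _process_entry expects
    # (values illustrative):
    #
    #   ('a/b', 'b', 'file', <lstat result>, '/abs/tree/a/b')
    #
    # i.e. (utf8_path, utf8_basename, kind, stat_value, abspath).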

# Try to load the compiled form if possible
try:
    from bzrlib._dirstate_helpers_pyx import (
        _read_dirblocks,
        bisect_dirblock,
        _bisect_path_left,
        _bisect_path_right,
        cmp_by_dirs,
        ProcessEntryC as _process_entry,
        update_entry as update_entry,
        )
except ImportError, e:
    osutils.failed_to_load_extension(e)
    from bzrlib._dirstate_helpers_py import (
        _read_dirblocks,
        bisect_dirblock,
        _bisect_path_left,
        _bisect_path_right,
        cmp_by_dirs,
        )
    # FIXME: It would be nice to be able to track moved lines so that the
    # corresponding python code can be moved to the _dirstate_helpers_py
    # module. I don't want to break the history for this important piece of
    # code so I left the code here -- vila 20090622
    update_entry = py_update_entry
    _process_entry = ProcessEntryPython