# Copyright (C) 2006, 2007 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

"""DirState objects record the state of a directory and its bzr metadata.

Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.

MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
NL = "\n";
NULL = "\0";
WHOLE_NUMBER = {digit}, digit;
BOOLEAN = "y" | "n";
REVISION_ID = a non-empty utf8 string;

dirstate format = header line, full checksum, row count, parent details,
 ghost_details, entries;
header line = "#bazaar dirstate flat format 2", NL;
full checksum = "adler32: ", ["-"], WHOLE_NUMBER, NL;
row count = "num_entries: ", digit, NL;
parent_details = WHOLE_NUMBER, {REVISION_ID}*, NL;
ghost_details = WHOLE_NUMBER, {REVISION_ID}*, NL;
entries = {entry};
entry = entry_key, current_entry_details, {parent_entry_details};
entry_key = dirname, basename, fileid;
current_entry_details = common_entry_details, working_entry_details;
parent_entry_details = common_entry_details, history_entry_details;
common_entry_details = MINIKIND, fingerprint, size, executable;
working_entry_details = packed_stat;
history_entry_details = REVISION_ID;
executable = BOOLEAN;
size = WHOLE_NUMBER;
fingerprint = a nonempty utf8 sequence with meaning defined by minikind.

Given this definition, the following is useful to know:
entry (aka row) - all the data for a given key.
entry[0]: The key (dirname, basename, fileid)
entry[0][0]: dirname
entry[0][1]: basename
entry[0][2]: fileid
entry[1]: The tree(s) data for this path and id combination.
entry[1][0]: The current tree
entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate:
entry[1][0][0]: minikind
entry[1][0][1]: fingerprint
entry[1][0][2]: size
entry[1][0][3]: executable
entry[1][0][4]: packed_stat
OR (for non tree-0):
entry[1][1][4]: revision_id
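
For illustration, a working-tree entry for a file 'dir/name' might look like
this (all values hypothetical):

    (('dir', 'name', 'file-id'),
     [('f', 'sha1-of-file', 120, False, 'packed-stat-value')])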

There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now:
self._dirblocks[0] -> ('', [entry ...]),
and the entries in there are
entries[0][0]: ''
entries[0][1]: ''
entries[0][2]: file_id
entries[1][0]: The tree data for the current tree for this fileid at /
etc.

'r' is a relocated entry: This path is not present in this tree with this id,
but the id can be found at another location. The fingerprint is used to
point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is a
sha1 value.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is the
link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
revision.

The entries on disk and in memory are ordered according to the following keys:

    directory, as a list of components
    filename
    file-id
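
For example (hypothetical ids), this ordering places all the entries of a
directory before those of any of its subdirectories:

    ('', 'a', 'a-id') < ('', 'z', 'z-id') < ('a', 'b', 'b-id')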

--- Format 1 had the following different definition: ---
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
    WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
    {PARENT ROW}
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
    basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
    SHA1

PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.

In any tree, a kind of 'moved' indicates that the fingerprint field
(which we treat as opaque data specific to the 'kind' anyway) has the
details for the id of this row in that tree.

I'm strongly tempted to add an id->path index as well, but I think that
where we need id->path mapping, we also usually read the whole file, so
I'm going to skip that for the moment, as we have the ability to locate
via bisect any path in any tree, and if we lookup things by path, we can
accumulate an id->path mapping as we go, which will tend to match what we
asked for anyway.

I plan to implement this asap, so please speak up now to alter/tweak the
design - and once we stabilise on this, I'll update the wiki page for
it.

The rationale for all this is that we want fast operations for the
common case (diff/status/commit/merge on all files), and extremely fast
operations for the less common but still frequent case (status/diff/commit
on specific files). Operations on specific files involve a scan for all
the children of a path, *in every involved tree*, which the current
format did not accommodate.

Design priorities:
 1) Fast end to end use for bzr's top 5 use cases. (commit/diff/status/merge/???)
 2) fall back to the current object model as needed.
 3) scale usably to the largest trees known today - say 50K entries. (mozilla
    is an example of this)

Locking:

 Eventually reuse dirstate objects across locks IFF the dirstate file has not
 been modified, but will require that we flush/ignore cached stat-hit data
 because we won't want to restat all files on disk just because a lock was
 acquired, yet we cannot trust the data after the previous lock was released.

Memory representation:
 vector of all directories, and vector of the children ?
   i.e.
     root_entry = (direntry for root, [parent_direntries_for_root]),
     dirblocks = [
     ('', ['data for achild', 'data for bchild', 'data for cchild'])
     ('dir', ['achild', 'cchild', 'echild'])
     ]
    - single bisect to find N subtrees from a path spec
    - in-order for serialisation - this is 'dirblock' grouping.
    - insertion of a file '/a' affects only the '/' child-vector, that is, to
      insert 10K elements from scratch does not generate O(N^2) memmoves of a
      single vector, rather each individual directory vector, which tends to
      be limited to a manageable number. Will scale badly on trees with 10K
      entries in a single directory. Compare with Inventory.InventoryDirectory
      which has a dictionary for the children. No bisect capability, can only
      probe for exact matches, or grab all elements and sort.
    - What's the risk of error here? Once we have the base format being
      processed we should have a net win regardless of optimality. So we are
      going to go with what seems reasonable.

Maybe we should do a test profile of these core structures - 10K simulated
searches/lookups/etc?
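
As an illustration (hypothetical data, assuming the dirblock layout above),
locating the block for 'dir' is a single bisect over the block names:

    names = [block[0] for block in dirblocks]
    children = dirblocks[bisect.bisect_left(names, 'dir')][1]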

Objects for each row?
The lifetime of Dirstate objects is currently per lock, but see above for
possible extensions. The lifetime of a row from a dirstate is expected to be
very short in the optimistic case: which we are optimising for. For instance,
subtree status will determine from analysis of the disk data what rows need to
be examined at all, and will be able to determine from a single row whether
that file has altered or not, so we are aiming to process tens of thousands of
entries each second within the dirstate context, before exposing anything to
the larger codebase. This suggests we want the time for a single file
comparison to be < 0.1 milliseconds. That would give us 10,000 paths per
second processed, and to scale to 100 thousand we'll need another order of
magnitude improvement. Now, as the lifetime for all unchanged entries is the
time to parse, stat the file on disk, and then immediately discard, the
overhead of object creation becomes a significant cost.

Figures: Creating a tuple from 3 elements was profiled at 0.0625
microseconds, whereas creating an object which is subclassed from tuple was
0.500 microseconds, and creating an object with 3 elements and slots was 3
microseconds long. 0.1 milliseconds is 100 microseconds, and ideally we'll get
down to 10 microseconds for the total processing - having 33% of that be object
creation is a huge overhead. There is a potential cost in using tuples within
each row which is that the conditional code to do comparisons may be slower
than method invocation, but method invocation is known to be slow due to stack
frame creation, so avoiding methods in these tight inner loops is unfortunately
desirable. We can consider a pyrex version of this with objects in future if
desired.
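
Those figures can be sanity-checked with quick microbenchmarks along these
lines (illustrative, not the measurements quoted above):

    python -m timeit '(1, 2, 3)'
    python -m timeit -s 'class T(tuple): pass' 'T((1, 2, 3))'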
"""

import bisect
import errno
import os
from stat import S_IEXEC
import sys
import time
import zlib

from bzrlib import (
    errors,
    inventory,
    lock,
    osutils,
    trace,
    )


class _Bisector(object):
    """This just keeps track of information as we are bisecting."""


class DirState(object):
    """Record directory and metadata state for fast access.

    A dirstate is a specialised data structure for managing local working
    tree state information. It's not yet well defined whether it is platform
    specific, and if it is how we detect/parameterise that.

    Dirstates use the usual lock_write, lock_read and unlock mechanisms.
    Unlike most bzr disk formats, DirStates must be locked for reading, using
    lock_read. (This is an os file lock internally.) This is necessary
    because the file can be rewritten in place.

    DirStates must be explicitly written with save() to commit changes; just
    unlocking them does not write the changes to disk.
    """
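
    # A minimal usage sketch (illustrative only; the path, file id and
    # fingerprint below are hypothetical):
    #
    #   state = DirState.initialize('dirstate')  # write locked on return
    #   try:
    #       state.add('foo', 'foo-id', 'file', os.lstat('foo'), sha1_of_foo)
    #       state.save()
    #   finally:
    #       state.unlock()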

    _kind_to_minikind = {
            'absent': 'a',
            'file': 'f',
            'directory': 'd',
            'relocated': 'r',
            'symlink': 'l',
            'tree-reference': 't',
        }
    _minikind_to_kind = {
            'a': 'absent',
            'f': 'file',
            'd': 'directory',
            'r': 'relocated',
            'l': 'symlink',
            't': 'tree-reference',
        }
    _to_yesno = {True: 'y', False: 'n'} # TODO profile the performance gain
    # of using int conversion rather than a dict here. AND BLAME ANDREW IF
    # THIS IS SLOWER.

    # TODO: jam 20070221 Make sure we handle when there are duplicated records
    #       (like when we remove + add the same path, or we have a rename)
    # TODO: jam 20070221 Figure out what to do if we have a record that exceeds
    #       the BISECT_PAGE_SIZE. For now, we just have to make it large enough
    #       that we are sure a single record will always fit.
    BISECT_PAGE_SIZE = 4096

    NOT_IN_MEMORY = 0
    IN_MEMORY_UNMODIFIED = 1
    IN_MEMORY_MODIFIED = 2

    # A pack_stat (the x's) that is just noise and will never match the output
    # of base64 encode.
    NULLSTAT = 'x' * 32
    NULL_PARENT_DETAILS = ('a', '', 0, False, '')

    HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
    HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'

    def __init__(self, path):
        """Create a DirState object.

        Attributes of note:

        :attr _root_entry: The root row of the directory/file information,
            - contains the path to / - '', ''
            - kind of 'directory',
            - the file id of the root in utf8
            - size of 0
            - a packed state
            - and no sha information.
        :param path: The path at which the dirstate file on disk should live.
        """
        # _header_state and _dirblock_state represent the current state
        # of the dirstate metadata and the per-row data respectively.
        # NOT_IN_MEMORY indicates that no data is in memory
        # IN_MEMORY_UNMODIFIED indicates that what we have in memory
        #   is the same as is on disk
        # IN_MEMORY_MODIFIED indicates that we have a modified version
        #   of what is on disk.
        # In future we will add more granularity, for instance _dirblock_state
        # will probably support partially-in-memory as a separate variable,
        # allowing for partially-in-memory unmodified and partially-in-memory
        # modified states.
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._dirblocks = []
        self._ghosts = []
        self._parents = []
        self._state_file = None
        self._filename = path
        self._lock_token = None
        self._lock_state = None
        self._id_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
        self._bisect_page_size = DirState.BISECT_PAGE_SIZE

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.

        :param path: The path within the dirstate - '' is the root, 'foo' is the
            path foo within the root, 'foo/bar' is the path bar within foo
            within the root.
        :param file_id: The file id of the path being added.
        :param kind: The kind of the path, as a string like 'file',
            'directory', etc.
        :param stat: The output of os.lstat for the path.
        :param fingerprint: The sha value of the file,
            or the target of a symlink,
            or the referenced revision id for tree-references,
            or '' for directories.
        """
        # find the block it's in.
        # find the location in the block.
        # check it's not there.
        # add it.
        #------- copied from inventory.make_entry
        # --- normalized_filename wants a unicode basename only, so get one.
        dirname, basename = osutils.split(path)
        # we don't import normalized_filename directly because we want to be
        # able to change the implementation at runtime for tests.
        norm_name, can_access = osutils.normalized_filename(basename)
        if norm_name != basename:
            if can_access:
                basename = norm_name
            else:
                raise errors.InvalidNormalization(path)
        # now that we've normalised, we need the correct utf8 path and
        # dirname and basename elements. This single encode and split should be
        # faster than three separate encodes.
        utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
        dirname, basename = osutils.split(utf8path)
        assert file_id.__class__ == str, \
            "must be a utf8 file_id not %s" % (type(file_id))
        # Make sure the file_id does not exist in this tree
        file_id_entry = self._get_entry(0, fileid_utf8=file_id)
        if file_id_entry != (None, None):
            path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
            kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
            info = '%s:%s' % (kind, path)
            raise errors.DuplicateFileId(file_id, info)
        first_key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(first_key)
        if present:
            # check the path is not in the tree
            block = self._dirblocks[block_index][1]
            entry_index, _ = self._find_entry_index(first_key, block)
            while (entry_index < len(block) and
                block[entry_index][0][0:2] == first_key[0:2]):
                if block[entry_index][1][0][0] not in 'ar':
                    # this path is in the dirstate in the current tree.
                    raise Exception("adding already added path!")
                entry_index += 1
        else:
            # The block where we want to put the file is not present. But it
            # might be because the directory was empty, or not loaded yet. Look
            # for a parent entry, if not found, raise NotVersionedError
            parent_dir, parent_base = osutils.split(dirname)
            parent_block_idx, parent_entry_idx, _, parent_present = \
                self._get_block_entry_index(parent_dir, parent_base, 0)
            if not parent_present:
                raise errors.NotVersionedError(path, str(self))
            self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
        block = self._dirblocks[block_index][1]
        entry_key = (dirname, basename, file_id)
        if stat is None:
            size = 0
            packed_stat = DirState.NULLSTAT
        else:
            size = stat.st_size
            packed_stat = pack_stat(stat)
        parent_info = self._empty_parent_info()
        minikind = DirState._kind_to_minikind[kind]
        if kind == 'file':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'directory':
            entry_data = entry_key, [
                (minikind, '', 0, False, packed_stat),
                ] + parent_info
        elif kind == 'symlink':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'tree-reference':
            entry_data = entry_key, [
                (minikind, fingerprint, 0, False, packed_stat),
                ] + parent_info
        else:
            raise errors.BzrError('unknown kind %r' % kind)
        entry_index, present = self._find_entry_index(entry_key, block)
        assert not present, "basename %r already added" % basename
        block.insert(entry_index, entry_data)

        if kind == 'directory':
            # insert a new dirblock
            self._ensure_block(block_index, entry_index, utf8path)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index:
            self._id_index.setdefault(entry_key[2], set()).add(entry_key)

    def _bisect(self, dir_name_list):
        """Bisect through the disk structure for specific rows.

        :param dir_name_list: A list of (dir, name) pairs.
        :return: A dict mapping (dir, name) => entry for found entries. Missing
            entries will not be in the map.
        """
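        # Illustrative call (hypothetical data):
        #   self._bisect([('', 'a'), ('dir', 'b')])
        # returns something like {('', 'a'): [entry]} if only ('', 'a')
        # exists on disk.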
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # data from disk.
        assert self._dirblock_state == DirState.NOT_IN_MEMORY

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from (dir, name) => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_name_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dir_names -> The list of (dir, name) pairs that should be found in
        #                the [low, high] range
        pending = [(low, high, dir_name_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_files = pending.pop()

            if not cur_files or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 2:
                # We didn't even get a filename here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_files))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                first_dir_name = (first_fields[1], first_fields[2])
                first_loc = bisect.bisect_left(cur_files, first_dir_name)

                # These exist before the current location
                pre = cur_files[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_files[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have files after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir_name = (last_fields[1], last_fields[2])
                last_loc = bisect.bisect_right(post, last_dir_name)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir_name:
                        # We might need to go before this location
                        pre.append(first_dir_name)
                    if middle_files[-1] == last_dir_name:
                        post.insert(0, last_dir_name)

                    # Find out what paths we have
                    paths = {first_dir_name:[first_fields]}
                    # last_dir_name might == first_dir_name so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir_name, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        dir_name = (fields[1], fields[2])
                        paths.setdefault(dir_name, []).append(fields)

                    for dir_name in middle_files:
                        for fields in paths.get(dir_name, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(dir_name, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        # Consider that we may want to return the directory entries in sorted
        # order. For now, we just return them in whatever order we found them,
        # and leave it up to the caller if they care if it is ordered or not.
        return found

    def _bisect_dirblocks(self, dir_list):
        """Bisect through the disk structure to find entries in given dirs.

        _bisect_dirblocks is meant to find the contents of directories, which
        differs from _bisect, which only finds individual entries.

        :param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
        :return: A map from dir => entries_for_dir
        """
        # TODO: jam 20070223 A lot of the bisecting logic could be shared
        #       between this and _bisect. It would require parameterizing the
        #       inner loop with a function, though. We should evaluate the
        #       performance difference.
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # data from disk.
        assert self._dirblock_state == DirState.NOT_IN_MEMORY

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from dir => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dirs -> The list of directories that should be found in
        #           the [low, high] range
        pending = [(low, high, dir_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_dirs = pending.pop()

            if not cur_dirs or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 1:
                # We didn't even get a dirname here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                first_dir = first_fields[1]
                first_loc = bisect.bisect_left(cur_dirs, first_dir)

                # These exist before the current location
                pre = cur_dirs[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_dirs[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have records to look at after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir = last_fields[1]
                last_loc = bisect.bisect_right(post, last_dir)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir:
                        # We might need to go before this location
                        pre.append(first_dir)
                    if middle_files[-1] == last_dir:
                        post.insert(0, last_dir)

                    # Find out what paths we have
                    paths = {first_dir:[first_fields]}
                    # last_dir might == first_dir so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        paths.setdefault(fields[1], []).append(fields)

                    for cur_dir in middle_files:
                        for fields in paths.get(cur_dir, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(cur_dir, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        return found

    def _bisect_recursive(self, dir_name_list):
        """Bisect for entries for all paths and their children.

        This will use bisect to find all records for the supplied paths. It
        will then continue to bisect for any records which are marked as
        directories. (and renames?)

        :param dir_name_list: A sorted list of (dir, name) pairs
            eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
        :return: A dictionary mapping (dir, name, file_id) => [tree_info]
        """
        # Map from (dir, name, file_id) => [tree_info]
        found = {}

        found_dir_names = set()

        # Directories that have been read
        processed_dirs = set()
        # Get the ball rolling with the first bisect for all entries.
        newly_found = self._bisect(dir_name_list)

        while newly_found:
            # Directories that need to be read
            pending_dirs = set()
            paths_to_search = set()
            for entry_list in newly_found.itervalues():
                for dir_name_id, trees_info in entry_list:
                    found[dir_name_id] = trees_info
                    found_dir_names.add(dir_name_id[:2])
                    is_dir = False
                    for tree_info in trees_info:
                        minikind = tree_info[0]
                        if minikind == 'd':
                            if is_dir:
                                # We already processed this one as a directory,
                                # we don't need to do the extra work again.
                                continue
                            subdir, name, file_id = dir_name_id
                            path = osutils.pathjoin(subdir, name)
                            is_dir = True
                            if path not in processed_dirs:
                                pending_dirs.add(path)
                        elif minikind == 'r':
                            # Rename, we need to directly search the target
                            # which is contained in the fingerprint column
                            dir_name = osutils.split(tree_info[1])
                            if dir_name[0] in pending_dirs:
                                # This entry will be found in the dir search
                                continue
                            # TODO: We need to check if this entry has
                            #       already been found. Otherwise we might be
                            #       hitting infinite recursion.
                            if dir_name not in found_dir_names:
                                paths_to_search.add(dir_name)
            # Now we have a list of paths to look for directly, and
            # directory blocks that need to be read.
            # newly_found is mixing the keys between (dir, name) and path
            # entries, but that is okay, because we only really care about the
            # file_id anyway.
            newly_found = self._bisect(sorted(paths_to_search))
            newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
            processed_dirs.update(pending_dirs)
        return found

    def _empty_parent_info(self):
        return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
                                                 len(self._ghosts))

    def _ensure_block(self, parent_block_index, parent_row_index, dirname):
        """Ensure a block for dirname exists.

        This function exists to let callers which know that there is a
        directory dirname ensure that the block for it exists. This block can
        fail to exist because of demand loading, or because a directory had no
        children. In either case it is not an error. It is however an error to
        call this if there is no parent entry for the directory, and thus the
        function requires the coordinates of such an entry to be provided.

        The root row is special cased and can be indicated with a parent block
        and row of -1.

        :param parent_block_index: The index of the block in which dirname's row
            exists.
        :param parent_row_index: The index in the parent block where the row
            exists.
        :param dirname: The utf8 dirname to ensure there is a block for.
        :return: The index for the block.
        """
        if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
            # This is the signature of the root row, and the
            # contents-of-root row is always index 1
            return 1
        # the basename of the directory must be the end of its full name.
        if not (parent_block_index == -1 and
            parent_row_index == -1 and dirname == ''):
            assert dirname.endswith(
                self._dirblocks[parent_block_index][1][parent_row_index][0][1])
        block_index, present = self._find_block_index_from_key((dirname, '', ''))
        if not present:
            ## In future, when doing partial parsing, this should load and
            # populate the entire block.
            self._dirblocks.insert(block_index, (dirname, []))
        return block_index

    def _entries_to_current_state(self, new_entries):
        """Load new_entries into self.dirblocks.

        Process new_entries into the current state object, making them the active
        state. The entries are grouped together by directory to form dirblocks.

        :param new_entries: A sorted list of entries. This function does not sort
            to prevent unneeded overhead when callers have a sorted list already.
        """
        assert new_entries[0][0][0:2] == ('', ''), \
            "Missing root row %r" % (new_entries[0][0],)
        # The two blocks here are deliberate: the root block and the
        # contents-of-root block.
        self._dirblocks = [('', []), ('', [])]
        current_block = self._dirblocks[0][1]
        current_dirname = ''
        append_entry = current_block.append
        for entry in new_entries:
            if entry[0][0] != current_dirname:
                # new block - different dirname
                current_block = []
                current_dirname = entry[0][0]
                self._dirblocks.append((current_dirname, current_block))
                append_entry = current_block.append
            # append the entry to the current block
            append_entry(entry)
        self._split_root_dirblock_into_contents()

    def _split_root_dirblock_into_contents(self):
        """Split the root dirblocks into root and contents-of-root.

        After parsing by path, we end up with root entries and contents-of-root
        entries in the same block. This loop splits them out again.
        """
        # The above loop leaves the "root block" entries mixed with the
        # "contents-of-root block". But we don't want an if check on
        # all entries, so instead we just fix it up here.
        assert self._dirblocks[1] == ('', [])
        root_block = []
        contents_of_root_block = []
        for entry in self._dirblocks[0][1]:
            if not entry[0][1]: # This is a root entry
                root_block.append(entry)
            else:
                contents_of_root_block.append(entry)
        self._dirblocks[0] = ('', root_block)
        self._dirblocks[1] = ('', contents_of_root_block)

    def _entry_to_line(self, entry):
        """Serialize entry to a NULL delimited line ready for _get_output_lines.

        :param entry: An entry_tuple as defined in the module docstring.
        """
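        # Illustrative result (hypothetical values): a single-tree entry for
        # 'dir/name' serialises to roughly
        #   'dir\0name\0file-id\0f\0<sha1>\0120\0n\0<packed stat>'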
        entire_entry = list(entry[0])
        for tree_number, tree_data in enumerate(entry[1]):
            # (minikind, fingerprint, size, executable, tree_specific_string)
            entire_entry.extend(tree_data)
            # 3 for the key, 5 for the fields per tree.
            tree_offset = 3 + tree_number * 5
            # minikind
            entire_entry[tree_offset + 0] = tree_data[0]
            # size
            entire_entry[tree_offset + 2] = str(tree_data[2])
            # executable
            entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
        return '\0'.join(entire_entry)

    def _fields_per_entry(self):
        """How many null separated fields should be in each entry row.

        Each line now has an extra '\n' field which is not used
        so we just skip over it
        entry size:
            3 fields for the key
            + number of fields per tree_data (5) * tree count
            + 1 for the newline
        """
        tree_count = 1 + self._num_present_parents()
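        # Worked example (illustrative): with one present parent tree,
        # tree_count is 2, so each row carries 3 + 5*2 + 1 = 14 fields.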
        return 3 + 5 * tree_count + 1

    def _find_block(self, key, add_if_missing=False):
        """Return the block that key should be present in.

        :param key: A dirstate entry key.
        :return: The block tuple.
        """
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            if not add_if_missing:
                # check to see if key is versioned itself - we might want to
                # add it anyway, because dirs with no entries don't get a
                # dirblock at parse time.
                # This is an uncommon branch to take: most dirs have children,
                # and most code works with versioned paths.
                parent_base, parent_name = osutils.split(key[0])
                if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
                    # some parent path has not been added - it's an error to add
                    # this child
                    raise errors.NotVersionedError(key[0:2], str(self))
            self._dirblocks.insert(block_index, (key[0], []))
        return self._dirblocks[block_index]

    def _find_block_index_from_key(self, key):
        """Find the dirblock index for a key.

        :return: The block index, True if the block for the key is present.
        """
        if key[0:2] == ('', ''):
            return 0, True
        block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                      cache=self._split_path_cache)
        # _right returns one-past-where-key is so we have to subtract
        # one to use it. we use _right here because there are two
        # '' blocks - the root, and the contents of root
        # we always have a minimum of 2 in self._dirblocks: root and
        # root-contents, and for '', we get 2 back, so this is
        # simple and correct:
        present = (block_index < len(self._dirblocks) and
            self._dirblocks[block_index][0] == key[0])
        return block_index, present

    def _find_entry_index(self, key, block):
        """Find the entry index for a key in a block.

        :return: The entry index, True if the entry for the key is present.
        """
        entry_index = bisect.bisect_left(block, (key, []))
        present = (entry_index < len(block) and
            block[entry_index][0] == key)
        return entry_index, present

    @staticmethod
    def from_tree(tree, dir_state_filename):
        """Create a dirstate from a bzr Tree.

        :param tree: The tree which should provide parent information and
            inventory data.
        :return: a DirState object which is currently locked for writing.
            (it was locked by DirState.initialize)
        """
        result = DirState.initialize(dir_state_filename)
        try:
            tree.lock_read()
            try:
                parent_ids = tree.get_parent_ids()
                num_parents = len(parent_ids)
                parent_trees = []
                for parent_id in parent_ids:
                    parent_tree = tree.branch.repository.revision_tree(parent_id)
                    parent_trees.append((parent_id, parent_tree))
                    parent_tree.lock_read()
                result.set_parent_trees(parent_trees, [])
                result.set_state_from_inventory(tree.inventory)
            finally:
                for revid, parent_tree in parent_trees:
                    parent_tree.unlock()
                tree.unlock()
        except:
            # The caller won't have a chance to unlock this, so make sure we
            # cleanup ourselves
            result.unlock()
            raise
        return result

    def update_entry(self, entry, abspath, stat_value=None):
        """Update the entry based on what is actually on disk.

        :param entry: This is the dirblock entry for the file in question.
        :param abspath: The path on disk for this file.
        :param stat_value: (optional) if we already have done a stat on the
            file, re-use it.
        :return: The sha1 hexdigest of the file (40 bytes) or link target of a
            symlink.
        """
        # This code assumes that the entry passed in is directly held in one of
        # the internal _dirblocks. So the dirblock state must have already been
        # read.
        assert self._dirblock_state != DirState.NOT_IN_MEMORY
        if stat_value is None:
            try:
                # We could inline os.lstat but the common case is that
                # stat_value will be passed in, not read here.
                stat_value = self._lstat(abspath, entry)
            except (OSError, IOError), e:
                if e.errno in (errno.ENOENT, errno.EACCES,
                               errno.EPERM):
                    # The entry is missing, consider it gone
                    return None
                raise

        kind = osutils.file_kind_from_stat_mode(stat_value.st_mode)
        try:
            minikind = DirState._kind_to_minikind[kind]
        except KeyError: # Unknown kind
            return None
        packed_stat = pack_stat(stat_value)
        (saved_minikind, saved_link_or_sha1, saved_file_size,
         saved_executable, saved_packed_stat) = entry[1][0]

        if (minikind == saved_minikind
            and packed_stat == saved_packed_stat
            # size should also be in packed_stat
            and saved_file_size == stat_value.st_size):
            # The stat hasn't changed since we saved, so we can potentially
            # re-use the saved sha hash.
            if minikind == 'd':
                return None

            if self._cutoff_time is None:
                self._sha_cutoff_time()

            if (stat_value.st_mtime < self._cutoff_time
                and stat_value.st_ctime < self._cutoff_time):
                # Return the existing fingerprint
                return saved_link_or_sha1

        # If we have gotten this far, that means that we need to actually
        # process this entry.
        link_or_sha1 = None
        if minikind == 'f':
            link_or_sha1 = self._sha1_file(abspath, entry)
            executable = self._is_executable(stat_value.st_mode,
                                             saved_executable)
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        elif minikind == 'd':
            link_or_sha1 = None
            entry[1][0] = ('d', '', 0, False, packed_stat)
            if saved_minikind != 'd':
                # This changed from something into a directory. Make sure we
                # have a directory block for it. This doesn't happen very
                # often, so this doesn't have to be super fast.
                block_index, entry_index, dir_present, file_present = \
                    self._get_block_entry_index(entry[0][0], entry[0][1], 0)
                self._ensure_block(block_index, entry_index,
                                   osutils.pathjoin(entry[0][0], entry[0][1]))
        elif minikind == 'l':
            link_or_sha1 = self._read_link(abspath, saved_link_or_sha1)
            entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
                           False, packed_stat)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        return link_or_sha1

    def _sha_cutoff_time(self):
        """Return cutoff time.

        Files modified more recently than this time are at risk of being
        undetectably modified and so can't be cached.
        """
        # Cache the cutoff time as long as we hold a lock.
        # time.time() isn't super expensive (approx 3.38us), but
        # when you call it 50,000 times it adds up.
        # For comparison, os.lstat() costs 7.2us if it is hot.
        self._cutoff_time = int(time.time()) - 3
        return self._cutoff_time

    def _lstat(self, abspath, entry):
        """Return the os.lstat value for this path."""
        return os.lstat(abspath)

    def _sha1_file(self, abspath, entry):
        """Calculate the SHA1 of a file by reading the full text"""
        f = file(abspath, 'rb', buffering=65000)
        try:
            return osutils.sha_file(f)
        finally:
            f.close()

    def _is_executable(self, mode, old_executable):
        """Is this file executable?"""
        return bool(S_IEXEC & mode)

    def _is_executable_win32(self, mode, old_executable):
        """On win32 the executable bit is stored in the dirstate."""
        return old_executable

    if sys.platform == 'win32':
        _is_executable = _is_executable_win32

    def _read_link(self, abspath, old_link):
        """Read the target of a symlink"""
        # TODO: jam 20070301 On Win32, this could just return the value
        #       already in memory. However, this really needs to be done at a
        #       higher level, because there either won't be anything on disk,
        #       or the thing on disk will be a file.
        return os.readlink(abspath)

    def get_ghosts(self):
        """Return a list of the parent tree revision ids that are ghosts."""
        self._read_header_if_needed()
        return self._ghosts

    def get_lines(self):
        """Serialise the entire dirstate to a sequence of lines."""
        if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
            self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
            # read what's on disk.
            self._state_file.seek(0)
            return self._state_file.readlines()
        lines = []
        lines.append(self._get_parents_line(self.get_parent_ids()))
        lines.append(self._get_ghosts_line(self._ghosts))
        # append the root line which is special cased
        lines.extend(map(self._entry_to_line, self._iter_entries()))
        return self._get_output_lines(lines)

    def _get_ghosts_line(self, ghost_ids):
        """Create a line for the state file for ghost information."""
        return '\0'.join([str(len(ghost_ids))] + ghost_ids)

    def _get_parents_line(self, parent_ids):
        """Create a line for the state file for parents information."""
        return '\0'.join([str(len(parent_ids))] + parent_ids)

    def _get_fields_to_entry(self):
        """Get a function which converts entry fields into an entry record.

        This handles size and executable, as well as parent records.

        :return: A function which takes a list of fields, and returns an
            appropriate record for storing in memory.
        """
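        # Illustrative conversion (hypothetical values) for the 0-parent case:
        #   ['dir', 'name', 'an-id', 'f', 'sha1...', '120', 'n', 'stat', '\n']
        # becomes
        #   (('dir', 'name', 'an-id'), [('f', 'sha1...', 120, False, 'stat')])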
        # This is intentionally unrolled for performance
        num_present_parents = self._num_present_parents()
        if num_present_parents == 0:
            def fields_to_entry_0_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],                # minikind
                        fields[4],                # fingerprint
                        _int(fields[5]),          # size
                        fields[6] == 'y',         # executable
                        fields[7],                # packed_stat or revision_id
                    )])
            return fields_to_entry_0_parents
        elif num_present_parents == 1:
            def fields_to_entry_1_parent(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],                # minikind
                        fields[4],                # fingerprint
                        _int(fields[5]),          # size
                        fields[6] == 'y',         # executable
                        fields[7],                # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],                # minikind
                        fields[9],                # fingerprint
                        _int(fields[10]),         # size
                        fields[11] == 'y',        # executable
                        fields[12],               # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_1_parent
        elif num_present_parents == 2:
            def fields_to_entry_2_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],                # minikind
                        fields[4],                # fingerprint
                        _int(fields[5]),          # size
                        fields[6] == 'y',         # executable
                        fields[7],                # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],                # minikind
                        fields[9],                # fingerprint
                        _int(fields[10]),         # size
                        fields[11] == 'y',        # executable
                        fields[12],               # packed_stat or revision_id
                    ),
                    ( # Parent 2
                        fields[13],               # minikind
                        fields[14],               # fingerprint
                        _int(fields[15]),         # size
                        fields[16] == 'y',        # executable
                        fields[17],               # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_2_parents
        else:
            def fields_to_entry_n_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                trees = [(fields[cur],                # minikind
                          fields[cur+1],              # fingerprint
                          _int(fields[cur+2]),        # size
                          fields[cur+3] == 'y',       # executable
                          fields[cur+4],              # stat or revision_id
                         ) for cur in xrange(3, len(fields)-1, 5)]
                return path_name_file_id_key, trees
            return fields_to_entry_n_parents

    def get_parent_ids(self):
        """Return a list of the parent tree ids for the directory state."""
        self._read_header_if_needed()
        return list(self._parents)

    def _get_block_entry_index(self, dirname, basename, tree_index):
        """Get the coordinates for a path in the state structure.

        :param dirname: The utf8 dirname to lookup.
        :param basename: The utf8 basename to lookup.
        :param tree_index: The index of the tree for which this lookup should
            be attempted.
        :return: A tuple describing where the path is located, or should be
            inserted. The tuple contains four fields: the block index, the row
            index, and two booleans that are True when the directory is
            present, and when the entire path is present. There is no
            guarantee that either coordinate is currently reachable unless the
            found field for it is True. For instance, a directory not present
            in the searched tree may be returned with a value one greater than
            the current highest block offset. The directory present field will
            always be True when the path present field is True. The directory
            present field does NOT indicate that the directory is present in
            the searched tree, rather it indicates that there are at least
            some files in some parent tree present.
        """
        self._read_dirblocks_if_needed()
        key = dirname, basename, ''
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # no such directory - return the dir index and 0 for the row.
            return block_index, 0, False, False
        block = self._dirblocks[block_index][1] # access the entries only
        entry_index, present = self._find_entry_index(key, block)
        # linear search through present entries at this path to find the one
        # requested.
        while entry_index < len(block) and block[entry_index][0][1] == basename:
            if block[entry_index][1][tree_index][0] not in \
                       ('a', 'r'): # absent, relocated
                return block_index, entry_index, True, True
            entry_index += 1
        return block_index, entry_index, True, False

    def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None):
        """Get the dirstate entry for path in tree tree_index.

        If either file_id or path is supplied, it is used as the key to lookup.
        If both are supplied, the fastest lookup is used, and an error is
        raised if they do not both point at the same row.

        :param tree_index: The index of the tree we wish to locate this path
            in. If the path is present in that tree, the entry containing its
            details is returned, otherwise (None, None) is returned.
            0 is the working tree, higher indexes are successive parent
            trees.
        :param fileid_utf8: A utf8 file_id to look up.
        :param path_utf8: An utf8 path to be looked up.
        :return: The dirstate entry tuple for path, or (None, None)
        """
        self._read_dirblocks_if_needed()
        if path_utf8 is not None:
            assert path_utf8.__class__ == str, 'path_utf8 is not a str: %s %s' % (type(path_utf8), path_utf8)
            # path lookups are faster
            dirname, basename = osutils.split(path_utf8)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, tree_index)
            if not file_present:
                return None, None
            entry = self._dirblocks[block_index][1][entry_index]
            assert entry[0][2] and entry[1][tree_index][0] not in ('a', 'r'), 'unversioned entry?!?!'
            if fileid_utf8:
                if entry[0][2] != fileid_utf8:
                    raise errors.BzrError('integrity error ? : mismatching'
                                          ' tree_index, file_id and path')
            return entry
        else:
            assert fileid_utf8 is not None
            possible_keys = self._get_id_index().get(fileid_utf8, None)
            if not possible_keys:
                return None, None
            for key in possible_keys:
                block_index, present = \
                    self._find_block_index_from_key(key)
                # strange, probably indicates an out of date
                # id index - for now, allow this.
                if not present:
                    continue
                # WARNING: DO not change this code to use _get_block_entry_index
                # as that function is not suitable: it does not use the key
                # to lookup, and thus the wrong coordinates are returned.
                block = self._dirblocks[block_index][1]
                entry_index, present = self._find_entry_index(key, block)
                if present:
                    entry = self._dirblocks[block_index][1][entry_index]
                    if entry[1][tree_index][0] in 'fdlt':
                        # this is the result we are looking for: the
                        # real home of this file_id in this tree.
                        return entry
                    if entry[1][tree_index][0] == 'a':
                        # there is no home for this entry in this tree
                        return None, None
                    assert entry[1][tree_index][0] == 'r', \
                        "entry %r has invalid minikind %r for tree %r" \
                        % (entry,
                           entry[1][tree_index][0],
                           tree_index)
                    real_path = entry[1][tree_index][1]
                    return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
                        path_utf8=real_path)
            return None, None

    @classmethod
    def initialize(cls, path):
        """Create a new dirstate on path.

        The new dirstate will be an empty tree - that is it has no parents,
        and only a root node - which has id ROOT_ID.

        The object will be write locked when returned to the caller,
        unless there was an exception in the writing, in which case it
        will be unlocked.

        :param path: The name of the file for the dirstate.
        :return: A DirState object.
        """
        # This constructs a new DirState object on a path, sets the _state_file
        # to a new empty file for that path. It then calls _set_data() with our
        # stock empty dirstate information - a root with ROOT_ID, no children,
        # and no parents. Finally it calls save() to ensure that this data will
        # persist.
        result = cls(path)
        # root dir and root dir contents with no children.
        empty_tree_dirblocks = [('', []), ('', [])]
        # a new root directory, with a NULLSTAT.
        empty_tree_dirblocks[0][1].append(
            (('', '', inventory.ROOT_ID), [
                ('d', '', 0, False, DirState.NULLSTAT),
            ]))
        result.lock_write()
        try:
            result._set_data([], empty_tree_dirblocks)
            result.save()
        except:
            result.unlock()
            raise
        return result

    def _inv_entry_to_details(self, inv_entry):
        """Convert an inventory entry (from a revision tree) to state details.

        :param inv_entry: An inventory entry whose sha1 and link targets can be
            relied upon, and which has a revision set.
        :return: A details tuple - the details for a single tree at a path +
            id.
        """
        kind = inv_entry.kind
        minikind = DirState._kind_to_minikind[kind]
        tree_data = inv_entry.revision
        assert len(tree_data) > 0, 'empty revision for the inv_entry.'
        if kind == 'directory':
            fingerprint = ''
            size = 0
            executable = False
        elif kind == 'symlink':
            fingerprint = inv_entry.symlink_target or ''
            size = 0
            executable = False
        elif kind == 'file':
            fingerprint = inv_entry.text_sha1 or ''
            size = inv_entry.text_size or 0
            executable = inv_entry.executable
        elif kind == 'tree-reference':
            fingerprint = inv_entry.reference_revision or ''
            size = 0
            executable = False
        else:
            raise Exception("can't pack %s" % inv_entry)
        return (minikind, fingerprint, size, executable, tree_data)

    def _iter_entries(self):
        """Iterate over all the entries in the dirstate.

        Each yielded item is an entry in the standard format described in the
        docstring of bzrlib.dirstate.
        """
        self._read_dirblocks_if_needed()
        for directory in self._dirblocks:
            for entry in directory[1]:
                yield entry

    def _get_id_index(self):
        """Get an id index of self._dirblocks."""
        if self._id_index is None:
            id_index = {}
            for key, tree_details in self._iter_entries():
                id_index.setdefault(key[2], set()).add(key)
            self._id_index = id_index
        return self._id_index

    def _get_output_lines(self, lines):
        """Format lines for final output.

        :param lines: A sequence of lines containing the parents list and the
            entry lines.
        """
        output_lines = [DirState.HEADER_FORMAT_3]
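        # The final output therefore begins like this (hypothetical checksum
        # and count):
        #   '#bazaar dirstate flat format 3\n'
        #   'adler32: -1542938797\n'
        #   'num_entries: 42\n'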
        lines.append('') # a final newline
        inventory_text = '\0\n\0'.join(lines)
        output_lines.append('adler32: %s\n' % (zlib.adler32(inventory_text),))
        # -3, 1 for num parents, 1 for ghosts, 1 for final newline
        num_entries = len(lines)-3
        output_lines.append('num_entries: %s\n' % (num_entries,))
        output_lines.append(inventory_text)
        return output_lines

    def _make_deleted_row(self, fileid_utf8, parents):
        """Return a deleted row for fileid_utf8."""
        return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
            ''), parents

    def _num_present_parents(self):
        """The number of parent entries in each record row."""
        return len(self._parents) - len(self._ghosts)

    @staticmethod
    def on_file(path):
        """Construct a DirState on the file at path path.

        :return: An unlocked DirState object, associated with the given path.
        """
        result = DirState(path)
        return result

    def _read_dirblocks_if_needed(self):
        """Read in all the dirblocks from the file if they are not in memory.

        This populates self._dirblocks, and sets self._dirblock_state to
        IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
        loading.
        """
        self._read_header_if_needed()
        if self._dirblock_state == DirState.NOT_IN_MEMORY:
            # move the _state_file pointer to after the header (in case bisect
            # has been called in the mean time)
            self._state_file.seek(self._end_of_header)
            text = self._state_file.read()
            # TODO: check the adler checksums. adler_measured = zlib.adler32(text)

            fields = text.split('\0')
            # Remove the last blank entry
            trailing = fields.pop()
            assert trailing == ''
            # consider turning fields into a tuple.

            # skip the first field which is the trailing null from the header.
            cur = 1
            # Each line now has an extra '\n' field which is not used
            # so we just skip over it
            # entry size:
            #  3 fields for the key
            #  + number of fields per tree_data (5) * tree count
            #  + 1 for the newline
            num_present_parents = self._num_present_parents()
            tree_count = 1 + num_present_parents
            entry_size = self._fields_per_entry()
            expected_field_count = entry_size * self._num_entries
            if len(fields) - cur > expected_field_count:
                fields = fields[:expected_field_count + cur]
                trace.mutter('Unexpectedly long dirstate field count!')
                print "XXX: incorrectly truncated dirstate file bug triggered."
            field_count = len(fields)
            # this checks our adjustment, and also catches file too short.
            assert field_count - cur == expected_field_count, \
                'field count incorrect %s != %s, entry_size=%s, '\
                'num_entries=%s fields=%r' % (
                    field_count - cur, expected_field_count, entry_size,
                    self._num_entries, fields)

            if num_present_parents == 1:
                # Bind external functions to local names
                _int = int
                # We access all fields in order, so we can just iterate over
                # them. Grab a straight iterator over the fields. (We use an
                # iterator because we don't want to do a lot of additions, nor
                # do we want to do a lot of slicing)
                next = iter(fields).next
                # Move the iterator to the current position
                for x in xrange(cur):
                    next()
                # The two blocks here are deliberate: the root block and the
                # contents-of-root block.
                self._dirblocks = [('', []), ('', [])]
                current_block = self._dirblocks[0][1]
                current_dirname = ''
                append_entry = current_block.append
                for count in xrange(self._num_entries):
                    dirname = next()
                    name = next()
                    file_id = next()
                    if dirname != current_dirname:
                        # new block - different dirname
                        current_block = []
                        current_dirname = dirname
                        self._dirblocks.append((current_dirname, current_block))
                        append_entry = current_block.append
                    # we know current_dirname == dirname, so re-use it to avoid
                    # creating new strings
                    entry = ((current_dirname, name, file_id),
                             [(# Current Tree
                                 next(),                # minikind
                                 next(),                # fingerprint
                                 _int(next()),          # size
                                 next() == 'y',         # executable
                                 next(),                # packed_stat or revision_id
                             ),
                             (# Parent 1
                                 next(),                # minikind
                                 next(),                # fingerprint
                                 _int(next()),          # size
                                 next() == 'y',         # executable
                                 next(),                # packed_stat or revision_id
                             ),
                             ])
                    trailing = next()
                    assert trailing == '\n'
                    # append the entry to the current block
                    append_entry(entry)
                self._split_root_dirblock_into_contents()
            else:
                fields_to_entry = self._get_fields_to_entry()
                entries = [fields_to_entry(fields[pos:pos+entry_size])
                           for pos in xrange(cur, field_count, entry_size)]
                self._entries_to_current_state(entries)
            # To convert from format 2 => format 3
            # self._dirblocks = sorted(self._dirblocks,
            #                          key=lambda blk:blk[0].split('/'))
            # To convert from format 3 => format 2
            # self._dirblocks = sorted(self._dirblocks)
            self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
1623
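
    # Editor's sketch: one serialised entry as it appears in the `fields`
    # list above, for a dirstate with a single parent tree (fields are
    # NUL-separated on disk; values illustrative):
    #
    #   dirname, basename, file-id,                            <- 3 key fields
    #   minikind, fingerprint, size, executable, packed_stat,  <- tree 0
    #   minikind, fingerprint, size, executable, revision_id,  <- parent 1
    #   '\n'                                                   <- trailing field
    #
    # i.e. 3 + 5 per tree + 1, which is how entry_size is derived from
    # _fields_per_entry() in the parsing code above.
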
    def _read_header(self):
        """This reads in the metadata header, and the parent ids.

        After reading in, the file should be positioned at the null
        just before the start of the first record in the file.

        :return: (expected adler checksum, number of entries, parent list)
        """
        self._read_prelude()
        parent_line = self._state_file.readline()
        info = parent_line.split('\0')
        num_parents = int(info[0])
        assert num_parents == len(info)-2, 'incorrect parent info line'
        self._parents = info[1:-1]

        ghost_line = self._state_file.readline()
        info = ghost_line.split('\0')
        num_ghosts = int(info[1])
        assert num_ghosts == len(info)-3, 'incorrect ghost info line'
        self._ghosts = info[2:-1]
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._end_of_header = self._state_file.tell()

    def _read_header_if_needed(self):
        """Read the header of the dirstate file if needed."""
        # inline this as it will be called a lot
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
        if self._header_state == DirState.NOT_IN_MEMORY:
            self._read_header()

    def _read_prelude(self):
        """Read in the prelude header of the dirstate file.

        This only reads in the stuff that is not connected to the adler
        checksum. The position will be correct to read in the rest of
        the file and check the checksum after this point.
        The next entry in the file should be the number of parents,
        and their ids, followed by a newline.
        """
        header = self._state_file.readline()
        assert header == DirState.HEADER_FORMAT_3, \
            'invalid header line: %r' % (header,)
        adler_line = self._state_file.readline()
        assert adler_line.startswith('adler32: '), 'missing adler32 checksum'
        self.adler_expected = int(adler_line[len('adler32: '):-1])
        num_entries_line = self._state_file.readline()
        assert num_entries_line.startswith('num_entries: '), \
            'missing num_entries line'
        self._num_entries = int(num_entries_line[len('num_entries: '):-1])
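
    # Editor's sketch of the prelude as consumed above, one line per
    # readline() call (checksum and count values are illustrative only):
    #
    #   <DirState.HEADER_FORMAT_3 header line>\n
    #   adler32: -1817004341\n
    #   num_entries: 42\n
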
    def save(self):
        """Save any pending changes created during this session.

        We reuse the existing file, because that prevents race conditions with
        file creation, and use oslocks on it to prevent concurrent modification
        and reads - because dirstate's incremental data aggregation is not
        compatible with reading a modified file, and replacing a file in use by
        another process is impossible on windows.

        A dirstate in read only mode should be smart enough though to validate
        that the file has not changed, and otherwise discard its cache and
        start over, to allow for fine grained read lock duration, so 'status'
        won't block 'commit' - for example.
        """
        if (self._header_state == DirState.IN_MEMORY_MODIFIED or
            self._dirblock_state == DirState.IN_MEMORY_MODIFIED):

            if self._lock_state == 'w':
                out_file = self._state_file
                wlock = None
            else:
                # Try to grab a write lock so that we can update the file.
                try:
                    wlock = lock.WriteLock(self._filename)
                except (errors.LockError, errors.LockContention), e:
                    # We couldn't grab the lock, so just leave things dirty in
                    # memory.
                    return
                except IOError, e:
                    # This may be a read-only tree, or someone else may have a
                    # ReadLock. so handle the case when we cannot grab a write
                    # lock
                    if e.errno in (errno.ENOENT, errno.EPERM, errno.EACCES,
                                   errno.EAGAIN):
                        # Ignore these errors and just don't save anything
                        return
                    raise
                out_file = wlock.f
            try:
                out_file.seek(0)
                out_file.writelines(self.get_lines())
                out_file.truncate()
                out_file.flush()
                self._header_state = DirState.IN_MEMORY_UNMODIFIED
                self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
            finally:
                if wlock is not None:
                    wlock.unlock()

    def _set_data(self, parent_ids, dirblocks):
        """Set the full dirstate data in memory.

        This is an internal function used to completely replace the objects
        in memory state. It puts the dirstate into state 'full-dirty'.

        :param parent_ids: A list of parent tree revision ids.
        :param dirblocks: A list containing one tuple for each directory in the
            tree. Each tuple contains the directory path and a list of entries
            found in that directory.
        """
        # our memory copy is now authoritative.
        self._dirblocks = dirblocks
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._parents = list(parent_ids)
        self._id_index = None

    def set_path_id(self, path, new_id):
        """Change the id of path to new_id in the current working tree.

        :param path: The path inside the tree to set - '' is the root, 'foo'
            is the path foo in the root.
        :param new_id: The new id to assign to the path. This must be a utf8
            file id (not unicode, and not None).
        """
        # TODO: start warning here.
        assert new_id.__class__ == str
        self._read_dirblocks_if_needed()
        if len(path):
            # logic not written
            raise NotImplementedError(self.set_path_id)
        # TODO: check new id is unique
        entry = self._get_entry(0, path_utf8=path)
        if entry[0][2] == new_id:
            # Nothing to change.
            return
        # mark the old path absent, and insert a new root path
        self._make_absent(entry)
        self.update_minimal(('', '', new_id), 'd',
            path_utf8='', packed_stat=entry[1][0][4])
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index is not None:
            self._id_index.setdefault(new_id, set()).add(entry[0])

    def set_parent_trees(self, trees, ghosts):
        """Set the parent trees for the dirstate.

        :param trees: A list of revision_id, tree tuples. tree must be provided
            even if the revision_id refers to a ghost: supply an empty tree in
            this case.
        :param ghosts: A list of the revision_ids that are ghosts at the time
            of setting.
        """
        # TODO: generate a list of parent indexes to preserve to save
        # processing specific parent trees. In the common case one tree will
        # be preserved - the left most parent.
        # TODO: if the parent tree is a dirstate, we might want to walk them
        # all by path in parallel for 'optimal' common-case performance.
        # generate new root row.
        self._read_dirblocks_if_needed()
        # TODO future sketch: Examine the existing parents to generate a change
        # map and then walk the new parent trees only, mapping them into the
        # dirstate. Walk the dirstate at the same time to remove unreferenced
        # entries.
        # sketch: loop over all entries in the dirstate, cherry picking
        # entries from the parent trees, if they are not ghost trees.
        # after we finish walking the dirstate, all entries not in the dirstate
        # are deletes, so we want to append them to the end as per the design
        # discussions. So do a set difference on ids with the parents to
        # get deletes, and add them to the end.
        # During the update process we need to answer the following questions:
        # - find other keys containing a fileid in order to create cross-path
        #   links. We don't trivially use the inventory from other trees
        #   because this leads to either double touching, or to accessing
        #   missing keys.
        # - find other keys containing a path
        # We accumulate each entry via this dictionary, including the root
        by_path = {}
        id_index = {}
        # we could do parallel iterators, but because file id data may be
        # scattered throughout, we don't save on index overhead: we have to
        # look at everything anyway. We can probably save cycles by reusing
        # parent data and doing an incremental update when adding an
        # additional parent, but for now the common cases are adding a new
        # parent (merge), and replacing completely (commit), and commit is
        # more common: so optimise merge later.

        # ---- start generation of full tree mapping data
        # what trees should we use?
        parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
        # how many trees do we end up with
        parent_count = len(parent_trees)

        # one: the current tree
        for entry in self._iter_entries():
            # skip entries not in the current tree
            if entry[1][0][0] in ('a', 'r'): # absent, relocated
                continue
            by_path[entry[0]] = [entry[1][0]] + \
                [DirState.NULL_PARENT_DETAILS] * parent_count
            id_index[entry[0][2]] = set([entry[0]])

        # now the parent trees:
        for tree_index, tree in enumerate(parent_trees):
            # the index is off by one, adjust it.
            tree_index = tree_index + 1
            # when we add new locations for a fileid we need these ranges for
            # any fileid in this tree as we set the by_path[id] to:
            # already_processed_tree_details + new_details + new_location_suffix
            # the suffix is from tree_index+1:parent_count+1.
            new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
            # now stitch in all the entries from this tree
            for path, entry in tree.inventory.iter_entries_by_dir():
                # here we process each tree's details for each item in the tree.
                # we first update any existing entries for the id at other paths,
                # then we either create or update the entry for the id at the
                # right path, and finally we add (if needed) a mapping from
                # file_id to this path. We do it in this order to allow us to
                # avoid checking all known paths for the id when generating a
                # new entry at this path: by adding the id->path mapping last,
                # all the mappings are valid and have correct relocation
                # records where needed.
                file_id = entry.file_id
                path_utf8 = path.encode('utf8')
                dirname, basename = osutils.split(path_utf8)
                new_entry_key = (dirname, basename, file_id)
                # tree index consistency: All other paths for this id in this tree
                # index must point to the correct path.
                for entry_key in id_index.setdefault(file_id, set()):
                    # TODO:PROFILING: It might be faster to just update
                    # rather than checking if we need to, and then overwrite
                    # the one we are located at.
                    if entry_key != new_entry_key:
                        # this file id is at a different path in one of the
                        # other trees, so put absent pointers there
                        # This is the vertical axis in the matrix, all pointing
                        # to the real path.
                        by_path[entry_key][tree_index] = ('r', path_utf8, 0, False, '')
                # by path consistency: Insert into an existing path record (trivial), or
                # add a new one with relocation pointers for the other tree indexes.
                if new_entry_key in id_index[file_id]:
                    # there is already an entry where this data belongs, just insert it.
                    by_path[new_entry_key][tree_index] = \
                        self._inv_entry_to_details(entry)
                else:
                    # add relocated entries to the horizontal axis - this row
                    # mapping from path,id. We need to look up the correct path
                    # for the indexes from 0 to tree_index - 1
                    new_details = []
                    for lookup_index in xrange(tree_index):
                        # boundary case: this is the first occurrence of file_id
                        # so there are no id_indexes, possibly take this out of
                        # the loop?
                        if not len(id_index[file_id]):
                            new_details.append(DirState.NULL_PARENT_DETAILS)
                        else:
                            # grab any one entry, use it to find the right path.
                            # TODO: optimise this to reduce memory use in highly
                            # fragmented situations by reusing the relocation
                            # records.
                            a_key = iter(id_index[file_id]).next()
                            if by_path[a_key][lookup_index][0] in ('r', 'a'):
                                # it's a pointer or absent, use it as is.
                                new_details.append(by_path[a_key][lookup_index])
                            else:
                                # we have the right key, make a pointer to it.
                                real_path = ('/'.join(a_key[0:2])).strip('/')
                                new_details.append(('r', real_path, 0, False, ''))
                    new_details.append(self._inv_entry_to_details(entry))
                    new_details.extend(new_location_suffix)
                    by_path[new_entry_key] = new_details
                    id_index[file_id].add(new_entry_key)
        # --- end generation of full tree mappings

        # sort and output all the entries
        new_entries = self._sort_entries(by_path.items())
        self._entries_to_current_state(new_entries)
        self._parents = [rev_id for rev_id, tree in trees]
        self._ghosts = list(ghosts)
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = id_index
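
    # Editor's sketch: the shape of one by_path row built above, for a state
    # with two parent trees where the file is at this path in tree 0 and
    # parent 1 but at another path in parent 2 (all values illustrative):
    #
    #   by_path[('dir', 'name', 'file-id')] = [
    #       ('f', sha1, size, False, packed_stat),  # tree 0 (working)
    #       ('f', sha1, size, False, revision_id),  # parent 1
    #       ('r', 'other/path', 0, False, ''),      # parent 2: relocated
    #   ]
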
    def _sort_entries(self, entry_list):
        """Given a list of entries, sort them into the right order.

        This is done when constructing a new dirstate from trees - normally we
        try to keep everything in sorted blocks all the time, but sometimes
        it's easier to sort after the fact.
        """
        # TODO: Might be faster to do a Schwartzian transform?
        def _key(entry):
            # sort by: directory parts, file name, file id
            return entry[0][0].split('/'), entry[0][1], entry[0][2]
        return sorted(entry_list, key=_key)
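
    # Editor's note: splitting the dirname in _key matters. With plain string
    # comparison 'a-b/c' sorts before 'a/b' (because '-' < '/'), but on split
    # components ['a', 'b'] < ['a-b', 'c'], so 'a/b' correctly sorts with the
    # other children of 'a'.
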
    def set_state_from_inventory(self, new_inv):
        """Set new_inv as the current state.

        This API is called by tree transform, and will usually occur with
        existing parent trees.

        :param new_inv: The inventory object to set current state from.
        """
        self._read_dirblocks_if_needed()
        # incremental algorithm:
        # two iterators: current data and new data, both in dirblock order.
        new_iterator = new_inv.iter_entries_by_dir()
        # we will be modifying the dirstate, so we need a stable iterator. In
        # future we might write one, for now we just clone the state into a
        # list - a shallow copy, so the entry objects are shared.
        old_iterator = iter(list(self._iter_entries()))
        # both must have roots so this is safe:
        current_new = new_iterator.next()
        current_old = old_iterator.next()
        def advance(iterator):
            try:
                return iterator.next()
            except StopIteration:
                return None
        while current_new or current_old:
            # skip entries in old that are not really there
            if current_old and current_old[1][0][0] in ('r', 'a'):
                # relocated or absent
                current_old = advance(old_iterator)
                continue
            if current_new:
                # convert new into dirblock style
                new_path_utf8 = current_new[0].encode('utf8')
                new_dirname, new_basename = osutils.split(new_path_utf8)
                new_id = current_new[1].file_id
                new_entry_key = (new_dirname, new_basename, new_id)
                current_new_minikind = \
                    DirState._kind_to_minikind[current_new[1].kind]
                if current_new_minikind == 't':
                    fingerprint = current_new[1].reference_revision
                else:
                    fingerprint = ''
            else:
                # for safety disable variables
                new_path_utf8 = new_dirname = new_basename = new_id = new_entry_key = None
            # 5 cases: we don't have a value that is strictly greater than
            # everything, so we make both end conditions explicit
            if not current_old:
                # old is finished: insert current_new into the state.
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            elif not current_new:
                # new is finished: remove current_old from the state.
                self._make_absent(current_old)
                current_old = advance(old_iterator)
            elif new_entry_key == current_old[0]:
                # same - common case
                # TODO: update the record if anything significant has changed.
                # the minimal required trigger is if the execute bit or cached
                # kind has changed.
                if (current_old[1][0][3] != current_new[1].executable or
                    current_old[1][0][0] != current_new_minikind):
                    self.update_minimal(current_old[0], current_new_minikind,
                        executable=current_new[1].executable,
                        path_utf8=new_path_utf8, fingerprint=fingerprint)
                # both sides are dealt with, move on
                current_old = advance(old_iterator)
                current_new = advance(new_iterator)
            elif new_entry_key < current_old[0]:
                # new comes before:
                # add an entry for this and advance new
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            else:
                # old comes before:
                self._make_absent(current_old)
                current_old = advance(old_iterator)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = None

    def _make_absent(self, current_old):
        """Mark current_old - an entry - as absent for tree 0.

        :return: True if this was the last details entry for the entry key:
            that is, if the underlying block has had the entry removed, thus
            shrinking in length.
        """
        # build up paths that this id will be left at after the change is made,
        # so we can update their cross references in tree 0
        all_remaining_keys = set()
        # Don't check the working tree, because it's going.
        for details in current_old[1][1:]:
            if details[0] not in ('a', 'r'): # absent, relocated
                all_remaining_keys.add(current_old[0])
            elif details[0] == 'r': # relocated
                # record the key for the real path.
                all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
            # absent rows are not present at any path.
        last_reference = current_old[0] not in all_remaining_keys
        if last_reference:
            # the current row consists entirely of the current item (being
            # marked absent), and relocated or absent entries for the other
            # trees: remove it, it's meaningless.
            block = self._find_block(current_old[0])
            entry_index, present = self._find_entry_index(current_old[0], block[1])
            assert present, 'could not find entry for %s' % (current_old,)
            block[1].pop(entry_index)
            # if we have an id_index in use, remove this key from it for this id.
            if self._id_index is not None:
                self._id_index[current_old[0][2]].remove(current_old[0])
        # update all remaining keys for this id to record it as absent. The
        # existing details may either be the record we are marking as deleted
        # (if there were other trees with the id present at this path), or may
        # be relocations.
        for update_key in all_remaining_keys:
            update_block_index, present = \
                self._find_block_index_from_key(update_key)
            assert present, 'could not find block for %s' % (update_key,)
            update_entry_index, present = \
                self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
            assert present, 'could not find entry for %s' % (update_key,)
            update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
            # it must not be absent at the moment
            assert update_tree_details[0][0] != 'a' # absent
            update_tree_details[0] = DirState.NULL_PARENT_DETAILS
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        return last_reference

    def update_minimal(self, key, minikind, executable=False, fingerprint='',
                       packed_stat=None, size=0, path_utf8=None):
        """Update an entry to the state in tree 0.

        This will either create a new entry at 'key' or update an existing one.
        It also makes sure that any other records which might mention this are
        updated as well.

        :param key: (dir, name, file_id) for the new entry
        :param minikind: The type for the entry ('f' == 'file', 'd' ==
            'directory'), etc.
        :param executable: Should the executable bit be set?
        :param fingerprint: Simple fingerprint for new entry.
        :param packed_stat: packed stat value for new entry.
        :param size: Size information for new entry
        :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
            the join again.
        """
        block = self._find_block(key)[1]
        if packed_stat is None:
            packed_stat = DirState.NULLSTAT
        entry_index, present = self._find_entry_index(key, block)
        new_details = (minikind, fingerprint, size, executable, packed_stat)
        id_index = self._get_id_index()
        if not present:
            # new entry, synthesise cross references here,
            existing_keys = id_index.setdefault(key[2], set())
            if not existing_keys:
                # not currently in the state, simplest case
                new_entry = key, [new_details] + self._empty_parent_info()
            else:
                # present at one or more existing other paths.
                # grab one of them and use it to generate parent
                # relocation/absent entries.
                new_entry = key, [new_details]
                for other_key in existing_keys:
                    # change the record at other to be a pointer to this new
                    # record. The loop looks similar to the change to
                    # relocations when updating an existing record but it's not:
                    # the test for existing kinds is different: this can be
                    # factored out to a helper though.
                    other_block_index, present = self._find_block_index_from_key(other_key)
                    assert present, 'could not find block for %s' % (other_key,)
                    other_entry_index, present = self._find_entry_index(other_key,
                                            self._dirblocks[other_block_index][1])
                    assert present, 'could not find entry for %s' % (other_key,)
                    assert path_utf8 is not None
                    self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')

                num_present_parents = self._num_present_parents()
                for lookup_index in xrange(1, num_present_parents + 1):
                    # grab any one entry, use it to find the right path.
                    # TODO: optimise this to reduce memory use in highly
                    # fragmented situations by reusing the relocation
                    # records.
                    update_block_index, present = \
                        self._find_block_index_from_key(other_key)
                    assert present, 'could not find block for %s' % (other_key,)
                    update_entry_index, present = \
                        self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
                    assert present, 'could not find entry for %s' % (other_key,)
                    update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
                    if update_details[0] in ('r', 'a'): # relocated, absent
                        # it's a pointer or absent in lookup_index's tree, use
                        # it as is.
                        new_entry[1].append(update_details)
                    else:
                        # we have the right key, make a pointer to it.
                        pointer_path = osutils.pathjoin(*other_key[0:2])
                        new_entry[1].append(('r', pointer_path, 0, False, ''))
            block.insert(entry_index, new_entry)
            existing_keys.add(key)
        else:
            # Does the new state matter?
            block[entry_index][1][0] = new_details
            # parents cannot be affected by what we do.
            # other occurrences of this id can be found
            # from the id index.
            # tree index consistency: All other paths for this id in this tree
            # index must point to the correct path. We have to loop here because
            # we may have passed entries in the state with this file id already
            # that were absent - where parent entries are - and they need to be
            # converted to relocated.
            assert path_utf8 is not None
            for entry_key in id_index.setdefault(key[2], set()):
                # TODO:PROFILING: It might be faster to just update
                # rather than checking if we need to, and then overwrite
                # the one we are located at.
                if entry_key != key:
                    # this file id is at a different path in one of the
                    # other trees, so put absent pointers there
                    # This is the vertical axis in the matrix, all pointing
                    # to the real path.
                    block_index, present = self._find_block_index_from_key(entry_key)
                    assert present, 'could not find block for %s' % (entry_key,)
                    entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
                    assert present, 'could not find entry for %s' % (entry_key,)
                    self._dirblocks[block_index][1][entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')
        # add a containing dirblock if needed.
        if new_details[0] == 'd':
            subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
            block_index, present = self._find_block_index_from_key(subdir_key)
            if not present:
                self._dirblocks.insert(block_index, (subdir_key[0], []))

        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
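
    # Editor's sketch: a representative update_minimal call, mirroring how
    # set_path_id above re-roots the tree (old_entry stands for a previously
    # fetched entry):
    #
    #   state.update_minimal(('', '', 'new-root-id'), 'd',
    #       path_utf8='', packed_stat=old_entry[1][0][4])
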
    def _validate(self):
        """Check that invariants on the dirblock are correct.

        This can be useful in debugging; it shouldn't be necessary in
        normal code.
        """
        from pprint import pformat
        if len(self._dirblocks) > 0:
            assert self._dirblocks[0][0] == '', \
                "dirblocks don't start with root block:\n" + \
                pformat(self._dirblocks)
        if len(self._dirblocks) > 1:
            assert self._dirblocks[1][0] == '', \
                "dirblocks missing root directory:\n" + \
                pformat(self._dirblocks)
        # the dirblocks are sorted by their path components, name, and dir id
        dir_names = [d[0].split('/')
                     for d in self._dirblocks[1:]]
        if dir_names != sorted(dir_names):
            raise AssertionError(
                "dir names are not in sorted order:\n" + \
                pformat(self._dirblocks) + \
                "\nkeys:\n" + \
                pformat(dir_names))
        for dirblock in self._dirblocks:
            # within each dirblock, the entries are sorted by filename and
            # then by id.
            assert dirblock[1] == sorted(dirblock[1]), \
                "dirblock for %r is not sorted:\n%s" % \
                (dirblock[0], pformat(dirblock))

    def _wipe_state(self):
        """Forget all state information about the dirstate."""
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._parents = []
        self._ghosts = []
        self._dirblocks = []
        self._id_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}

    def lock_read(self):
        """Acquire a read lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone
        self._lock_token = lock.ReadLock(self._filename)
        self._lock_state = 'r'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def lock_write(self):
        """Acquire a write lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone
        self._lock_token = lock.WriteLock(self._filename)
        self._lock_state = 'w'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def unlock(self):
"""Drop any locks held on the dirstate"""
2241
if self._lock_token is None:
2242
raise errors.LockNotHeld(self)
2243
# TODO: jam 20070301 Rather than wiping completely, if the blocks are
2244
# already in memory, we could read just the header and check for
2245
# any modification. If not modified, we can just leave things
2247
self._state_file = None
2248
self._lock_state = None
2249
self._lock_token.unlock()
2250
self._lock_token = None
2251
self._split_path_cache = {}
2253
    def _requires_lock(self):
        """Check that a lock is currently held by someone on the dirstate."""
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)

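
# Editor's sketch: the expected locking lifecycle around mutation, using only
# methods defined above (tree/ghost values illustrative):
#
#   state = DirState.on_file('dirstate')
#   state.lock_write()
#   try:
#       state.set_parent_trees(trees, ghosts=[])
#       state.save()
#   finally:
#       state.unlock()
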
def bisect_dirblock(dirblocks, dirname, lo=0, hi=None, cache={}):
    """Return the index where to insert dirname into the dirblocks.

    The return value idx is such that all directory blocks in dirblock[:idx]
    have names < dirname, and all blocks in dirblock[idx:] have names >=
    dirname.

    Optional args lo (default 0) and hi (default len(dirblocks)) bound the
    slice of dirblocks to be searched.
    """
    if hi is None:
        hi = len(dirblocks)
    try:
        dirname_split = cache[dirname]
    except KeyError:
        dirname_split = dirname.split('/')
        cache[dirname] = dirname_split
    while lo < hi:
        mid = (lo + hi) // 2
        # Grab the dirname for the current dirblock
        cur = dirblocks[mid][0]
        try:
            cur_split = cache[cur]
        except KeyError:
            cur_split = cur.split('/')
            cache[cur] = cur_split
        if cur_split < dirname_split:
            lo = mid + 1
        else:
            hi = mid
    return lo

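
# Editor's sketch: typical use (block contents elided; ordering is by split
# path components, so 'a/b' sorts before 'aa'):
#
#   blocks = [('', []), ('a', []), ('a/b', []), ('b', [])]
#   bisect_dirblock(blocks, 'a/b')  # -> 2, the index of the existing block
#   bisect_dirblock(blocks, 'aa')   # -> 3, the insertion point after 'a/b'
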
def pack_stat(st, _encode=base64.encodestring, _pack=struct.pack):
    """Convert stat values into a packed representation."""
    # jam 20060614 it isn't really worth removing more entries if we
    # are going to leave it in packed form.
    # With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
    # With all entries filesize is 5.9M and read time is maybe 280ms
    # well within the noise margin

    # base64.encode always adds a final newline, so strip it off
    return _encode(_pack('>llllll',
        st.st_size, int(st.st_mtime), int(st.st_ctime),
        st.st_dev, st.st_ino, st.st_mode))[:-1]
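

# Editor's sketch: an illustrative inverse of pack_stat, not part of this
# module's API - shown only to make the packed layout concrete. It assumes
# the six-field big-endian encoding used above.
def _unpack_stat_example(packed):
    """Decode a pack_stat() string back into its six integers.

    :return: (size, mtime, ctime, dev, ino, mode) as ints.
    """
    # 32 base64 characters decode to the 24 packed bytes; decodestring
    # does not require the trailing newline that pack_stat stripped.
    return struct.unpack('>llllll', base64.decodestring(packed))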