lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.

MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
WHOLE_NUMBER = {digit}, digit;
REVISION_ID = a non-empty utf8 string;

dirstate format = header line, full checksum, row count, parent_details,
    ghost_details, entries;
header line = "#bazaar dirstate flat format 3", NL;
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
row count = "num_entries: ", WHOLE_NUMBER, NL;
parent_details = WHOLE_NUMBER, {REVISION_ID}*, NL;
ghost_details = WHOLE_NUMBER, {REVISION_ID}*, NL;

entry = entry_key, current_entry_details, {parent_entry_details};
entry_key = dirname, basename, fileid;
current_entry_details = common_entry_details, working_entry_details;
parent_entry_details = common_entry_details, history_entry_details;
common_entry_details = MINIKIND, fingerprint, size, executable;
working_entry_details = packed_stat;
history_entry_details = REVISION_ID;

fingerprint = a nonempty utf8 sequence with meaning defined by minikind.

Given this definition, the following is useful to know::

    entry (aka row) - all the data for a given key.
    entry[0]: The key (dirname, basename, fileid)
    entry[0][0]: dirname
    entry[0][1]: basename
    entry[0][2]: fileid
    entry[1]: The tree(s) data for this path and id combination.
    entry[1][0]: The current tree
    entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0, the current tree, to
demonstrate)::

    entry[1][0][0]: minikind
    entry[1][0][1]: fingerprint
    entry[1][0][2]: size
    entry[1][0][3]: executable
    entry[1][0][4]: packed_stat
    OR (for non tree-0)
    entry[1][1][4]: revision_id
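
As an illustration only (all values are hypothetical), a complete entry for a
file "foo" versioned in the root of the tree might look like::

    (('', 'foo', 'foo-file-id'),                           # entry[0], the key
     [('f', '<sha1 of foo>', 12, False, '<packed stat>'),  # tree 0 details
      ('f', '<sha1 of foo>', 12, False, '<revision id>')]) # parent tree details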
There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now::

    self._dirblocks[0] -> ('', [entry ...]),

and the entries in there are::

    entries[0][0]: ''
    entries[0][1]: ''
    entries[0][2]: file_id
    entries[1][0]: The tree data for the current tree for this fileid at /

'r' is a relocated entry: This path is not present in this tree with this
    id, but the id can be found at another location. The fingerprint is
    used to point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
    current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is
    the sha1 value of the file's canonical form, i.e. after any read
    filters have been applied to the convenience form stored in the working
    tree.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is
    the link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
    revision.
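
As a quick reference (a sketch, not an exhaustive statement of the format),
the minikind codes correspond to kinds as follows::

    'f' -> file
    'd' -> directory
    'l' -> symlink
    't' -> tree-reference
    'a' -> absent (not present in that tree)
    'r' -> relocated (present elsewhere; see the fingerprint)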
The entries on disk and in memory are ordered according to the following keys::

    directory, as a list of components
    filename
    file-id
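
For illustration only (a sketch of the ordering, not the exact helper used),
the effective sort key for an entry is::

    key = (dirname.split('/'), basename, file_id)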
--- Format 1 had the following different definition: ---
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
    WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
    basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,

PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
a PARENT ROW is emitted for foo and baz, but not for bar.
1351
def _check_delta_is_valid(self, delta):
1352
return list(inventory._check_delta_unique_ids(
1353
inventory._check_delta_unique_old_paths(
1354
inventory._check_delta_unique_new_paths(
1355
inventory._check_delta_ids_match_entry(
1356
inventory._check_delta_ids_are_valid(
1357
inventory._check_delta_new_path_entry_both_or_None(delta)))))))
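# Illustration only (hypothetical values): the delta validated above is a
# sequence of (old_path, new_path, file_id, new_inventory_entry) 4-tuples.
# For example, a rename might appear as
#     (u'foo.txt', u'bar.txt', 'foo-file-id', <InventoryFile for bar.txt>)
# while an add uses old_path=None and a delete uses new_path=None.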
1359
def update_by_delta(self, delta):
1360
"""Apply an inventory delta to the dirstate for tree 0
1362
This is the workhorse for apply_inventory_delta in dirstate based
1365
:param delta: An inventory delta. See Inventory.apply_delta for
1062
def update_entry(self, entry, abspath, stat_value=None):
1063
"""Update the entry based on what is actually on disk.
1065
:param entry: This is the dirblock entry for the file in question.
1066
:param abspath: The path on disk for this file.
1067
:param stat_value: (optional) if we already have done a stat on the
1069
:return: The sha1 hexdigest of the file (40 bytes) or link target of a
1368
self._read_dirblocks_if_needed()
1369
encode = cache_utf8.encode
1372
# Accumulate parent references (path_utf8, id), to check for parentless
1373
# items or items placed under files/links/tree-references. We get
1374
# references from every item in the delta that is not a deletion and
1375
# is not itself the root.
1377
# Added ids must not be in the dirstate already. This set holds those
1380
# This loop transforms the delta to single atomic operations that can
1381
# be executed and validated.
1382
delta = sorted(self._check_delta_is_valid(delta), reverse=True)
1383
for old_path, new_path, file_id, inv_entry in delta:
1384
if (file_id in insertions) or (file_id in removals):
1385
self._raise_invalid(old_path or new_path, file_id,
1387
if old_path is not None:
1388
old_path = old_path.encode('utf-8')
1389
removals[file_id] = old_path
1391
new_ids.add(file_id)
1392
if new_path is not None:
1393
if inv_entry is None:
1394
self._raise_invalid(new_path, file_id,
1395
"new_path with no entry")
1396
new_path = new_path.encode('utf-8')
1397
dirname_utf8, basename = osutils.split(new_path)
1399
parents.add((dirname_utf8, inv_entry.parent_id))
1400
key = (dirname_utf8, basename, file_id)
1401
minikind = DirState._kind_to_minikind[inv_entry.kind]
1403
fingerprint = inv_entry.reference_revision or ''
1406
insertions[file_id] = (key, minikind, inv_entry.executable,
1407
fingerprint, new_path)
1408
# Transform moves into delete+add pairs
1409
if None not in (old_path, new_path):
1410
for child in self._iter_child_entries(0, old_path):
1411
if child[0][2] in insertions or child[0][2] in removals:
1413
child_dirname = child[0][0]
1414
child_basename = child[0][1]
1415
minikind = child[1][0][0]
1416
fingerprint = child[1][0][4]
1417
executable = child[1][0][3]
1418
old_child_path = osutils.pathjoin(child_dirname,
1420
removals[child[0][2]] = old_child_path
1421
child_suffix = child_dirname[len(old_path):]
1422
new_child_dirname = (new_path + child_suffix)
1423
key = (new_child_dirname, child_basename, child[0][2])
1424
new_child_path = osutils.pathjoin(new_child_dirname,
1426
insertions[child[0][2]] = (key, minikind, executable,
1427
fingerprint, new_child_path)
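# Worked example (hypothetical paths and ids): renaming directory 'a' to 'b'
# when 'a/c' is versioned records removals['c-id'] = 'a/c' and
# insertions['c-id'] = (('b', 'c', 'c-id'), minikind, executable,
# fingerprint, 'b/c'); every child is re-keyed under the new parent.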
1428
self._check_delta_ids_absent(new_ids, delta, 0)
1430
self._apply_removals(removals.iteritems())
1431
self._apply_insertions(insertions.values())
1433
self._after_delta_check_parents(parents, 0)
1434
except errors.BzrError, e:
1435
self._changes_aborted = True
1436
if 'integrity error' not in str(e):
1438
# _get_entry raises BzrError when a request is inconsistent; we
1439
# want such errors to be shown as InconsistentDelta - and that
1440
# fits the behaviour we trigger.
1441
raise errors.InconsistentDeltaDelta(delta,
1442
"error from _get_entry. %s" % (e,))
1444
def _apply_removals(self, removals):
1445
for file_id, path in sorted(removals, reverse=True,
1446
key=operator.itemgetter(1)):
1447
dirname, basename = osutils.split(path)
1448
block_i, entry_i, d_present, f_present = \
1449
self._get_block_entry_index(dirname, basename, 0)
1072
# This code assumes that the entry passed in is directly held in one of
1073
# the internal _dirblocks. So the dirblock state must have already been
1075
assert self._dirblock_state != DirState.NOT_IN_MEMORY
1076
if stat_value is None:
1451
entry = self._dirblocks[block_i][1][entry_i]
1453
self._raise_invalid(path, file_id,
1454
"Wrong path for old path.")
1455
if not f_present or entry[1][0][0] in 'ar':
1456
self._raise_invalid(path, file_id,
1457
"Wrong path for old path.")
1458
if file_id != entry[0][2]:
1459
self._raise_invalid(path, file_id,
1460
"Attempt to remove path has wrong id - found %r."
1462
self._make_absent(entry)
1463
# See if we have a malformed delta: deleting a directory must not
1464
# leave crud behind. This increases the number of bisects needed
# substantially, but deletion or renames of large numbers of paths
# are rare enough that it shouldn't be an issue (famous last words?) RBC
1468
block_i, entry_i, d_present, f_present = \
1469
self._get_block_entry_index(path, '', 0)
1471
# The dir block is still present in the dirstate; this could
1472
# be due to it being in a parent tree, or a corrupt delta.
1473
for child_entry in self._dirblocks[block_i][1]:
1474
if child_entry[1][0][0] not in ('r', 'a'):
1475
self._raise_invalid(path, entry[0][2],
1476
"The file id was deleted but its children were "
1479
def _apply_insertions(self, adds):
1481
for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
1482
self.update_minimal(key, minikind, executable, fingerprint,
1483
path_utf8=path_utf8)
1484
except errors.NotVersionedError:
1485
self._raise_invalid(path_utf8.decode('utf8'), key[2],
1488
def update_basis_by_delta(self, delta, new_revid):
1489
"""Update the parents of this tree after a commit.
1491
This gives the tree one parent, with revision id new_revid. The
1492
inventory delta is applied to the current basis tree to generate the
1493
inventory for the parent new_revid, and all other parent trees are
1496
Note that an exception during the operation of this method will leave
1497
the dirstate in a corrupt state where it should not be saved.
1499
:param new_revid: The new revision id for the trees parent.
1500
:param delta: An inventory delta (see apply_inventory_delta) describing
the changes from the current leftmost parent revision to new_revid.
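# Usage sketch (caller and names illustrative, not part of this module):
# after recording a commit built from tree 0, a caller does roughly
#     state.update_basis_by_delta(commit_delta, new_revision_id)
# so that tree 1 (the basis) matches the new revision without rebuilding
# the dirstate from a full inventory.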
1503
self._read_dirblocks_if_needed()
1504
self._discard_merge_parents()
1505
if self._ghosts != []:
1506
raise NotImplementedError(self.update_basis_by_delta)
1507
if len(self._parents) == 0:
1508
# setup a blank tree, the most simple way.
1509
empty_parent = DirState.NULL_PARENT_DETAILS
1510
for entry in self._iter_entries():
1511
entry[1].append(empty_parent)
1512
self._parents.append(new_revid)
1514
self._parents[0] = new_revid
1516
delta = sorted(self._check_delta_is_valid(delta), reverse=True)
1520
# The paths this function accepts are unicode and must be encoded as we
1522
encode = cache_utf8.encode
1523
inv_to_entry = self._inv_entry_to_details
1524
# delta is now (deletes, changes), (adds) in reverse lexographical
1526
# deletes in reverse lexographic order are safe to process in situ.
1527
# renames are not, as a rename from any path could go to a path
1528
# lexographically lower, so we transform renames into delete, add pairs,
1529
# expanding them recursively as needed.
1530
# At the same time, to reduce interface friction we convert the input
1531
# inventory entries to dirstate.
1532
root_only = ('', '')
1533
# Accumulate parent references (path_utf8, id), to check for parentless
1534
# items or items placed under files/links/tree-references. We get
1535
# references from every item in the delta that is not a deletion and
1536
# is not itself the root.
1538
# Added ids must not be in the dirstate already. This set holds those
1541
for old_path, new_path, file_id, inv_entry in delta:
1542
if inv_entry is not None and file_id != inv_entry.file_id:
1543
self._raise_invalid(new_path, file_id,
1544
"mismatched entry file_id %r" % inv_entry)
1545
if new_path is None:
1546
new_path_utf8 = None
1548
if inv_entry is None:
1549
self._raise_invalid(new_path, file_id,
1550
"new_path with no entry")
1551
new_path_utf8 = encode(new_path)
1552
# note the parent for validation
1553
dirname_utf8, basename_utf8 = osutils.split(new_path_utf8)
1555
parents.add((dirname_utf8, inv_entry.parent_id))
1556
if old_path is None:
1557
old_path_utf8 = None
1559
old_path_utf8 = encode(old_path)
1560
if old_path is None:
1561
adds.append((None, new_path_utf8, file_id,
1562
inv_to_entry(inv_entry), True))
1563
new_ids.add(file_id)
1564
elif new_path is None:
1565
deletes.append((old_path_utf8, None, file_id, None, True))
1566
elif (old_path, new_path) == root_only:
1567
# change things in-place
1568
# Note: the case of a parent directory changing its file_id
1569
# tends to break optimizations here, because officially
1570
# the file has actually been moved, it just happens to
1571
# end up at the same path. If we can figure out how to
1572
# handle that case, we can avoid a lot of add+delete
1573
# pairs for objects that stay put.
1574
# elif old_path == new_path:
1575
changes.append((old_path_utf8, new_path_utf8, file_id,
1576
inv_to_entry(inv_entry)))
1579
# Because renames must preserve their children we must have
1580
# processed all relocations and removes before hand. The sort
1581
# order ensures we've examined the child paths, but we also
1582
# have to execute the removals, or the split to an add/delete
1583
# pair will result in the deleted item being reinserted, or
1584
# renamed items being reinserted twice - and possibly at the
1585
# wrong place. Splitting into a delete/add pair also simplifies
1586
# the handling of entries with ('f', ...), ('r' ...) because
1587
# the target of the 'r' is old_path here, and we add that to
1588
# deletes, meaning that the add handler does not need to check
1589
# for 'r' items on every pass.
1590
self._update_basis_apply_deletes(deletes)
1592
# Split into an add/delete pair recursively.
1593
adds.append((old_path_utf8, new_path_utf8, file_id,
1594
inv_to_entry(inv_entry), False))
1595
# Expunge deletes that we've seen so that deleted/renamed
1596
# children of a rename directory are handled correctly.
1597
new_deletes = reversed(list(
1598
self._iter_child_entries(1, old_path_utf8)))
1599
# Remove the current contents of the tree at orig_path, and
1600
# reinsert at the correct new path.
1601
for entry in new_deletes:
1602
child_dirname, child_basename, child_file_id = entry[0]
1604
source_path = child_dirname + '/' + child_basename
1606
source_path = child_basename
1608
target_path = new_path_utf8 + source_path[len(old_path):]
1611
raise AssertionError("cannot rename directory to"
1613
target_path = source_path[len(old_path) + 1:]
1614
adds.append((None, target_path, entry[0][2], entry[1][1], False))
1616
(source_path, target_path, entry[0][2], None, False))
1617
deletes.append((old_path_utf8, new_path, file_id, None, False))
1618
self._check_delta_ids_absent(new_ids, delta, 1)
1620
# Finish expunging deletes/first half of renames.
1621
self._update_basis_apply_deletes(deletes)
1622
# Reinstate second half of renames and new paths.
1623
self._update_basis_apply_adds(adds)
1624
# Apply in-situ changes.
1625
self._update_basis_apply_changes(changes)
1627
self._after_delta_check_parents(parents, 1)
1628
except errors.BzrError, e:
1629
self._changes_aborted = True
1630
if 'integrity error' not in str(e):
1078
# We could inline os.lstat but the common case is that
1079
# stat_value will be passed in, not read here.
1080
stat_value = self._lstat(abspath, entry)
1081
except (OSError, IOError), e:
1082
if e.errno in (errno.ENOENT, errno.EACCES,
1084
# The entry is missing, consider it gone
1632
# _get_entry raises BzrError when a request is inconsistent; we
1633
# want such errors to be shown as InconsistentDelta - and that
1634
# fits the behaviour we trigger.
1635
raise errors.InconsistentDeltaDelta(delta,
1636
"error from _get_entry. %s" % (e,))
1638
self._mark_modified(header_modified=True)
1639
self._id_index = None
1642
def _check_delta_ids_absent(self, new_ids, delta, tree_index):
1643
"""Check that none of the file_ids in new_ids are present in a tree."""
1646
id_index = self._get_id_index()
1647
for file_id in new_ids:
1648
for key in id_index.get(file_id, ()):
1649
block_i, entry_i, d_present, f_present = \
1650
self._get_block_entry_index(key[0], key[1], tree_index)
1652
# In a different tree
1654
entry = self._dirblocks[block_i][1][entry_i]
1655
if entry[0][2] != file_id:
1656
# Different file_id, so not what we want.
1658
self._raise_invalid(("%s/%s" % key[0:2]).decode('utf8'), file_id,
1659
"This file_id is new in the delta but already present in "
1662
def _raise_invalid(self, path, file_id, reason):
1663
self._changes_aborted = True
1664
raise errors.InconsistentDelta(path, file_id, reason)
1666
def _update_basis_apply_adds(self, adds):
1667
"""Apply a sequence of adds to tree 1 during update_basis_by_delta.
1669
They may be adds, or renames that have been split into add/delete
1672
:param adds: A sequence of adds. Each add is a tuple:
1673
(None, new_path_utf8, file_id, (entry_details), real_add). real_add
1674
is False when the add is the second half of a remove-and-reinsert
1675
pair created to handle renames and deletes.
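# Illustration only (hypothetical values): a genuine add looks like
#     (None, 'dir/newfile', 'newfile-id', ('f', sha1, size, False, rev), True)
# while the add half of a split rename carries the old path and real_add=False:
#     ('old/name', 'new/name', 'the-id', entry_details, False)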
1677
# Adds are accumulated partly from renames, so can be in any input
1679
# TODO: we may want to sort in dirblocks order. That way each entry
1680
# will end up in the same directory, allowing the _get_entry
1681
# fast-path for looking up 2 items in the same dir work.
1682
adds.sort(key=lambda x: x[1])
# adds is now in lexicographic order, which places all parents before
1684
# their children, so we can process it linearly.
1686
st = static_tuple.StaticTuple
1687
for old_path, new_path, file_id, new_details, real_add in adds:
1688
dirname, basename = osutils.split(new_path)
1689
entry_key = st(dirname, basename, file_id)
1690
block_index, present = self._find_block_index_from_key(entry_key)
1692
self._raise_invalid(new_path, file_id,
1693
"Unable to find block for this record."
1694
" Was the parent added?")
1695
block = self._dirblocks[block_index][1]
1696
entry_index, present = self._find_entry_index(entry_key, block)
1698
if old_path is not None:
1699
self._raise_invalid(new_path, file_id,
1700
'considered a real add but still had old_path at %s'
1703
entry = block[entry_index]
1704
basis_kind = entry[1][1][0]
1705
if basis_kind == 'a':
1706
entry[1][1] = new_details
1707
elif basis_kind == 'r':
1708
raise NotImplementedError()
1710
self._raise_invalid(new_path, file_id,
1711
"An entry was marked as a new add"
1712
" but the basis target already existed")
1714
# The exact key was not found in the block. However, we need to
1715
# check if there is a key next to us that would have matched.
1716
# We only need to check 2 locations, because there are only 2
1718
for maybe_index in range(entry_index-1, entry_index+1):
1719
if maybe_index < 0 or maybe_index >= len(block):
1721
maybe_entry = block[maybe_index]
1722
if maybe_entry[0][:2] != (dirname, basename):
1723
# Just a random neighbor
1725
if maybe_entry[0][2] == file_id:
1726
raise AssertionError(
1727
'_find_entry_index didnt find a key match'
1728
' but walking the data did, for %s'
1730
basis_kind = maybe_entry[1][1][0]
1731
if basis_kind not in 'ar':
1732
self._raise_invalid(new_path, file_id,
1733
"we have an add record for path, but the path"
1734
" is already present with another file_id %s"
1735
% (maybe_entry[0][2],))
1737
entry = (entry_key, [DirState.NULL_PARENT_DETAILS,
1739
block.insert(entry_index, entry)
1741
active_kind = entry[1][0][0]
1742
if active_kind == 'a':
1743
# The active record shows up as absent, this could be genuine,
1744
# or it could be present at some other location. We need to
1746
id_index = self._get_id_index()
1747
# The id_index may not be perfectly accurate for tree1, because
1748
# we haven't been keeping it updated. However, it should be
1749
# fine for tree0, and that gives us enough info for what we
1751
keys = id_index.get(file_id, ())
1753
block_i, entry_i, d_present, f_present = \
1754
self._get_block_entry_index(key[0], key[1], 0)
1757
active_entry = self._dirblocks[block_i][1][entry_i]
1758
if (active_entry[0][2] != file_id):
1759
# Some other file is at this path, we don't need to
1762
real_active_kind = active_entry[1][0][0]
1763
if real_active_kind in 'ar':
1764
# We found a record, which was not *this* record,
1765
# which matches the file_id, but is not actually
1766
# present. Something seems *really* wrong.
1767
self._raise_invalid(new_path, file_id,
1768
"We found a tree0 entry that doesnt make sense")
1769
# Now, we've found a tree0 entry which matches the file_id
1770
# but is at a different location. So update them to be
1772
active_dir, active_name = active_entry[0][:2]
1774
active_path = active_dir + '/' + active_name
1776
active_path = active_name
1777
active_entry[1][1] = st('r', new_path, 0, False, '')
1778
entry[1][0] = st('r', active_path, 0, False, '')
1779
elif active_kind == 'r':
1780
raise NotImplementedError()
1782
new_kind = new_details[0]
1784
self._ensure_block(block_index, entry_index, new_path)
1786
def _update_basis_apply_changes(self, changes):
1787
"""Apply a sequence of changes to tree 1 during update_basis_by_delta.
:param changes: A sequence of changes. Each change is a tuple:
1790
(path_utf8, path_utf8, file_id, (entry_details))
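# Illustration only (hypothetical values): an in-place change routed here
# might look like
#     ('', '', 'tree-root-id', ('d', '', 0, False, 'a-revision-id'))
# old and new path are equal because renames are split into delete/add pairs
# before reaching this method.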
1793
for old_path, new_path, file_id, new_details in changes:
# the entry for this file_id must be in tree 1 (the basis).
1795
entry = self._get_entry(1, file_id, new_path)
1796
if entry[0] is None or entry[1][1][0] in 'ar':
1797
self._raise_invalid(new_path, file_id,
1798
'changed entry considered not present')
1799
entry[1][1] = new_details
1801
def _update_basis_apply_deletes(self, deletes):
1802
"""Apply a sequence of deletes to tree 1 during update_basis_by_delta.
1804
They may be deletes, or renames that have been split into add/delete
1807
:param deletes: A sequence of deletes. Each delete is a tuple:
1808
(old_path_utf8, new_path_utf8, file_id, None, real_delete).
1809
real_delete is True when the desired outcome is an actual deletion
1810
rather than the rename handling logic temporarily deleting a path
1811
during the replacement of a parent.
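# Illustration only (hypothetical values): a real deletion looks like
#     ('doomed.txt', None, 'doomed-id', None, True)
# while the delete half of a split rename keeps the new path and uses
# real_delete=False:
#     ('old/name', 'new/name', 'the-id', None, False)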
1813
null = DirState.NULL_PARENT_DETAILS
1814
for old_path, new_path, file_id, _, real_delete in deletes:
1815
if real_delete != (new_path is None):
1816
self._raise_invalid(old_path, file_id, "bad delete delta")
1817
# the entry for this file_id must be in tree 1.
1818
dirname, basename = osutils.split(old_path)
1819
block_index, entry_index, dir_present, file_present = \
1820
self._get_block_entry_index(dirname, basename, 1)
1821
if not file_present:
1822
self._raise_invalid(old_path, file_id,
1823
'basis tree does not contain removed entry')
1824
entry = self._dirblocks[block_index][1][entry_index]
1825
# The state of the entry in the 'active' WT
1826
active_kind = entry[1][0][0]
1827
if entry[0][2] != file_id:
1828
self._raise_invalid(old_path, file_id,
1829
'mismatched file_id in tree 1')
1831
old_kind = entry[1][1][0]
1832
if active_kind in 'ar':
1833
# The active tree doesn't have this file_id.
1834
# The basis tree is changing this record. If this is a
1835
# rename, then we don't want the record here at all
1836
# anymore. If it is just an in-place change, we want the
1837
# record here, but we'll add it if we need to. So we just
1839
if active_kind == 'r':
1840
active_path = entry[1][0][1]
1841
active_entry = self._get_entry(0, file_id, active_path)
1842
if active_entry[1][1][0] != 'r':
1843
self._raise_invalid(old_path, file_id,
1844
"Dirstate did not have matching rename entries")
1845
elif active_entry[1][0][0] in 'ar':
1846
self._raise_invalid(old_path, file_id,
1847
"Dirstate had a rename pointing at an inactive"
1849
active_entry[1][1] = null
1850
del self._dirblocks[block_index][1][entry_index]
1852
# This was a directory, and the active tree says it
1853
# doesn't exist, and now the basis tree says it doesn't
1854
# exist. Remove its dirblock if present
1856
present) = self._find_block_index_from_key(
1859
dir_block = self._dirblocks[dir_block_index][1]
1861
# This entry is empty, go ahead and just remove it
1862
del self._dirblocks[dir_block_index]
1864
# There is still an active record, so just mark this
1867
block_i, entry_i, d_present, f_present = \
1868
self._get_block_entry_index(old_path, '', 1)
1870
dir_block = self._dirblocks[block_i][1]
1871
for child_entry in dir_block:
1872
child_basis_kind = child_entry[1][1][0]
1873
if child_basis_kind not in 'ar':
1874
self._raise_invalid(old_path, file_id,
1875
"The file id was deleted but its children were "
1878
def _after_delta_check_parents(self, parents, index):
1879
"""Check that parents required by the delta are all intact.
1881
:param parents: An iterable of (path_utf8, file_id) tuples which are
1882
required to be present in tree 'index' at path_utf8 with id file_id
1884
:param index: The column in the dirstate to check for parents in.
1886
for dirname_utf8, file_id in parents:
# Get the entry - this ensures that file_id, dirname_utf8 exists and
1888
# has the right file id.
1889
entry = self._get_entry(index, file_id, dirname_utf8)
1890
if entry[1] is None:
1891
self._raise_invalid(dirname_utf8.decode('utf8'),
1892
file_id, "This parent is not present.")
1893
# Parents of things must be directories
1894
if entry[1][index][0] != 'd':
1895
self._raise_invalid(dirname_utf8.decode('utf8'),
1896
file_id, "This parent is not a directory.")
1898
def _observed_sha1(self, entry, sha1, stat_value,
1899
_stat_to_minikind=_stat_to_minikind, _pack_stat=pack_stat):
1900
"""Note the sha1 of a file.
1902
:param entry: The entry the sha1 is for.
1903
:param sha1: The observed sha1.
1904
:param stat_value: The os.lstat for the file.
1088
kind = osutils.file_kind_from_stat_mode(stat_value.st_mode)
1907
minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
1090
minikind = DirState._kind_to_minikind[kind]
1091
except KeyError: # Unknown kind
1911
packed_stat = _pack_stat(stat_value)
1093
packed_stat = pack_stat(stat_value)
1094
(saved_minikind, saved_link_or_sha1, saved_file_size,
1095
saved_executable, saved_packed_stat) = entry[1][0]
1097
if (minikind == saved_minikind
1098
and packed_stat == saved_packed_stat
1099
# size should also be in packed_stat
1100
and saved_file_size == stat_value.st_size):
1101
# The stat hasn't changed since we saved, so we can potentially
1102
# re-use the saved sha hash.
1913
1106
if self._cutoff_time is None:
1914
1107
self._sha_cutoff_time()
1915
1109
if (stat_value.st_mtime < self._cutoff_time
1916
1110
and stat_value.st_ctime < self._cutoff_time):
1917
entry[1][0] = ('f', sha1, stat_value.st_size, entry[1][0][3],
1919
self._mark_modified([entry])
1111
# Return the existing fingerprint
1112
return saved_link_or_sha1
1114
# If we have gotten this far, that means that we need to actually
1115
# process this entry.
1118
link_or_sha1 = self._sha1_file(abspath, entry)
1119
executable = self._is_executable(stat_value.st_mode,
1121
entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
1122
executable, packed_stat)
1123
elif minikind == 'd':
1125
entry[1][0] = ('d', '', 0, False, packed_stat)
1126
if saved_minikind != 'd':
1127
# This changed from something into a directory. Make sure we
1128
# have a directory block for it. This doesn't happen very
1129
# often, so this doesn't have to be super fast.
1130
block_index, entry_index, dir_present, file_present = \
1131
self._get_block_entry_index(entry[0][0], entry[0][1], 0)
1132
self._ensure_block(block_index, entry_index,
1133
osutils.pathjoin(entry[0][0], entry[0][1]))
1134
elif minikind == 'l':
1135
link_or_sha1 = self._read_link(abspath, saved_link_or_sha1)
1136
entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
1138
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
1921
1141
def _sha_cutoff_time(self):
1922
1142
"""Return cutoff time.
3391
2345
self._split_path_cache = {}
3393
2347
def _requires_lock(self):
3394
"""Check that a lock is currently held by someone on the dirstate."""
3395
2349
if not self._lock_token:
3396
2350
raise errors.ObjectNotLocked(self)
3399
def py_update_entry(state, entry, abspath, stat_value,
3400
_stat_to_minikind=DirState._stat_to_minikind,
3401
_pack_stat=pack_stat):
3402
"""Update the entry based on what is actually on disk.
3404
This function only calculates the sha if it needs to - if the entry is
3405
uncachable, or clearly different to the first parent's entry, no sha
3406
is calculated, and None is returned.
3408
:param state: The dirstate this entry is in.
3409
:param entry: This is the dirblock entry for the file in question.
3410
:param abspath: The path on disk for this file.
3411
:param stat_value: The stat value done on the path.
3412
:return: None, or The sha1 hexdigest of the file (40 bytes) or link
3413
target of a symlink.
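# Behaviour sketch (a summary of the code below, not an API guarantee): if the
# cached minikind, packed stat and size all match a fresh lstat, the stored
# fingerprint is returned without re-reading the file; otherwise the content
# is re-hashed, and the result is only cached once the file is older than the
# cutoff time, so a file still being written never gets a stale cached sha1.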
2353
def bisect_dirblock(dirblocks, dirname, lo=0, hi=None, cache={}):
2354
"""Return the index where to insert dirname into the dirblocks.
2356
The return value idx is such that all directories blocks in dirblock[:idx]
2357
have names < dirname, and all blocks in dirblock[idx:] have names >=
2360
Optional args lo (default 0) and hi (default len(dirblocks)) bound the
slice of dirblocks to be searched.
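# Usage sketch (hypothetical data): with dirblocks named
#     [('', ...), ('a', ...), ('a/b', ...), ('z', ...)]
# bisect_dirblock(dirblocks, 'a/b') returns 2 (the existing 'a/b' block) and
# bisect_dirblock(dirblocks, 'b') returns 3, the index at which a new 'b'
# block would be inserted.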
3416
minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
2366
dirname_split = cache[dirname]
3417
2367
except KeyError:
3420
packed_stat = _pack_stat(stat_value)
3421
(saved_minikind, saved_link_or_sha1, saved_file_size,
3422
saved_executable, saved_packed_stat) = entry[1][0]
3424
if minikind == 'd' and saved_minikind == 't':
3426
if (minikind == saved_minikind
3427
and packed_stat == saved_packed_stat):
3428
# The stat hasn't changed since we saved, so we can re-use the
3433
# size should also be in packed_stat
3434
if saved_file_size == stat_value.st_size:
3435
return saved_link_or_sha1
3437
# If we have gotten this far, that means that we need to actually
3438
# process this entry.
3442
executable = state._is_executable(stat_value.st_mode,
3444
if state._cutoff_time is None:
3445
state._sha_cutoff_time()
3446
if (stat_value.st_mtime < state._cutoff_time
3447
and stat_value.st_ctime < state._cutoff_time
3448
and len(entry[1]) > 1
3449
and entry[1][1][0] != 'a'):
# Could check for size changes for further optimised
# avoidance of sha1's. However, the most prominent case of
# over-sha1ing is during initial add, which this catches.
3453
# Besides, if content filtering happens, size and sha
3454
# are calculated at the same time, so checking just the size
3455
# gains nothing w.r.t. performance.
3456
link_or_sha1 = state._sha1_file(abspath)
3457
entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
3458
executable, packed_stat)
3460
entry[1][0] = ('f', '', stat_value.st_size,
3461
executable, DirState.NULLSTAT)
3462
worth_saving = False
3463
elif minikind == 'd':
3465
entry[1][0] = ('d', '', 0, False, packed_stat)
3466
if saved_minikind != 'd':
3467
# This changed from something into a directory. Make sure we
3468
# have a directory block for it. This doesn't happen very
3469
# often, so this doesn't have to be super fast.
3470
block_index, entry_index, dir_present, file_present = \
3471
state._get_block_entry_index(entry[0][0], entry[0][1], 0)
3472
state._ensure_block(block_index, entry_index,
3473
osutils.pathjoin(entry[0][0], entry[0][1]))
3475
worth_saving = False
3476
elif minikind == 'l':
3477
if saved_minikind == 'l':
3478
worth_saving = False
3479
link_or_sha1 = state._read_link(abspath, saved_link_or_sha1)
3480
if state._cutoff_time is None:
3481
state._sha_cutoff_time()
3482
if (stat_value.st_mtime < state._cutoff_time
3483
and stat_value.st_ctime < state._cutoff_time):
3484
entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
3487
entry[1][0] = ('l', '', stat_value.st_size,
3488
False, DirState.NULLSTAT)
3490
state._mark_modified([entry])
3494
class ProcessEntryPython(object):
3496
__slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id",
3497
"last_source_parent", "last_target_parent", "include_unchanged",
3498
"partial", "use_filesystem_for_exec", "utf8_decode",
3499
"searched_specific_files", "search_specific_files",
3500
"searched_exact_paths", "search_specific_file_parents", "seen_ids",
3501
"state", "source_index", "target_index", "want_unversioned", "tree"]
3503
def __init__(self, include_unchanged, use_filesystem_for_exec,
3504
search_specific_files, state, source_index, target_index,
3505
want_unversioned, tree):
3506
self.old_dirname_to_file_id = {}
3507
self.new_dirname_to_file_id = {}
3508
# Are we doing a partial iter_changes?
3509
self.partial = search_specific_files != set([''])
3510
# Using a list so that we can access the values and change them in
3511
# nested scope. Each one is [path, file_id, entry]
3512
self.last_source_parent = [None, None]
3513
self.last_target_parent = [None, None]
3514
self.include_unchanged = include_unchanged
3515
self.use_filesystem_for_exec = use_filesystem_for_exec
3516
self.utf8_decode = cache_utf8._utf8_decode
# for all search_indexes in each path at or under each element of
# search_specific_files, if the detail is relocated: add the id, and
# add the relocated path as one to search if it's not searched already.
3520
# If the detail is not relocated, add the id.
3521
self.searched_specific_files = set()
3522
# When we search exact paths without expanding downwards, we record
3524
self.searched_exact_paths = set()
3525
self.search_specific_files = search_specific_files
3526
# The parents up to the root of the paths we are searching.
3527
# After all normal paths are returned, these specific items are returned.
3528
self.search_specific_file_parents = set()
3529
# The ids we've sent out in the delta.
3530
self.seen_ids = set()
3532
self.source_index = source_index
3533
self.target_index = target_index
3534
if target_index != 0:
3535
# A lot of code in here depends on target_index == 0
3536
raise errors.BzrError('unsupported target index')
3537
self.want_unversioned = want_unversioned
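# Construction sketch (argument values illustrative): iter_changes-style
# callers build this object roughly as
#     ProcessEntryPython(include_unchanged=False, use_filesystem_for_exec=True,
#         search_specific_files=set(['']), state=state, source_index=1,
#         target_index=0, want_unversioned=False, tree=working_tree)
# target_index must be 0, as enforced above.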
3540
def _process_entry(self, entry, path_info, pathjoin=osutils.pathjoin):
3541
"""Compare an entry and real disk to generate delta information.
3543
:param path_info: top_relpath, basename, kind, lstat, abspath for
3544
the path of entry. If None, then the path is considered absent in
3545
the target (Perhaps we should pass in a concrete entry for this ?)
3546
Basename is returned as a utf8 string because we expect this
3547
tuple will be ignored, and don't want to take the time to
3549
:return: (iter_changes_result, changed). If the entry has not been
3550
handled then changed is None. Otherwise it is False if no content
3551
or metadata changes have occurred, and True if any content or
3552
metadata change has occurred. If self.include_unchanged is True then
3553
if changed is not None, iter_changes_result will always be a result
3554
tuple. Otherwise, iter_changes_result is None unless changed is
3557
if self.source_index is None:
3558
source_details = DirState.NULL_PARENT_DETAILS
3560
source_details = entry[1][self.source_index]
3561
target_details = entry[1][self.target_index]
3562
target_minikind = target_details[0]
3563
if path_info is not None and target_minikind in 'fdlt':
3564
if not (self.target_index == 0):
3565
raise AssertionError()
3566
link_or_sha1 = update_entry(self.state, entry,
3567
abspath=path_info[4], stat_value=path_info[3])
3568
# The entry may have been modified by update_entry
3569
target_details = entry[1][self.target_index]
3570
target_minikind = target_details[0]
3573
file_id = entry[0][2]
3574
source_minikind = source_details[0]
3575
if source_minikind in 'fdltr' and target_minikind in 'fdlt':
3576
# claimed content in both: diff
3577
# r | fdlt | | add source to search, add id path move and perform
3578
# | | | diff check on source-target
3579
# r | fdlt | a | dangling file that was present in the basis.
3581
if source_minikind in 'r':
3582
# add the source to the search path to find any children it
3583
# has. TODO ? : only add if it is a container ?
3584
if not osutils.is_inside_any(self.searched_specific_files,
3586
self.search_specific_files.add(source_details[1])
3587
# generate the old path; this is needed for stating later
3589
old_path = source_details[1]
3590
old_dirname, old_basename = os.path.split(old_path)
3591
path = pathjoin(entry[0][0], entry[0][1])
3592
old_entry = self.state._get_entry(self.source_index,
3594
# update the source details variable to be the real
3596
if old_entry == (None, None):
3597
raise errors.CorruptDirstate(self.state._filename,
3598
"entry '%s/%s' is considered renamed from %r"
3599
" but source does not exist\n"
3600
"entry: %s" % (entry[0][0], entry[0][1], old_path, entry))
3601
source_details = old_entry[1][self.source_index]
3602
source_minikind = source_details[0]
3604
old_dirname = entry[0][0]
3605
old_basename = entry[0][1]
3606
old_path = path = None
3607
if path_info is None:
3608
# the file is missing on disk, show as removed.
3609
content_change = True
3613
# source and target are both versioned and disk file is present.
3614
target_kind = path_info[2]
3615
if target_kind == 'directory':
3617
old_path = path = pathjoin(old_dirname, old_basename)
3618
self.new_dirname_to_file_id[path] = file_id
3619
if source_minikind != 'd':
3620
content_change = True
3622
# directories have no fingerprint
3623
content_change = False
3625
elif target_kind == 'file':
3626
if source_minikind != 'f':
3627
content_change = True
3629
# Check the sha. We can't just rely on the size as
# content filtering may mean different sizes actually
3631
# map to the same content
3632
if link_or_sha1 is None:
3634
statvalue, link_or_sha1 = \
3635
self.state._sha1_provider.stat_and_sha1(
3637
self.state._observed_sha1(entry, link_or_sha1,
3639
content_change = (link_or_sha1 != source_details[1])
3640
# Target details is updated at update_entry time
3641
if self.use_filesystem_for_exec:
3642
# We don't need S_ISREG here, because we are sure
3643
# we are dealing with a file.
3644
target_exec = bool(stat.S_IEXEC & path_info[3].st_mode)
3646
target_exec = target_details[3]
3647
elif target_kind == 'symlink':
3648
if source_minikind != 'l':
3649
content_change = True
3651
content_change = (link_or_sha1 != source_details[1])
3653
elif target_kind == 'tree-reference':
3654
if source_minikind != 't':
3655
content_change = True
3657
content_change = False
3661
path = pathjoin(old_dirname, old_basename)
3662
raise errors.BadFileKindError(path, path_info[2])
3663
if source_minikind == 'd':
3665
old_path = path = pathjoin(old_dirname, old_basename)
3666
self.old_dirname_to_file_id[old_path] = file_id
3667
# parent id is the entry for the path in the target tree
3668
if old_basename and old_dirname == self.last_source_parent[0]:
3669
source_parent_id = self.last_source_parent[1]
3672
source_parent_id = self.old_dirname_to_file_id[old_dirname]
3674
source_parent_entry = self.state._get_entry(self.source_index,
3675
path_utf8=old_dirname)
3676
source_parent_id = source_parent_entry[0][2]
3677
if source_parent_id == entry[0][2]:
3678
# This is the root, so the parent is None
3679
source_parent_id = None
3681
self.last_source_parent[0] = old_dirname
3682
self.last_source_parent[1] = source_parent_id
3683
new_dirname = entry[0][0]
3684
if entry[0][1] and new_dirname == self.last_target_parent[0]:
3685
target_parent_id = self.last_target_parent[1]
3688
target_parent_id = self.new_dirname_to_file_id[new_dirname]
3690
# TODO: We don't always need to do the lookup, because the
3691
# parent entry will be the same as the source entry.
3692
target_parent_entry = self.state._get_entry(self.target_index,
3693
path_utf8=new_dirname)
3694
if target_parent_entry == (None, None):
3695
raise AssertionError(
3696
"Could not find target parent in wt: %s\nparent of: %s"
3697
% (new_dirname, entry))
3698
target_parent_id = target_parent_entry[0][2]
3699
if target_parent_id == entry[0][2]:
3700
# This is the root, so the parent is None
3701
target_parent_id = None
3703
self.last_target_parent[0] = new_dirname
3704
self.last_target_parent[1] = target_parent_id
3706
source_exec = source_details[3]
3707
changed = (content_change
3708
or source_parent_id != target_parent_id
3709
or old_basename != entry[0][1]
3710
or source_exec != target_exec
3712
if not changed and not self.include_unchanged:
3715
if old_path is None:
3716
old_path = path = pathjoin(old_dirname, old_basename)
3717
old_path_u = self.utf8_decode(old_path)[0]
3720
old_path_u = self.utf8_decode(old_path)[0]
3721
if old_path == path:
3724
path_u = self.utf8_decode(path)[0]
3725
source_kind = DirState._minikind_to_kind[source_minikind]
3726
return (entry[0][2],
3727
(old_path_u, path_u),
3730
(source_parent_id, target_parent_id),
3731
(self.utf8_decode(old_basename)[0], self.utf8_decode(entry[0][1])[0]),
3732
(source_kind, target_kind),
3733
(source_exec, target_exec)), changed
3734
elif source_minikind in 'a' and target_minikind in 'fdlt':
3735
# looks like a new file
3736
path = pathjoin(entry[0][0], entry[0][1])
3737
# parent id is the entry for the path in the target tree
# TODO: these are the same for an entire directory: cache them.
3739
parent_id = self.state._get_entry(self.target_index,
3740
path_utf8=entry[0][0])[0][2]
3741
if parent_id == entry[0][2]:
3743
if path_info is not None:
3745
if self.use_filesystem_for_exec:
3746
# We need S_ISREG here, because we aren't sure if this
3749
stat.S_ISREG(path_info[3].st_mode)
3750
and stat.S_IEXEC & path_info[3].st_mode)
3752
target_exec = target_details[3]
3753
return (entry[0][2],
3754
(None, self.utf8_decode(path)[0]),
3758
(None, self.utf8_decode(entry[0][1])[0]),
3759
(None, path_info[2]),
3760
(None, target_exec)), True
# It's a missing file, report it as such.
3763
return (entry[0][2],
3764
(None, self.utf8_decode(path)[0]),
3768
(None, self.utf8_decode(entry[0][1])[0]),
3770
(None, False)), True
3771
elif source_minikind in 'fdlt' and target_minikind in 'a':
# unversioned, possibly, or possibly not deleted: we don't care.
# if it's still on disk, *and* there's no other entry at this
# path [we don't know this in this routine at the moment -
# perhaps we should change this - then it would be an unknown].
3776
old_path = pathjoin(entry[0][0], entry[0][1])
3777
# parent id is the entry for the path in the target tree
3778
parent_id = self.state._get_entry(self.source_index, path_utf8=entry[0][0])[0][2]
3779
if parent_id == entry[0][2]:
3781
return (entry[0][2],
3782
(self.utf8_decode(old_path)[0], None),
3786
(self.utf8_decode(entry[0][1])[0], None),
3787
(DirState._minikind_to_kind[source_minikind], None),
3788
(source_details[3], None)), True
3789
elif source_minikind in 'fdlt' and target_minikind in 'r':
3790
# a rename; could be a true rename, or a rename inherited from
3791
# a renamed parent. TODO: handle this efficiently. Its not
3792
# common case to rename dirs though, so a correct but slow
3793
# implementation will do.
3794
if not osutils.is_inside_any(self.searched_specific_files, target_details[1]):
3795
self.search_specific_files.add(target_details[1])
3796
elif source_minikind in 'ra' and target_minikind in 'ra':
3797
# neither of the selected trees contain this file,
3798
# so skip over it. This is not currently directly tested, but
3799
# is indirectly via test_too_much.TestCommands.test_conflicts.
3802
raise AssertionError("don't know how to compare "
3803
"source_minikind=%r, target_minikind=%r"
3804
% (source_minikind, target_minikind))
3810
def _gather_result_for_consistency(self, result):
3811
"""Check a result we will yield to make sure we are consistent later.
3813
This gathers result's parents into a set to output later.
3815
:param result: A result tuple.
3817
if not self.partial or not result[0]:
3819
self.seen_ids.add(result[0])
3820
new_path = result[1][1]
3822
# Not the root and not a delete: queue up the parents of the path.
3823
self.search_specific_file_parents.update(
3824
osutils.parent_directories(new_path.encode('utf8')))
3825
# Add the root directory which parent_directories does not
3827
self.search_specific_file_parents.add('')
3829
def iter_changes(self):
3830
"""Iterate over the changes."""
3831
utf8_decode = cache_utf8._utf8_decode
3832
_cmp_by_dirs = cmp_by_dirs
3833
_process_entry = self._process_entry
3834
search_specific_files = self.search_specific_files
3835
searched_specific_files = self.searched_specific_files
3836
splitpath = osutils.splitpath
3838
# compare source_index and target_index at or under each element of search_specific_files.
# follow the comparison table below. Note that we only want to do diff operations when
# the target is fdl because that's when the walkdirs logic will have exposed the pathinfo
3844
# Source | Target | disk | action
3845
# r | fdlt | | add source to search, add id path move and perform
3846
# | | | diff check on source-target
3847
# r | fdlt | a | dangling file that was present in the basis.
3849
# r | a | | add source to search
3851
# r | r | | this path is present in a non-examined tree, skip.
3852
# r | r | a | this path is present in a non-examined tree, skip.
3853
# a | fdlt | | add new id
3854
# a | fdlt | a | dangling locally added file, skip
3855
# a | a | | not present in either tree, skip
3856
# a | a | a | not present in any tree, skip
3857
# a | r | | not present in either tree at this path, skip as it
3858
# | | | may not be selected by the users list of paths.
3859
# a | r | a | not present in either tree at this path, skip as it
3860
# | | | may not be selected by the users list of paths.
3861
# fdlt | fdlt | | content in both: diff them
3862
# fdlt | fdlt | a | deleted locally, but not unversioned - show as deleted ?
3863
# fdlt | a | | unversioned: output deleted id for now
3864
# fdlt | a | a | unversioned and deleted: output deleted id
3865
# fdlt | r | | relocated in this tree, so add target to search.
# | | | Don't diff, we will see an r,fd; pair when we reach
3867
# | | | this id at the other path.
3868
# fdlt | r | a | relocated in this tree, so add target to search.
# | | | Don't diff, we will see an r,fd; pair when we reach
3870
# | | | this id at the other path.
3872
# TODO: jam 20070516 - Avoid the _get_entry lookup overhead by
3873
# keeping a cache of directories that we have seen.
3875
while search_specific_files:
3876
# TODO: the pending list should be lexically sorted? the
3877
# interface doesn't require it.
3878
current_root = search_specific_files.pop()
3879
current_root_unicode = current_root.decode('utf8')
3880
searched_specific_files.add(current_root)
3881
# process the entries for this containing directory: the rest will be
3882
# found by their parents recursively.
3883
root_entries = self.state._entries_for_path(current_root)
3884
root_abspath = self.tree.abspath(current_root_unicode)
3886
root_stat = os.lstat(root_abspath)
3888
if e.errno == errno.ENOENT:
3889
# the path does not exist: let _process_entry know that.
3890
root_dir_info = None
3892
# some other random error: hand it up.
3895
root_dir_info = ('', current_root,
3896
osutils.file_kind_from_stat_mode(root_stat.st_mode), root_stat,
3898
if root_dir_info[2] == 'directory':
3899
if self.tree._directory_is_tree_reference(
3900
current_root.decode('utf8')):
3901
root_dir_info = root_dir_info[:2] + \
3902
('tree-reference',) + root_dir_info[3:]
3904
if not root_entries and not root_dir_info:
3905
# this specified path is not present at all, skip it.
3907
path_handled = False
3908
for entry in root_entries:
3909
result, changed = _process_entry(entry, root_dir_info)
3910
if changed is not None:
3913
self._gather_result_for_consistency(result)
3914
if changed or self.include_unchanged:
3916
if self.want_unversioned and not path_handled and root_dir_info:
3917
new_executable = bool(
3918
stat.S_ISREG(root_dir_info[3].st_mode)
3919
and stat.S_IEXEC & root_dir_info[3].st_mode)
3921
(None, current_root_unicode),
3925
(None, splitpath(current_root_unicode)[-1]),
3926
(None, root_dir_info[2]),
3927
(None, new_executable)
3929
initial_key = (current_root, '', '')
3930
block_index, _ = self.state._find_block_index_from_key(initial_key)
3931
if block_index == 0:
3932
# we have processed the total root already, but because the
3933
# initial key matched it we should skip it here.
3935
if root_dir_info and root_dir_info[2] == 'tree-reference':
3936
current_dir_info = None
3938
dir_iterator = osutils._walkdirs_utf8(root_abspath, prefix=current_root)
3940
current_dir_info = dir_iterator.next()
3942
# on win32, python2.4 has e.errno == ERROR_DIRECTORY, but
3943
# python 2.5 has e.errno == EINVAL,
3944
# and e.winerror == ERROR_DIRECTORY
3945
e_winerror = getattr(e, 'winerror', None)
3946
win_errors = (ERROR_DIRECTORY, ERROR_PATH_NOT_FOUND)
3947
# there may be directories in the inventory even though
3948
# this path is not a file on disk: so mark it as end of
3950
if e.errno in (errno.ENOENT, errno.ENOTDIR, errno.EINVAL):
3951
current_dir_info = None
3952
elif (sys.platform == 'win32'
3953
and (e.errno in win_errors
3954
or e_winerror in win_errors)):
3955
current_dir_info = None
3959
if current_dir_info[0][0] == '':
3960
# remove .bzr from iteration
3961
bzr_index = bisect.bisect_left(current_dir_info[1], ('.bzr',))
3962
if current_dir_info[1][bzr_index][0] != '.bzr':
3963
raise AssertionError()
3964
del current_dir_info[1][bzr_index]
3965
# walk until both the directory listing and the versioned metadata
3967
if (block_index < len(self.state._dirblocks) and
3968
osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
3969
current_block = self.state._dirblocks[block_index]
3971
current_block = None
3972
while (current_dir_info is not None or
3973
current_block is not None):
3974
if (current_dir_info and current_block
3975
and current_dir_info[0][0] != current_block[0]):
3976
if _cmp_by_dirs(current_dir_info[0][0], current_block[0]) < 0:
3977
# filesystem data refers to paths not covered by the dirblock.
3978
# this has two possibilities:
3979
# A) it is versioned but empty, so there is no block for it
3980
# B) it is not versioned.
3982
# if (A) then we need to recurse into it to check for
3983
# new unknown files or directories.
3984
# if (B) then we should ignore it, because we don't
3985
# recurse into unknown directories.
3987
while path_index < len(current_dir_info[1]):
3988
current_path_info = current_dir_info[1][path_index]
3989
if self.want_unversioned:
3990
if current_path_info[2] == 'directory':
3991
if self.tree._directory_is_tree_reference(
3992
current_path_info[0].decode('utf8')):
3993
current_path_info = current_path_info[:2] + \
3994
('tree-reference',) + current_path_info[3:]
3995
new_executable = bool(
3996
stat.S_ISREG(current_path_info[3].st_mode)
3997
and stat.S_IEXEC & current_path_info[3].st_mode)
3999
(None, utf8_decode(current_path_info[0])[0]),
4003
(None, utf8_decode(current_path_info[1])[0]),
4004
(None, current_path_info[2]),
4005
(None, new_executable))
4006
# dont descend into this unversioned path if it is
4008
if current_path_info[2] in ('directory',
4010
del current_dir_info[1][path_index]
4014
# This dir info has been handled, go to the next
4016
current_dir_info = dir_iterator.next()
4017
except StopIteration:
4018
current_dir_info = None
4020
# We have a dirblock entry for this location, but there
4021
# is no filesystem path for this. This is most likely
4022
# because a directory was removed from the disk.
4023
# We don't have to report the missing directory,
4024
# because that should have already been handled, but we
4025
# need to handle all of the files that are contained
4027
for current_entry in current_block[1]:
4028
# entry referring to file not present on disk.
4029
# advance the entry only, after processing.
4030
result, changed = _process_entry(current_entry, None)
4031
if changed is not None:
4033
self._gather_result_for_consistency(result)
4034
if changed or self.include_unchanged:
4037
if (block_index < len(self.state._dirblocks) and
4038
osutils.is_inside(current_root,
4039
self.state._dirblocks[block_index][0])):
4040
current_block = self.state._dirblocks[block_index]
4042
current_block = None
4045
if current_block and entry_index < len(current_block[1]):
4046
current_entry = current_block[1][entry_index]
4048
current_entry = None
4049
advance_entry = True
4051
if current_dir_info and path_index < len(current_dir_info[1]):
4052
current_path_info = current_dir_info[1][path_index]
4053
if current_path_info[2] == 'directory':
4054
if self.tree._directory_is_tree_reference(
4055
current_path_info[0].decode('utf8')):
4056
current_path_info = current_path_info[:2] + \
4057
('tree-reference',) + current_path_info[3:]
4059
current_path_info = None
4061
path_handled = False
4062
while (current_entry is not None or
4063
current_path_info is not None):
4064
if current_entry is None:
4065
# the check for path_handled when the path is advanced
4066
# will yield this path if needed.
4068
elif current_path_info is None:
4069
# no path is fine: the per entry code will handle it.
4070
result, changed = _process_entry(current_entry, current_path_info)
4071
if changed is not None:
4073
self._gather_result_for_consistency(result)
4074
if changed or self.include_unchanged:
4076
elif (current_entry[0][1] != current_path_info[1]
4077
or current_entry[1][self.target_index][0] in 'ar'):
4078
# The current path on disk doesn't match the dirblock
4079
# record. Either the dirblock is marked as absent, or
4080
# the file on disk is not present at all in the
4081
# dirblock. Either way, report about the dirblock
4082
# entry, and let other code handle the filesystem one.
4084
# Compare the basename for these files to determine
4086
if current_path_info[1] < current_entry[0][1]:
4087
# extra file on disk: pass for now, but only
4088
# increment the path, not the entry
4089
advance_entry = False
4091
# entry referring to file not present on disk.
4092
# advance the entry only, after processing.
4093
result, changed = _process_entry(current_entry, None)
4094
if changed is not None:
4096
self._gather_result_for_consistency(result)
4097
if changed or self.include_unchanged:
4099
advance_path = False
4101
result, changed = _process_entry(current_entry, current_path_info)
4102
if changed is not None:
4105
self._gather_result_for_consistency(result)
4106
if changed or self.include_unchanged:
4108
if advance_entry and current_entry is not None:
4110
if entry_index < len(current_block[1]):
4111
current_entry = current_block[1][entry_index]
4113
current_entry = None
advance_entry = True # reset the advance flag
4116
if advance_path and current_path_info is not None:
4117
if not path_handled:
4118
# unversioned in all regards
4119
if self.want_unversioned:
4120
new_executable = bool(
4121
stat.S_ISREG(current_path_info[3].st_mode)
4122
and stat.S_IEXEC & current_path_info[3].st_mode)
4124
relpath_unicode = utf8_decode(current_path_info[0])[0]
4125
except UnicodeDecodeError:
4126
raise errors.BadFilenameEncoding(
4127
current_path_info[0], osutils._fs_enc)
4129
(None, relpath_unicode),
4133
(None, utf8_decode(current_path_info[1])[0]),
4134
(None, current_path_info[2]),
4135
(None, new_executable))
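# In the (old, new) value pairs above, None on the old side marks a path with
# no versioned counterpart: for an unversioned file only the new path, name,
# kind and executable bit can be reported.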
# don't descend into this unversioned path if it is a dir
if current_path_info[2] in ('directory',):
del current_dir_info[1][path_index]
path_index -= 1
# don't descend the disk iterator into any tree paths.
if current_path_info[2] == 'tree-reference':
del current_dir_info[1][path_index]
path_index -= 1
if path_index < len(current_dir_info[1]):
current_path_info = current_dir_info[1][path_index]
if current_path_info[2] == 'directory':
if self.tree._directory_is_tree_reference(
current_path_info[0].decode('utf8')):
current_path_info = current_path_info[:2] + \
('tree-reference',) + current_path_info[3:]
else:
current_path_info = None
path_handled = False
advance_path = True # reset the advance flag.
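# Deleting the current element from current_dir_info[1] prunes the disk walk:
# unversioned directories and tree references are reported at most once and
# never descended into.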
if current_block is not None:
block_index += 1
if (block_index < len(self.state._dirblocks) and
osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
current_block = self.state._dirblocks[block_index]
else:
current_block = None
if current_dir_info is not None:
try:
current_dir_info = dir_iterator.next()
except StopIteration:
current_dir_info = None
for result in self._iter_specific_file_parents():
yield result

def _iter_specific_file_parents(self):
"""Iterate over the specific file parents."""
while self.search_specific_file_parents:
# Process the parent directories for the paths we were iterating.
# Even in extremely large trees this should be modest, so currently
# no attempt is made to optimise.
path_utf8 = self.search_specific_file_parents.pop()
if osutils.is_inside_any(self.searched_specific_files, path_utf8):
# We've examined this path.
continue
if path_utf8 in self.searched_exact_paths:
# We've examined this path.
continue
path_entries = self.state._entries_for_path(path_utf8)
# We need either one or two entries. If the path in
# self.target_index has moved (so the entry in source_index is in
# 'ar') then we need to also look for the entry for this path in
# self.source_index, to output the appropriate delete-or-rename.
selected_entries = []
for candidate_entry in path_entries:
# Find entries present in target at this path:
if candidate_entry[1][self.target_index][0] not in 'ar':
selected_entries.append(candidate_entry)
# Find entries present in source at this path:
elif (self.source_index is not None and
candidate_entry[1][self.source_index][0] not in 'ar'):
if candidate_entry[1][self.target_index][0] == 'a':
# Deleted, emit it here.
selected_entries.append(candidate_entry)
else:
# renamed, emit it when we process the directory it
# ended up at.
self.search_specific_file_parents.add(
candidate_entry[1][self.target_index][1])
raise AssertionError(
"Missing entry for specific path parent %r, %r" % (
path_utf8, path_entries))
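# Illustration (not from the original source): if file id F was 'dir/old' in
# the source tree and is 'dir/new' in the target tree, the entry keyed by
# ('dir', 'old', F) is relocated ('r') in the target, so the target-side data
# lives under ('dir', 'new', F); picking up both entries is what lets this
# code emit a rename rather than a bare delete plus an unrelated add.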
path_info = self._path_info(path_utf8, path_utf8.decode('utf8'))
for entry in selected_entries:
if entry[0][2] in self.seen_ids:
continue
result, changed = self._process_entry(entry, path_info)
if changed is None:
raise AssertionError(
"Got entry<->path mismatch for specific path "
"%r entry %r path_info %r " % (
path_utf8, entry, path_info))
# Only include changes - we're outside the user's requested
self._gather_result_for_consistency(result)
if (result[6][0] == 'directory' and
result[6][1] != 'directory'):
# This stopped being a directory, the old children have
# to be included.
if entry[1][self.source_index][0] == 'r':
# renamed, take the source path
entry_path_utf8 = entry[1][self.source_index][1]
else:
entry_path_utf8 = path_utf8
initial_key = (entry_path_utf8, '', '')
block_index, _ = self.state._find_block_index_from_key(
initial_key)
if block_index == 0:
# The children of the root are in block index 1.
block_index += 1
current_block = None
if block_index < len(self.state._dirblocks):
current_block = self.state._dirblocks[block_index]
if not osutils.is_inside(
entry_path_utf8, current_block[0]):
# No entries for this directory at all.
current_block = None
if current_block is not None:
for entry in current_block[1]:
if entry[1][self.source_index][0] in 'ar':
# Not in the source tree, so doesn't have to be
# included.
continue
# Path of the entry itself.
self.search_specific_file_parents.add(
osutils.pathjoin(*entry[0][:2]))
if changed or self.include_unchanged:
yield result
self.searched_exact_paths.add(path_utf8)

def _path_info(self, utf8_path, unicode_path):
"""Generate path_info for unicode_path.

:return: None if unicode_path does not exist, or a path_info tuple.
"""
abspath = self.tree.abspath(unicode_path)
try:
stat = os.lstat(abspath)
except OSError, e:
if e.errno == errno.ENOENT:
# the path does not exist.
return None
else:
raise
utf8_basename = utf8_path.rsplit('/', 1)[-1]
dir_info = (utf8_path, utf8_basename,
osutils.file_kind_from_stat_mode(stat.st_mode), stat,
abspath)
if dir_info[2] == 'directory':
if self.tree._directory_is_tree_reference(
unicode_path):
self.root_dir_info = self.root_dir_info[:2] + \
('tree-reference',) + self.root_dir_info[3:]
return dir_info
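# A path_info tuple mirrors the per-file data used by the disk walk: utf8
# path, utf8 basename, kind, the raw lstat result and the absolute path
# (the last element assumed from the tuple built above), or None when the
# path does not exist.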

# Try to load the compiled form if possible
try:
from bzrlib._dirstate_helpers_pyx import (
ProcessEntryC as _process_entry,
update_entry as update_entry,
)
except ImportError, e:
osutils.failed_to_load_extension(e)
from bzrlib._dirstate_helpers_py import (
# FIXME: It would be nice to be able to track moved lines so that the
# corresponding python code can be moved to the _dirstate_helpers_py
# module. I don't want to break the history for this important piece of
# code so I left the code here -- vila 20090622
update_entry = py_update_entry
_process_entry = ProcessEntryPython
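
# A minimal, self-contained sketch of the same "optional compiled helper"
# pattern used above. The _fast_helpers module named here is hypothetical and
# not part of bzrlib; the fallback only has to provide an equivalent callable.
try:
    from _fast_helpers import crc32 as _checksum  # hypothetical C extension
except ImportError:
    from zlib import crc32 as _checksum  # stdlib fallback with the same signature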

dirname_split = dirname.split('/')
cache[dirname] = dirname_split
# Grab the dirname for the current dirblock
cur = dirblocks[mid][0]
try:
cur_split = cache[cur]
except KeyError:
cur_split = cur.split('/')
cache[cur] = cur_split
if cur_split < dirname_split: lo = mid+1
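
# A self-contained sketch (not bzrlib code) of why the bisection above
# compares *split* directory names rather than raw strings: '/' sorts after
# '-', so plain string order and path-component order disagree. The helper
# name split_key is made up for this example.
import bisect

def split_key(dirname, _cache={}):
    # same caching idiom as above: split once, remember the result
    try:
        return _cache[dirname]
    except KeyError:
        split = dirname.split('/')
        _cache[dirname] = split
        return split

dirnames = sorted(['', 'a', 'a/b', 'a-b'], key=split_key)
# dirnames == ['', 'a', 'a/b', 'a-b']; a plain string sort would give
# ['', 'a', 'a-b', 'a/b'] and break the bisection invariant.
keys = [split_key(d) for d in dirnames]
assert dirnames[bisect.bisect_left(keys, split_key('a/b'))] == 'a/b'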

def pack_stat(st, _encode=base64.encodestring, _pack=struct.pack):
"""Convert stat values into a packed representation."""
# jam 20060614 it isn't really worth removing more entries if we
# are going to leave it in packed form.
# With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
# With all entries filesize is 5.9M and read time is maybe 280ms
# well within the noise margin
# base64.encode always adds a final newline, so strip it off
return _encode(_pack('>LLLLLL'
, st.st_size, int(st.st_mtime), int(st.st_ctime)
, st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
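
# A companion sketch, not part of the dirstate code: undoing pack_stat for
# debugging. The field order matches the _pack call above.
def unpack_stat(packed, _decode=base64.decodestring, _unpack=struct.unpack):
    """Return (size, mtime, ctime, dev, ino, mode) from a pack_stat string."""
    # pack_stat strips the trailing newline; decoding works without it
    return _unpack('>LLLLLL', _decode(packed))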