lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.

::

    MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
    WHOLE_NUMBER = {digit}, digit;
    REVISION_ID = a non-empty utf8 string;

    dirstate format = header line, full checksum, row count, parent details,
        ghost_details, entries;
    header line = "#bazaar dirstate flat format 3", NL;
    full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
    row count = "num_entries: ", WHOLE_NUMBER, NL;
    parent_details = WHOLE_NUMBER, {REVISION_ID}*, NL;
    ghost_details = WHOLE_NUMBER, {REVISION_ID}*, NL;

    entry = entry_key, current_entry_details, {parent_entry_details};
    entry_key = dirname, basename, fileid;
    current_entry_details = common_entry_details, working_entry_details;
    parent_entry_details = common_entry_details, history_entry_details;
    common_entry_details = MINIKIND, fingerprint, size, executable;
    working_entry_details = packed_stat;
    history_entry_details = REVISION_ID;

    fingerprint = a nonempty utf8 sequence with meaning defined by minikind.

Given this definition, the following is useful to know::

    entry (aka row) - all the data for a given key.
    entry[0]: The key (dirname, basename, fileid)
    entry[1]: The tree(s) data for this path and id combination.
    entry[1][0]: The current tree
    entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate::

    entry[1][0][0]: minikind
    entry[1][0][1]: fingerprint
    entry[1][0][2]: size
    entry[1][0][3]: executable
    entry[1][0][4]: packed_stat

OR (for non tree-0)::

    entry[1][1][4]: revision_id
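The index layout above can be illustrated with a small sketch. The values here are hypothetical, not bzrlib code:

```python
# A sketch of one in-memory dirstate entry, using made-up values.
# entry[0] is the key; entry[1] holds one details tuple per tree.
entry = (
    ('dir/subdir', 'file.txt', 'file-id-1'),          # entry[0]: the key
    [
        # entry[1][0]: current tree details
        # (minikind, fingerprint, size, executable, packed_stat)
        ('f', 'sha1-of-canonical-form', 1024, False, 'packed-stat-blob'),
        # entry[1][1]: parent tree details - the last field is a revision_id
        ('f', 'sha1-of-canonical-form', 1024, False, 'revision-id-1'),
    ],
)

assert entry[1][0][0] == 'f'                  # minikind
assert entry[1][0][4] == 'packed-stat-blob'   # packed_stat (tree 0 only)
assert entry[1][1][4] == 'revision-id-1'      # revision_id (parent trees)
```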
There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now::

    self._dirblocks[0] -> ('', [entry ...]),

and the entries in there are::

    entries[0][2]: file_id
    entries[1][0]: The tree data for the current tree for this fileid at /
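As a sketch with illustrative values (not the literal bzrlib internals), the root dirblock has this shape:

```python
# Illustrative shape of the in-memory root dirblock (values are made up).
# dirblocks[0] is ('', [entry, ...]); the root entry's key has an empty
# dirname and basename.
root_entry = (('', '', 'tree-root-id'),
              [('d', '', 0, False, 'packed-stat-blob')])
dirblocks = [('', [root_entry])]

assert dirblocks[0][0] == ''                       # block key: the root
assert dirblocks[0][1][0][0][2] == 'tree-root-id'  # entries[0][2]: file_id
assert dirblocks[0][1][0][1][0][0] == 'd'          # current tree minikind
```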
'r' is a relocated entry: This path is not present in this tree with this
    id, but the id can be found at another location. The fingerprint is
    used to point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
    current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is
    the sha1 value of the file's canonical form, i.e. after any read
    filters have been applied to the convenience form stored in the working
    tree.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is
    the link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
    revision.
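The minikind meanings above can be summarized in a small lookup table. This is a hypothetical helper for illustration, not part of the bzrlib API:

```python
# Hypothetical summary table of minikind codes and what the fingerprint
# field holds for each (illustration only, not bzrlib code).
MINIKIND_MEANING = {
    'f': 'file: fingerprint is the sha1 of the canonical form',
    'd': 'directory: no fingerprint',
    'l': 'symlink: fingerprint is the link target',
    'a': 'absent: the id is not present at this path in this tree',
    'r': 'relocated: fingerprint points at the path where the id lives',
    't': 'tree-reference: fingerprint is the referenced revision',
}

def describe_minikind(code):
    """Return a human-readable description; raises KeyError if unknown."""
    return MINIKIND_MEANING[code]
```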
The entries on disk and in memory are ordered according to the following keys::

    directory, as a list of components
    filename
    file-id
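A sketch of why the directory is compared as a list of components rather than as a raw string (the key function here is hypothetical, not the bzrlib implementation):

```python
# Comparing the directory as a list of components means 'a/b' sorts before
# 'a-b': comparison happens per path element, so all children of 'a' group
# together, whereas raw string order would put 'a-b' ('-' < '/') first.
def entry_sort_key(dirname, basename, file_id):
    return (dirname.split('/'), basename, file_id)

keys = [('a-b', 'c', 'id1'), ('a/b', 'c', 'id2'), ('a', 'c', 'id3')]
keys.sort(key=lambda k: entry_sort_key(*k))
# Component order: 'a', then 'a/b', then 'a-b'.
```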
--- Format 1 had the following different definition: ---

::

    rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
        WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
        {PARENT ROW}
    PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
        basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
        SHA1
PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.
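The ghost rule above, as a one-line sketch (the names foo, bar, baz are the example values from the text):

```python
# PARENT ROWs are emitted only for parents that are not ghosts,
# preserving parent order.
parents = ['foo', 'bar', 'baz']
ghosts = {'bar'}
emitted = [p for p in parents if p not in ghosts]
```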
        self._last_block_index = None
        self._last_entry_index = None
        # The set of known hash changes
        self._known_hash_changes = set()
        # How many hash changed entries can we have without saving
        self._worth_saving_limit = worth_saving_limit
        self._config_stack = config.LocationStack(urlutils.local_path_to_url(
            path))

    def __repr__(self):
        return "%s(%r)" % \
            (self.__class__.__name__, self._filename)

    def _mark_modified(self, hash_changed_entries=None, header_modified=False):
        """Mark this dirstate as modified.

        :param hash_changed_entries: if non-None, mark just these entries as
            having their hash modified.
        :param header_modified: mark the header modified as well, not just the
            dirblocks.
        """
        #trace.mutter_callsite(3, "modified hash entries: %s", hash_changed_entries)
        if hash_changed_entries:
            self._known_hash_changes.update([e[0] for e in hash_changed_entries])
            if self._dirblock_state in (DirState.NOT_IN_MEMORY,
                                        DirState.IN_MEMORY_UNMODIFIED):
                # If the dirstate is already marked as IN_MEMORY_MODIFIED, then
                # that takes precedence.
                self._dirblock_state = DirState.IN_MEMORY_HASH_MODIFIED
        else:
            # TODO: Since we now have a IN_MEMORY_HASH_MODIFIED state, we
            #       should fail noisily if someone tries to set
            #       IN_MEMORY_MODIFIED but we don't have a write-lock!
            # We don't know exactly what changed so disable smart saving
            self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if header_modified:
            self._header_state = DirState.IN_MEMORY_MODIFIED

    def _mark_unmodified(self):
        """Mark this dirstate as unmodified."""
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
        self._known_hash_changes = set()

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.
                if basename_utf8:
                    parents.add((dirname_utf8, inv_entry.parent_id))
            if old_path is None:
                old_path_utf8 = None
            else:
                old_path_utf8 = encode(old_path)
            if old_path is None:
                adds.append((None, new_path_utf8, file_id,
                             inv_to_entry(inv_entry), True))
                new_ids.add(file_id)
            elif new_path is None:
                deletes.append((old_path_utf8, None, file_id, None, True))
            elif (old_path, new_path) == root_only:
                # change things in-place
                # Note: the case of a parent directory changing its file_id
                #       tends to break optimizations here, because officially
                #       the file has actually been moved, it just happens to
                #       end up at the same path. If we can figure out how to
                #       handle that case, we can avoid a lot of add+delete
                #       pairs for objects that stay put.
                # elif old_path == new_path:
                changes.append((old_path_utf8, new_path_utf8, file_id,
                                inv_to_entry(inv_entry)))
            else:
                # Because renames must preserve their children we must have
                # processed all relocations and removes before hand. The sort
                self._update_basis_apply_deletes(deletes)
                deletes = []
                # Split into an add/delete pair recursively.
                adds.append((old_path_utf8, new_path_utf8, file_id,
                             inv_to_entry(inv_entry), False))
                # Expunge deletes that we've seen so that deleted/renamed
                # children of a rename directory are handled correctly.
                new_deletes = reversed(list(
                    self._iter_child_entries(1, old_path_utf8)))
                # Remove the current contents of the tree at orig_path, and
                # reinsert at the correct new path.
                for entry in new_deletes:
                    child_dirname, child_basename, child_file_id = entry[0]
                    if child_dirname:
                        source_path = child_dirname + '/' + child_basename
                    else:
                        source_path = child_basename
                    if new_path_utf8:
                        target_path = new_path_utf8 + source_path[len(old_path):]
                    else:
                        if old_path == '':
                            raise AssertionError("cannot rename directory to"
                                                 " itself")
                        target_path = source_path[len(old_path) + 1:]
                    adds.append((None, target_path, entry[0][2], entry[1][1], False))
                    deletes.append(
                        (source_path, target_path, entry[0][2], None, False))
                deletes.append((old_path_utf8, new_path, file_id, None, False))

        self._check_delta_ids_absent(new_ids, delta, 1)
        # Finish expunging deletes/first half of renames.

        # Adds are accumulated partly from renames, so can be in any input
        # order - sort it.
        # TODO: we may want to sort in dirblocks order. That way each entry
        #       will end up in the same directory, allowing the _get_entry
        #       fast-path for looking up 2 items in the same dir work.
        adds.sort(key=lambda x: x[1])
        # adds is now in lexicographic order, which places all parents before
        # their children, so we can process it linearly.
        st = static_tuple.StaticTuple
        for old_path, new_path, file_id, new_details, real_add in adds:
            dirname, basename = osutils.split(new_path)
            entry_key = st(dirname, basename, file_id)
            block_index, present = self._find_block_index_from_key(entry_key)
            if not present:
                self._raise_invalid(new_path, file_id,
                    "Unable to find block for this record."
                    " Was the parent added?")
            block = self._dirblocks[block_index][1]
            entry_index, present = self._find_entry_index(entry_key, block)
            if real_add and old_path is not None:
                self._raise_invalid(new_path, file_id,
                    'considered a real add but still had old_path at %s'
                    % (old_path,))
            if present:
                entry = block[entry_index]
                basis_kind = entry[1][1][0]
                if basis_kind == 'a':
                    entry[1][1] = new_details
                elif basis_kind == 'r':
                    raise NotImplementedError()
                else:
                    self._raise_invalid(new_path, file_id,
                        "An entry was marked as a new add"
                        " but the basis target already existed")
            else:
                # The exact key was not found in the block. However, we need to
                # check if there is a key next to us that would have matched.
                # We only need to check 2 locations, because there are only 2
                # trees.
                for maybe_index in range(entry_index - 1, entry_index + 1):
                    if maybe_index < 0 or maybe_index >= len(block):
                        continue
                    maybe_entry = block[maybe_index]
                    if maybe_entry[0][:2] != (dirname, basename):
                        # Just a random neighbor
                        continue
                    if maybe_entry[0][2] == file_id:
                        raise AssertionError(
                            "_find_entry_index didn't find a key match"
                            " but walking the data did, for %s"
                            % (entry_key,))
                    basis_kind = maybe_entry[1][1][0]
                    if basis_kind not in 'ar':
                        self._raise_invalid(new_path, file_id,
                            "we have an add record for path, but the path"
                            " is already present with another file_id %s"
                            % (maybe_entry[0][2],))
                entry = (entry_key, [DirState.NULL_PARENT_DETAILS,
                                     new_details])
                block.insert(entry_index, entry)
            active_kind = entry[1][0][0]
            if active_kind == 'a':
                # The active record shows up as absent, this could be genuine,
                # or it could be present at some other location. We need to
                # verify.
                id_index = self._get_id_index()
                # The id_index may not be perfectly accurate for tree1, because
                # we haven't been keeping it updated. However, it should be
                # fine for tree0, and that gives us enough info for what we
                # need.
                keys = id_index.get(file_id, ())
                for key in keys:
                    block_i, entry_i, d_present, f_present = \
                        self._get_block_entry_index(key[0], key[1], 0)
                    if not f_present:
                        continue
                    active_entry = self._dirblocks[block_i][1][entry_i]
                    if (active_entry[0][2] != file_id):
                        # Some other file is at this path, we don't need to
                        # link it.
                        continue
                    real_active_kind = active_entry[1][0][0]
                    if real_active_kind in 'ar':
                        # We found a record, which was not *this* record,
                        # which matches the file_id, but is not actually
                        # present. Something seems *really* wrong.
                        self._raise_invalid(new_path, file_id,
                            "We found a tree0 entry that doesn't make sense")
                    # Now, we've found a tree0 entry which matches the file_id
                    # but is at a different location. So update them to be
                    # rename records.
                    active_dir, active_name = active_entry[0][:2]
                    if active_dir:
                        active_path = active_dir + '/' + active_name
                    else:
                        active_path = active_name
                    active_entry[1][1] = st('r', new_path, 0, False, '')
                    entry[1][0] = st('r', active_path, 0, False, '')
            elif active_kind == 'r':
                raise NotImplementedError()
            new_kind = new_details[0]
            if new_kind == 'd':
                self._ensure_block(block_index, entry_index, new_path)

    def _update_basis_apply_changes(self, changes):
        """Apply a sequence of changes to tree 1 during update_basis_by_delta.

        null = DirState.NULL_PARENT_DETAILS
        for old_path, new_path, file_id, _, real_delete in deletes:
            if real_delete != (new_path is None):
                self._raise_invalid(old_path, file_id, "bad delete delta")
            # the entry for this file_id must be in tree 1.
            dirname, basename = osutils.split(old_path)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, 1)
            if not file_present:
                self._raise_invalid(old_path, file_id,
                    'basis tree does not contain removed entry')
            entry = self._dirblocks[block_index][1][entry_index]
            # The state of the entry in the 'active' WT
            active_kind = entry[1][0][0]
            if entry[0][2] != file_id:
                self._raise_invalid(old_path, file_id,
                    'mismatched file_id in tree 1')
            old_kind = entry[1][1][0]
            if active_kind in 'ar':
                # The active tree doesn't have this file_id.
                # The basis tree is changing this record. If this is a
                # rename, then we don't want the record here at all
                # anymore. If it is just an in-place change, we want the
                # record here, but we'll add it if we need to. So we just
                # delete it.
                if active_kind == 'r':
                    active_path = entry[1][0][1]
                    active_entry = self._get_entry(0, file_id, active_path)
                    if active_entry[1][1][0] != 'r':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate did not have matching rename entries")
                    elif active_entry[1][0][0] in 'ar':
                        self._raise_invalid(old_path, file_id,
                            "Dirstate had a rename pointing at an inactive"
                            " tree0")
                    active_entry[1][1] = null
                del self._dirblocks[block_index][1][entry_index]
                if old_kind == 'd':
                    # This was a directory, and the active tree says it
                    # doesn't exist, and now the basis tree says it doesn't
                    # exist. Remove its dirblock if present
                    (dir_block_index,
                     present) = self._find_block_index_from_key(
                         (old_path, '', ''))
                    if present:
                        dir_block = self._dirblocks[dir_block_index][1]
                        if not dir_block:
                            # This entry is empty, go ahead and just remove it
                            del self._dirblocks[dir_block_index]
            else:
                # There is still an active record, so just mark this
                # removed.
                entry[1][1] = null
                block_i, entry_i, d_present, f_present = \
                    self._get_block_entry_index(old_path, '', 1)
                if d_present:
                    dir_block = self._dirblocks[block_i][1]
                    for child_entry in dir_block:
                        child_basis_kind = child_entry[1][1][0]
                        if child_basis_kind not in 'ar':
                            self._raise_invalid(old_path, file_id,
                                "The file id was deleted but its children were "
                                "not deleted.")

    def _after_delta_check_parents(self, parents, index):
        """Check that parents required by the delta are all intact.
            trace.mutter('Not saving DirState because '
                         '_changes_aborted is set.')
            return
        # TODO: Since we now distinguish IN_MEMORY_MODIFIED from
        #       IN_MEMORY_HASH_MODIFIED, we should only fail quietly if we fail
        #       to save an IN_MEMORY_HASH_MODIFIED, and fail *noisily* if we
        #       fail to save IN_MEMORY_MODIFIED
        if not self._worth_saving():
            return

        grabbed_write_lock = False
        if self._lock_state != 'w':
            grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
            # Switch over to the new lock, as the old one may be closed.
            # TODO: jam 20070315 We should validate the disk file has
            #       not changed contents, since temporary_write_lock may
            #       not be an atomic operation.
            self._lock_token = new_lock
            self._state_file = new_lock.f
            if not grabbed_write_lock:
                # We couldn't grab a write lock, so we switch back to a read one
                return
        try:
            lines = self.get_lines()
            self._state_file.seek(0)
            self._state_file.writelines(lines)
            self._state_file.truncate()
            self._state_file.flush()
            self._maybe_fdatasync()
            self._mark_unmodified()
        finally:
            if grabbed_write_lock:
                self._lock_token = self._lock_token.restore_read_lock()
                self._state_file = self._lock_token.f
                # TODO: jam 20070315 We should validate the disk file has
                #       not changed contents. Since restore_read_lock may
                #       not be an atomic operation.

    def _maybe_fdatasync(self):
        """Flush to disk if possible and if not configured off."""
        if self._config_stack.get('dirstate.fdatasync'):
            osutils.fdatasync(self._state_file.fileno())

    def _worth_saving(self):
        """Is it worth saving the dirstate or not?"""
        if (self._header_state == DirState.IN_MEMORY_MODIFIED
            or self._dirblock_state == DirState.IN_MEMORY_MODIFIED):
            return True
        if self._dirblock_state == DirState.IN_MEMORY_HASH_MODIFIED:
            if self._worth_saving_limit == -1:
                # We never save hash changes when the limit is -1
                return False
            # If we're using smart saving and only a small number of
            # entries have changed their hash, don't bother saving. John has
            # suggested using a heuristic here based on the size of the
            # changed files and/or tree. For now, we go with a configurable
            # number of changes, keeping the calculation time
            # as low overhead as possible. (This also keeps all existing
            # tests passing as the default is 0, i.e. always save.)
            if len(self._known_hash_changes) >= self._worth_saving_limit:
                return True
        return False

    def _set_data(self, parent_ids, dirblocks):
        """Set the full dirstate data in memory.