20
20
lines by NL. The field delimiters are ommitted in the grammar, line delimiters
21
21
are not - this is done for clarity of reading. All string data is in utf8.
25
MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
28
WHOLE_NUMBER = {digit}, digit;
30
REVISION_ID = a non-empty utf8 string;
32
dirstate format = header line, full checksum, row count, parent details,
33
ghost_details, entries;
34
header line = "#bazaar dirstate flat format 3", NL;
35
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
36
row count = "num_entries: ", WHOLE_NUMBER, NL;
37
parent_details = WHOLE NUMBER, {REVISION_ID}* NL;
38
ghost_details = WHOLE NUMBER, {REVISION_ID}*, NL;
40
entry = entry_key, current_entry_details, {parent_entry_details};
41
entry_key = dirname, basename, fileid;
42
current_entry_details = common_entry_details, working_entry_details;
43
parent_entry_details = common_entry_details, history_entry_details;
44
common_entry_details = MINIKIND, fingerprint, size, executable
45
working_entry_details = packed_stat
46
history_entry_details = REVISION_ID;
49
fingerprint = a nonempty utf8 sequence with meaning defined by minikind.
51
Given this definition, the following is useful to know::
53
entry (aka row) - all the data for a given key.
54
entry[0]: The key (dirname, basename, fileid)
58
entry[1]: The tree(s) data for this path and id combination.
59
entry[1][0]: The current tree
60
entry[1][1]: The second tree
62
For an entry for a tree, we have (using tree 0 - current tree) to demonstrate::
64
entry[1][0][0]: minikind
65
entry[1][0][1]: fingerprint
67
entry[1][0][3]: executable
68
entry[1][0][4]: packed_stat
72
entry[1][1][4]: revision_id
23
MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
26
WHOLE_NUMBER = {digit}, digit;
28
REVISION_ID = a non-empty utf8 string;
30
dirstate format = header line, full checksum, row count, parent details,
31
ghost_details, entries;
32
header line = "#bazaar dirstate flat format 3", NL;
33
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
34
row count = "num_entries: ", WHOLE_NUMBER, NL;
35
parent_details = WHOLE NUMBER, {REVISION_ID}* NL;
36
ghost_details = WHOLE NUMBER, {REVISION_ID}*, NL;
38
entry = entry_key, current_entry_details, {parent_entry_details};
39
entry_key = dirname, basename, fileid;
40
current_entry_details = common_entry_details, working_entry_details;
41
parent_entry_details = common_entry_details, history_entry_details;
42
common_entry_details = MINIKIND, fingerprint, size, executable
43
working_entry_details = packed_stat
44
history_entry_details = REVISION_ID;
47
fingerprint = a nonempty utf8 sequence with meaning defined by minikind.
49
Given this definition, the following is useful to know:
50
entry (aka row) - all the data for a given key.
51
entry[0]: The key (dirname, basename, fileid)
55
entry[1]: The tree(s) data for this path and id combination.
56
entry[1][0]: The current tree
57
entry[1][1]: The second tree
59
For an entry for a tree, we have (using tree 0 - current tree) to demonstrate:
60
entry[1][0][0]: minikind
61
entry[1][0][1]: fingerprint
63
entry[1][0][3]: executable
64
entry[1][0][4]: packed_stat
66
entry[1][1][4]: revision_id
74
68
There may be multiple rows at the root, one per id present in the root, so the
75
in memory root row is now::
77
self._dirblocks[0] -> ('', [entry ...]),
79
and the entries in there are::
83
entries[0][2]: file_id
84
entries[1][0]: The tree data for the current tree for this fileid at /
89
'r' is a relocated entry: This path is not present in this tree with this
90
id, but the id can be found at another location. The fingerprint is
91
used to point to the target location.
92
'a' is an absent entry: In that tree the id is not present at this path.
93
'd' is a directory entry: This path in this tree is a directory with the
94
current file id. There is no fingerprint for directories.
95
'f' is a file entry: As for directory, but it's a file. The fingerprint is
96
the sha1 value of the file's canonical form, i.e. after any read
97
filters have been applied to the convenience form stored in the working
99
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is
101
't' is a reference to a nested subtree; the fingerprint is the referenced
69
in memory root row is now:
70
self._dirblocks[0] -> ('', [entry ...]),
71
and the entries in there are
74
entries[0][2]: file_id
75
entries[1][0]: The tree data for the current tree for this fileid at /
79
'r' is a relocated entry: This path is not present in this tree with this id,
80
but the id can be found at another location. The fingerprint is used to
81
point to the target location.
82
'a' is an absent entry: In that tree the id is not present at this path.
83
'd' is a directory entry: This path in this tree is a directory with the
84
current file id. There is no fingerprint for directories.
85
'f' is a file entry: As for directory, but it's a file. The fingerprint is the
86
sha1 value of the file's canonical form, i.e. after any read filters have
87
been applied to the convenience form stored in the working tree.
88
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is the
90
't' is a reference to a nested subtree; the fingerprint is the referenced
106
The entries on disk and in memory are ordered according to the following keys::
95
The entries on disk and in memory are ordered according to the following keys:
108
97
directory, as a list of components
112
101
--- Format 1 had the following different definition: ---
116
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
117
WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
119
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
120
basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
102
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
103
WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
105
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
106
basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
123
109
PARENT ROW's are emitted for every parent that is not in the ghosts details
124
110
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
251
232
ERROR_DIRECTORY = 267
235
if not getattr(struct, '_compile', None):
236
# Cannot pre-compile the dirstate pack_stat
237
def pack_stat(st, _encode=binascii.b2a_base64, _pack=struct.pack):
238
"""Convert stat values into a packed representation."""
239
return _encode(_pack('>LLLLLL', st.st_size, int(st.st_mtime),
240
int(st.st_ctime), st.st_dev, st.st_ino & 0xFFFFFFFF,
243
# compile the struct compiler we need, so as to only do it once
244
from _struct import Struct
245
_compiled_pack = Struct('>LLLLLL').pack
246
def pack_stat(st, _encode=binascii.b2a_base64, _pack=_compiled_pack):
247
"""Convert stat values into a packed representation."""
248
# jam 20060614 it isn't really worth removing more entries if we
249
# are going to leave it in packed form.
250
# With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
251
# With all entries, filesize is 5.9M and read time is maybe 280ms
252
# well within the noise margin
254
# base64 encoding always adds a final newline, so strip it off
255
# The current version
256
return _encode(_pack(st.st_size, int(st.st_mtime), int(st.st_ctime),
257
st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
258
# This is 0.060s / 1.520s faster by not encoding as much information
259
# return _encode(_pack('>LL', int(st.st_mtime), st.st_mode))[:-1]
260
# This is not strictly faster than _encode(_pack())[:-1]
261
# return '%X.%X.%X.%X.%X.%X' % (
262
# st.st_size, int(st.st_mtime), int(st.st_ctime),
263
# st.st_dev, st.st_ino, st.st_mode)
264
# Similar to the _encode(_pack('>LL'))
265
# return '%X.%X' % (int(st.st_mtime), st.st_mode)
268
def _unpack_stat(packed_stat):
269
"""Turn a packed_stat back into the stat fields.
271
This is meant as a debugging tool, should not be used in real code.
273
(st_size, st_mtime, st_ctime, st_dev, st_ino,
274
st_mode) = struct.unpack('>LLLLLL', binascii.a2b_base64(packed_stat))
275
return dict(st_size=st_size, st_mtime=st_mtime, st_ctime=st_ctime,
276
st_dev=st_dev, st_ino=st_ino, st_mode=st_mode)
254
279
class SHA1Provider(object):
255
280
"""An interface for getting sha1s of a file."""
1510
1534
if basename_utf8:
1511
1535
parents.add((dirname_utf8, inv_entry.parent_id))
1512
1536
if old_path is None:
1513
old_path_utf8 = None
1515
old_path_utf8 = encode(old_path)
1516
if old_path is None:
1517
adds.append((None, new_path_utf8, file_id,
1537
adds.append((None, encode(new_path), file_id,
1518
1538
inv_to_entry(inv_entry), True))
1519
1539
new_ids.add(file_id)
1520
1540
elif new_path is None:
1521
deletes.append((old_path_utf8, None, file_id, None, True))
1522
elif (old_path, new_path) == root_only:
1523
# change things in-place
1524
# Note: the case of a parent directory changing its file_id
1525
# tends to break optimizations here, because officially
1526
# the file has actually been moved, it just happens to
1527
# end up at the same path. If we can figure out how to
1528
# handle that case, we can avoid a lot of add+delete
1529
# pairs for objects that stay put.
1530
# elif old_path == new_path:
1531
changes.append((old_path_utf8, new_path_utf8, file_id,
1532
inv_to_entry(inv_entry)))
1541
deletes.append((encode(old_path), None, file_id, None, True))
1542
elif (old_path, new_path) != root_only:
1535
1544
# Because renames must preserve their children we must have
1536
1545
# processed all relocations and removes before hand. The sort
1546
1555
self._update_basis_apply_deletes(deletes)
1548
1557
# Split into an add/delete pair recursively.
1549
adds.append((old_path_utf8, new_path_utf8, file_id,
1550
inv_to_entry(inv_entry), False))
1558
adds.append((None, new_path_utf8, file_id,
1559
inv_to_entry(inv_entry), False))
1551
1560
# Expunge deletes that we've seen so that deleted/renamed
1552
1561
# children of a rename directory are handled correctly.
1553
new_deletes = reversed(list(
1554
self._iter_child_entries(1, old_path_utf8)))
1562
new_deletes = reversed(list(self._iter_child_entries(1,
1555
1564
# Remove the current contents of the tree at orig_path, and
1556
1565
# reinsert at the correct new path.
1557
1566
for entry in new_deletes:
1558
child_dirname, child_basename, child_file_id = entry[0]
1560
source_path = child_dirname + '/' + child_basename
1568
source_path = entry[0][0] + '/' + entry[0][1]
1562
source_path = child_basename
1570
source_path = entry[0][1]
1563
1571
if new_path_utf8:
1565
new_path_utf8 + source_path[len(old_path_utf8):]
1572
target_path = new_path_utf8 + source_path[len(old_path):]
1567
if old_path_utf8 == '':
1568
1575
raise AssertionError("cannot rename directory to"
1570
target_path = source_path[len(old_path_utf8) + 1:]
1577
target_path = source_path[len(old_path) + 1:]
1571
1578
adds.append((None, target_path, entry[0][2], entry[1][1], False))
1572
1579
deletes.append(
1573
1580
(source_path, target_path, entry[0][2], None, False))
1574
1581
deletes.append(
1575
(old_path_utf8, new_path_utf8, file_id, None, False))
1582
(encode(old_path), new_path, file_id, None, False))
1584
# changes to just the root should not require remove/insertion
1586
changes.append((encode(old_path), encode(new_path), file_id,
1587
inv_to_entry(inv_entry)))
1577
1588
self._check_delta_ids_absent(new_ids, delta, 1)
1579
1590
# Finish expunging deletes/first half of renames.
1636
1647
# Adds are accumulated partly from renames, so can be in any input
1637
1648
# order - sort it.
1638
# TODO: we may want to sort in dirblocks order. That way each entry
1639
# will end up in the same directory, allowing the _get_entry
1640
# fast-path for looking up 2 items in the same dir work.
1641
adds.sort(key=lambda x: x[1])
1642
1650
# adds is now in lexographic order, which places all parents before
1643
1651
# their children, so we can process it linearly.
1645
st = static_tuple.StaticTuple
1646
1653
for old_path, new_path, file_id, new_details, real_add in adds:
1647
dirname, basename = osutils.split(new_path)
1648
entry_key = st(dirname, basename, file_id)
1649
block_index, present = self._find_block_index_from_key(entry_key)
1651
self._raise_invalid(new_path, file_id,
1652
"Unable to find block for this record."
1653
" Was the parent added?")
1654
block = self._dirblocks[block_index][1]
1655
entry_index, present = self._find_entry_index(entry_key, block)
1657
if old_path is not None:
1658
self._raise_invalid(new_path, file_id,
1659
'considered a real add but still had old_path at %s'
1662
entry = block[entry_index]
1663
basis_kind = entry[1][1][0]
1664
if basis_kind == 'a':
1665
entry[1][1] = new_details
1666
elif basis_kind == 'r':
1667
raise NotImplementedError()
1669
self._raise_invalid(new_path, file_id,
1670
"An entry was marked as a new add"
1671
" but the basis target already existed")
1673
# The exact key was not found in the block. However, we need to
1674
# check if there is a key next to us that would have matched.
1675
# We only need to check 2 locations, because there are only 2
1677
for maybe_index in range(entry_index-1, entry_index+1):
1678
if maybe_index < 0 or maybe_index >= len(block):
1680
maybe_entry = block[maybe_index]
1681
if maybe_entry[0][:2] != (dirname, basename):
1682
# Just a random neighbor
1684
if maybe_entry[0][2] == file_id:
1685
raise AssertionError(
1686
'_find_entry_index didnt find a key match'
1687
' but walking the data did, for %s'
1689
basis_kind = maybe_entry[1][1][0]
1690
if basis_kind not in 'ar':
1691
self._raise_invalid(new_path, file_id,
1692
"we have an add record for path, but the path"
1693
" is already present with another file_id %s"
1694
% (maybe_entry[0][2],))
1696
entry = (entry_key, [DirState.NULL_PARENT_DETAILS,
1698
block.insert(entry_index, entry)
1700
active_kind = entry[1][0][0]
1701
if active_kind == 'a':
1702
# The active record shows up as absent, this could be genuine,
1703
# or it could be present at some other location. We need to
1705
id_index = self._get_id_index()
1706
# The id_index may not be perfectly accurate for tree1, because
1707
# we haven't been keeping it updated. However, it should be
1708
# fine for tree0, and that gives us enough info for what we
1710
keys = id_index.get(file_id, ())
1712
block_i, entry_i, d_present, f_present = \
1713
self._get_block_entry_index(key[0], key[1], 0)
1716
active_entry = self._dirblocks[block_i][1][entry_i]
1717
if (active_entry[0][2] != file_id):
1718
# Some other file is at this path, we don't need to
1721
real_active_kind = active_entry[1][0][0]
1722
if real_active_kind in 'ar':
1723
# We found a record, which was not *this* record,
1724
# which matches the file_id, but is not actually
1725
# present. Something seems *really* wrong.
1726
self._raise_invalid(new_path, file_id,
1727
"We found a tree0 entry that doesnt make sense")
1728
# Now, we've found a tree0 entry which matches the file_id
1729
# but is at a different location. So update them to be
1731
active_dir, active_name = active_entry[0][:2]
1733
active_path = active_dir + '/' + active_name
1735
active_path = active_name
1736
active_entry[1][1] = st('r', new_path, 0, False, '')
1737
entry[1][0] = st('r', active_path, 0, False, '')
1738
elif active_kind == 'r':
1739
raise NotImplementedError()
1741
new_kind = new_details[0]
1743
self._ensure_block(block_index, entry_index, new_path)
1654
# the entry for this file_id must be in tree 0.
1655
entry = self._get_entry(0, file_id, new_path)
1656
if entry[0] is None or entry[0][2] != file_id:
1657
self._changes_aborted = True
1658
raise errors.InconsistentDelta(new_path, file_id,
1659
'working tree does not contain new entry')
1660
if real_add and entry[1][1][0] not in absent:
1661
self._changes_aborted = True
1662
raise errors.InconsistentDelta(new_path, file_id,
1663
'The entry was considered to be a genuinely new record,'
1664
' but there was already an old record for it.')
1665
# We don't need to update the target of an 'r' because the handling
1666
# of renames turns all 'r' situations into a delete at the original
1668
entry[1][1] = new_details
1745
1670
def _update_basis_apply_changes(self, changes):
1746
1671
"""Apply a sequence of changes to tree 1 during update_basis_by_delta.
1772
1703
null = DirState.NULL_PARENT_DETAILS
1773
1704
for old_path, new_path, file_id, _, real_delete in deletes:
1774
1705
if real_delete != (new_path is None):
1775
self._raise_invalid(old_path, file_id, "bad delete delta")
1706
self._changes_aborted = True
1707
raise AssertionError("bad delete delta")
1776
1708
# the entry for this file_id must be in tree 1.
1777
1709
dirname, basename = osutils.split(old_path)
1778
1710
block_index, entry_index, dir_present, file_present = \
1779
1711
self._get_block_entry_index(dirname, basename, 1)
1780
1712
if not file_present:
1781
self._raise_invalid(old_path, file_id,
1713
self._changes_aborted = True
1714
raise errors.InconsistentDelta(old_path, file_id,
1782
1715
'basis tree does not contain removed entry')
1783
1716
entry = self._dirblocks[block_index][1][entry_index]
1784
# The state of the entry in the 'active' WT
1785
active_kind = entry[1][0][0]
1786
1717
if entry[0][2] != file_id:
1787
self._raise_invalid(old_path, file_id,
1718
self._changes_aborted = True
1719
raise errors.InconsistentDelta(old_path, file_id,
1788
1720
'mismatched file_id in tree 1')
1790
old_kind = entry[1][1][0]
1791
if active_kind in 'ar':
1792
# The active tree doesn't have this file_id.
1793
# The basis tree is changing this record. If this is a
1794
# rename, then we don't want the record here at all
1795
# anymore. If it is just an in-place change, we want the
1796
# record here, but we'll add it if we need to. So we just
1798
if active_kind == 'r':
1799
active_path = entry[1][0][1]
1800
active_entry = self._get_entry(0, file_id, active_path)
1801
if active_entry[1][1][0] != 'r':
1802
self._raise_invalid(old_path, file_id,
1803
"Dirstate did not have matching rename entries")
1804
elif active_entry[1][0][0] in 'ar':
1805
self._raise_invalid(old_path, file_id,
1806
"Dirstate had a rename pointing at an inactive"
1808
active_entry[1][1] = null
1722
if entry[1][0][0] != 'a':
1723
self._changes_aborted = True
1724
raise errors.InconsistentDelta(old_path, file_id,
1725
'This was marked as a real delete, but the WT state'
1726
' claims that it still exists and is versioned.')
1809
1727
del self._dirblocks[block_index][1][entry_index]
1811
# This was a directory, and the active tree says it
1812
# doesn't exist, and now the basis tree says it doesn't
1813
# exist. Remove its dirblock if present
1815
present) = self._find_block_index_from_key(
1818
dir_block = self._dirblocks[dir_block_index][1]
1820
# This entry is empty, go ahead and just remove it
1821
del self._dirblocks[dir_block_index]
1823
# There is still an active record, so just mark this
1826
block_i, entry_i, d_present, f_present = \
1827
self._get_block_entry_index(old_path, '', 1)
1829
dir_block = self._dirblocks[block_i][1]
1830
for child_entry in dir_block:
1831
child_basis_kind = child_entry[1][1][0]
1832
if child_basis_kind not in 'ar':
1833
self._raise_invalid(old_path, file_id,
1834
"The file id was deleted but its children were "
1729
if entry[1][0][0] == 'a':
1730
self._changes_aborted = True
1731
raise errors.InconsistentDelta(old_path, file_id,
1732
'The entry was considered a rename, but the source path'
1733
' is marked as absent.')
1734
# For whatever reason, we were asked to rename an entry
1735
# that was originally marked as deleted. This could be
1736
# because we are renaming the parent directory, and the WT
1737
# current state has the file marked as deleted.
1738
elif entry[1][0][0] == 'r':
1739
# implement the rename
1740
del self._dirblocks[block_index][1][entry_index]
1742
# it is being resurrected here, so blank it out temporarily.
1743
self._dirblocks[block_index][1][entry_index][1][1] = null
1837
1745
def _after_delta_check_parents(self, parents, index):
1838
1746
"""Check that parents required by the delta are all intact.
2470
2380
# IN_MEMORY_HASH_MODIFIED, we should only fail quietly if we fail
2471
2381
# to save an IN_MEMORY_HASH_MODIFIED, and fail *noisily* if we
2472
2382
# fail to save IN_MEMORY_MODIFIED
2473
if not self._worth_saving():
2476
grabbed_write_lock = False
2477
if self._lock_state != 'w':
2478
grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
2479
# Switch over to the new lock, as the old one may be closed.
2480
# TODO: jam 20070315 We should validate the disk file has
2481
# not changed contents, since temporary_write_lock may
2482
# not be an atomic operation.
2483
self._lock_token = new_lock
2484
self._state_file = new_lock.f
2485
if not grabbed_write_lock:
2486
# We couldn't grab a write lock, so we switch back to a read one
2489
lines = self.get_lines()
2490
self._state_file.seek(0)
2491
self._state_file.writelines(lines)
2492
self._state_file.truncate()
2493
self._state_file.flush()
2494
self._maybe_fdatasync()
2495
self._mark_unmodified()
2497
if grabbed_write_lock:
2498
self._lock_token = self._lock_token.restore_read_lock()
2499
self._state_file = self._lock_token.f
2383
if self._worth_saving():
2384
grabbed_write_lock = False
2385
if self._lock_state != 'w':
2386
grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
2387
# Switch over to the new lock, as the old one may be closed.
2500
2388
# TODO: jam 20070315 We should validate the disk file has
2501
# not changed contents. Since restore_read_lock may
2502
# not be an atomic operation.
2504
def _maybe_fdatasync(self):
2505
"""Flush to disk if possible and if not configured off."""
2506
if self._config_stack.get('dirstate.fdatasync'):
2507
osutils.fdatasync(self._state_file.fileno())
2389
# not changed contents. Since temporary_write_lock may
2390
# not be an atomic operation.
2391
self._lock_token = new_lock
2392
self._state_file = new_lock.f
2393
if not grabbed_write_lock:
2394
# We couldn't grab a write lock, so we switch back to a read one
2397
lines = self.get_lines()
2398
self._state_file.seek(0)
2399
self._state_file.writelines(lines)
2400
self._state_file.truncate()
2401
self._state_file.flush()
2402
self._mark_unmodified()
2404
if grabbed_write_lock:
2405
self._lock_token = self._lock_token.restore_read_lock()
2406
self._state_file = self._lock_token.f
2407
# TODO: jam 20070315 We should validate the disk file has
2408
# not changed contents. Since restore_read_lock may
2409
# not be an atomic operation.
2509
2411
def _worth_saving(self):
2510
2412
"""Is it worth saving the dirstate or not?"""