439
439
formerly linked. (This will normally bubble down due to keeping densely
441
441
To shrink the prefix of a leaf node, create an internal node with the same
442
prefix, then choose a width for the internal node such that the contents
442
prefix, then choose a width for the internal node such that the contents
443
443
of the leaf all fit into new leaves obeying the min_size and max_size rules.
444
444
The largest prefix possible should be chosen, to obey the
445
higher-nodes-are-denser rule. That rule also gives room in leaf nodes for
445
higher-nodes-are-denser rule. That rule also gives room in leaf nodes for
446
446
growth without affecting the parent node packing.
447
447
#. Update the CHK pointers - serialise every altered node to generate a CHK,
448
448
and update the CHK placeholder in the nodes parent; then reserialise the
449
parent. CHK pointer propagation can be done lazily when many updates are
449
parent. CHK pointer propogation can be done lazily when many updates are
452
452
Multiple versions of nodes for the same PREFIX and internal prefix width should
453
453
compress well for the same tree.
459
An inventory is a serialization of the in-memory inventory delta. To serialize
460
an inventory delta, one takes an existing inventory delta and the revision_id
461
of the revision it was created it against and the revision id of the inventory
462
which should result by applying the delta to the parent. We then serialize
463
every item in the delta in a simple format:
465
'format: bzr inventory delta v1 (1.14)' NL
466
'parent:' SP BASIS_INVENTORY NL
467
'version:' SP NULL_OR_REVISION NL
468
'versioned_root:' SP BOOL NL
469
'tree_references:' SP BOOL NL
472
DELTA_LINES ::= (DELTA_LINE NL)*
473
DELTA_LINE ::= OLDPATH NULL NEWPATH NULL file-id NULL PARENT_ID NULL LAST_MODIFIED NULL CONTENT
475
BOOL ::= 'true' | 'false'
477
OLDPATH ::= NONE | PATH
478
NEWPATH ::= NONE | PATH
481
PARENT_ID ::= FILE_ID | ''
482
CONTENT ::= DELETED_CONTENT | FILE_CONTENT | DIR_CONTENT | TREE_CONTENT | LINK_CONTENT
483
DELETED_CONTENT ::= 'deleted'
484
FILE_CONTENT ::= 'file' NULL text_size NULL EXEC NULL text_sha1
485
DIR_CONTENT ::= 'dir'
486
TREE_CONTENT ::= 'tree' NULL tree-revision
487
LINK_CONTENT ::= 'link' NULL link-target
488
BASIS_INVENTORY ::= NULL_OR_REVISION
489
LAST_MODIFIED ::= NULL_OR_REVISION
490
NULL_OR_REVISION ::= 'null:' | REVISION
491
REVISION ::= revision-id-in-utf8-no-whitespace
494
DELTA_LINES is lexicographically sorted.
496
Some explanation is in order. When NEWPATH is 'None' a delete has been
497
recorded, and because this inventory delta is not attempting to be a reversible
498
delta, the only other valid fields are OLDPATH and 'file-id'. PARENT_ID is ''
499
when a delete has been recorded or when recording a new root entry.
505
Inventory deltas and more broadly changes between trees are a significant part
506
of bzr's core operations: they are key components in status, diff, commit,
507
and merge (although merge uses tree transform, deltas contain the changes that
508
are applied to the transform). Our ability to perform a given operation depends
509
on us creating consistent deltas between trees. Inconsistent deltas lead to
510
errors and bugs, or even just unexpected conflicts.
512
An inventory delta is a transform to change an inventory A into another
513
inventory B (in patch terms its a perfect patch). Sometimes, for instance in a
514
regular commit, inventory B is known at the time we create the delta. Other
515
times, B is not known because the user is requesting that some parts of the
516
second inventory they have are masked out from consideration. When this happens
517
we create a delta that when applied to A creates a B we haven't seen in total
518
before. In this situation we need to ensure that B will be internally
519
consistent. Deltas are unidirectional, a delta(A, B) creates B from A, but
520
cannot be used to create A from B.
522
Deltas are expressed as a list of (oldpath, newpath, fileid, entry) tuples. The
523
fileid, entry elements are normative; the old and new paths are strong hints
524
but not currently guaranteed to be accurate. (This is a shame and something we
525
should tighten up). Deltas are required to list all removals explicitly -
526
removing the parent of an entry doesn't remove the entry.
528
Applying a delta to an inventory consists of:
529
- removing all fileids for which entry is None
530
- adding or replacing all other fileids
531
- detecting consistency errors
533
An interesting aspect of delta inconsistencies is when we notice them:
534
- Silent errors which our application logic misses
535
- Visible errors we catch during application, so bad data isn't stored in
538
The minimum safe level for our application logic would be to catch all errors
539
during application. Making generation never generate inconsistent deltas is
540
a seperate but necessary condition for robust code.
542
An inconsistent delta is one which:
543
- after application to an inventory the inventory is an impossible state.
544
- has the same fileid, or oldpath(not-None), or newpath(not-None) multiple
546
- has a fileid field different to the entry.fileid in the same item in the
548
- has an entry that is in an impossible state (e.g. a directory with a text
551
Forms of inventory inconsistency deltas can carry/cause:
552
- An entry newly introduced to a path without also removing or relocating any
553
existing entry at that path. (Duplicate paths)
554
- An entry whose parent id isn't present in the tree. (Missing parent).
555
- Having oldpath or newpath not be actual original path or resulting path.
557
- An entry whose parent is not a directory. (Under non-directory).
558
- An entry that is internally inconsistent.
559
- An entry that is already present in the tree (Duplicate id)
561
Known causes of inconsistency:
562
- A 'new' entry which the inventory already has - when this is a directory
563
even arbitrary file ids under the 'new' entry are more likely to collide on
565
- Removing a directory without recursively removing its children - causes
567
- Recording a change to an entry without including all changed entries found
568
following its parents up to and includin the root - can cause duplicate
569
paths, missing parents, wrong path, under non-directory.
571
Avoiding inconsistent deltas
572
----------------------------
574
The simplest thing is to never create partial deltas, as it is trivial to
575
be consistent when all data is examined every time. However users sometimes
576
want to specify a subset of the changes in their tree when they do an operation
577
which needs to create a delta - such as commit.
579
We have a choice about handling user requests that can generate inconsistent
580
deltas. We can alter or interpret the request in such a way that the delta will
581
be consistent, but perhaps larger than the user had intended. Or we can
582
identify problematic situations and abort, specifying to the user why we have
583
aborted and likely things they can do to make their request generate a
586
Currently we attempt to expand/interpret the request so that the user is not
587
required to understand all the internal constraints of the system: if they
588
request 'foo/bar' we automatically include foo. This works but can surprise
589
the user sometimes when things they didn't explicitly request are committed.
591
Different trees can use different algorithms to expand the request as long as
592
they produce consistent deltas. As part of getting a consistent UI we require
593
that all trees expand the paths requested downwards. Beyond that as long as
594
the delta is consistent it is up to the tree.
596
Given two trees, source and target, and a set of selected file ids to check for
597
changes and if changed in a delta between them, we have to expand that set by
598
the following rules, to get consistent deltas. The test for consistency is that
599
if the resulting delta is applied to source, to create a third tree 'output',
600
and the paths in the delta match the paths in source and output, only one file
601
id is at each path in output, and no file ids are missing parents, then the
604
Firstly, the parent ids to the root for all of the file ids that have actually
605
changed must be considered. Unless they are all examined the paths in the delta
608
Secondly, when an item included in the delta has a new path which is the same
609
as a path in source, the fileid of that path in source must be included.
610
Failing to do this leads to multiple ids tryin to share a path in output.
612
Thirdly, when an item changes its kind from 'directory' to anything else in the
613
delta, all of the direct children of the directory in source must be included.