#! /usr/bin/python

# Copyright (C) 2005 Canonical Ltd

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

# Author: Martin Pool <mbp@canonical.com>


"""Weave - storage of related text file versions"""

# before intset (r923) 2000 versions in 41.5s
# with intset (r926) 2000 versions in 93s !!!
# better to just use plain sets.

# making _extract build and return a list, rather than being a generator
# takes 37.94s

# with python -O, r923 does 2000 versions in 36.87s

# with optimizations to avoid mutating lists - 35.75!  I guess copying
# all the elements every time costs more than the small manipulations.
# a surprisingly small change.

# r931, which avoids using a generator for extract, does 36.98s

# with memoized inclusions, takes 41.49s; not very good

# with slots, takes 37.35s; without takes 39.16, a bit surprising

# with the delta calculation mixed in with the add method, rather than
# separated, takes 36.78s

# with delta folded in and mutation of the list, 36.13s

# with all this and simplification of add code, 33s


# TODO: Perhaps have copy method for Weave instances?

# XXX: If we do weaves this way, will a merge still behave the same
# way if it's done in a different order?  That's a pretty desirable
# property.

# TODO: Nothing here so far assumes the lines are really \n newlines,
# rather than being split up in some other way.  We could accommodate
# binaries, perhaps by naively splitting on \n or perhaps using
# something like a rolling checksum.

# TODO: Track version names as well as indexes.

# TODO: End marker for each version so we can stop reading?

# TODO: Check that no insertion occurs inside a deletion that was
# active in the version of the insertion.

# TODO: In addition to the SHA-1 check, perhaps have some code that
# checks structural constraints of the weave: ie that insertions are
# properly nested, that there is no text outside of an insertion, that
# insertions or deletions are not repeated, etc.

# TODO: Parallel-extract that passes back each line along with a
# description of which revisions include it.  Nice for checking all
# shas in parallel.

# TODO: Using a single _extract routine and then processing the output
# is probably inefficient.  It's simple enough that we can afford to
# have slight specializations for different ways it's used: annotate,
# basis for add, get, etc.


import sha


class WeaveError(Exception):
    """Exception in processing weave"""


class WeaveFormatError(WeaveError):
    """Weave invariant violated"""


class Weave(object):
    """weave - versioned text file storage.

    A Weave manages versions of line-based text files, keeping track
    of the originating version for each line.

    To clients the "lines" of the file are represented as a list of strings.
    These strings will typically have terminal newline characters, but
    this is not required.  In particular files commonly do not have a newline
    at the end of the file.

    Texts can be identified in either of two ways:

    * a nonnegative index number.

    * a version-id string. (not implemented yet)

    Typically the index number will be valid only inside this weave and
    the version-id is used to reference it in the larger world.

    The weave is represented as a list mixing edit instructions and
    literal text.  Each entry in _weave can be either a string (or
    unicode), or a tuple.  If a string, it means that the given line
    should be output in the currently active revisions.

    If a tuple, it gives a processing instruction saying in which
    revisions the enclosed lines are active.  The tuple has the form
    (instruction, version).

    The instruction can be '{' or '}' for an insertion block, and '['
    and ']' for a deletion block respectively.  The version is the
    integer version index.  There is no replace operator, only deletes
    and inserts.  For '}', the end of an insertion, there is no
    version parameter because it always closes the most recently
    opened insertion.

    Constraints/notes:

    * A later version can delete lines that were introduced by any
      number of ancestor versions; this implies that deletion
      instructions can span insertion blocks without regard to the
      insertion block's nesting.

    * Similarly, deletions need not be properly nested with regard to
      each other, because they might have been generated by
      independent revisions.

    * Insertions are always made by inserting a new bracketed block
      into a single point in the previous weave.  This implies they
      can nest but not overlap, and the nesting must always have later
      insertions on the inside.

    * It doesn't seem very useful to have an active insertion
      inside an inactive insertion, but it might happen.

    * Therefore, all instructions are always "considered"; that
      is passed onto and off the stack.  An outer inactive block
      doesn't disable an inner block.

    * Lines are enabled if the most recent enclosing insertion is
      active and none of the enclosing deletions are active.

    * There is no point having a deletion directly inside its own
      insertion; you might as well just not write it.  And there
      should be no way to get an earlier version deleting a later
      version.

    _weave
        Text of the weave; list of control instruction tuples and strings.

    _parents
        List of parents, indexed by version number.
        It is only necessary to store the minimal set of parents for
        each version; the parent's parents are implied.

    _sha1s
        List of hex SHA-1 of each version, or None if not recorded.
    """
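
# Illustration (not part of the original module): the weave layout and the
# "lines are enabled" rule described in the class docstring can be tried out
# on a small hand-written weave.  The names extract_lines and example_weave
# are hypothetical, and the sketch is written to run on its own instead of
# going through the Weave class, so treat it as a reading aid only.

```python
def extract_lines(weave, included):
    """Return the lines of `weave` visible when `included` versions are active."""
    insert_stack = []        # versions of the insertion blocks currently open
    active_deletes = set()   # included versions whose deletion block is open
    result = []
    for entry in weave:
        if isinstance(entry, tuple):
            instruction, version = entry
            if instruction == '{':
                insert_stack.append(version)
            elif instruction == '}':
                # '}' carries no version; it closes the innermost insertion
                insert_stack.pop()
            elif instruction == '[':
                if version in included:
                    active_deletes.add(version)
            elif instruction == ']':
                active_deletes.discard(version)
            else:
                raise ValueError('unexpected instruction %r' % (instruction,))
        elif insert_stack and insert_stack[-1] in included and not active_deletes:
            # enabled: innermost insertion is active, no active deletion
            result.append(entry)
    return result


# Version 0 inserts two lines; version 1 deletes the second one and
# inserts a replacement line.
example_weave = [
    ('{', 0),
    'hello\n',
    ('[', 1),
    'world\n',
    ('}', None),
    (']', 1),
    ('{', 1),
    'weave\n',
    ('}', None),
]

version_0 = extract_lines(example_weave, set([0]))
version_1 = extract_lines(example_weave, set([0, 1]))
```

# With only version 0 included, the deletion made by version 1 is ignored
# and both original lines come back; with versions 0 and 1 included, the
# deleted line is suppressed and the replacement appears instead.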

    __slots__ = ['_weave', '_parents', '_sha1s']

    def __init__(self):
        self._weave = []
        self._parents = []
        self._sha1s = []


    def __eq__(self, other):
        if not isinstance(other, Weave):
            return False
        return self._parents == other._parents \
               and self._weave == other._weave


    def __ne__(self, other):
        return not self.__eq__(other)


    def add(self, parents, text):
        """Add a single text on top of the weave.

        Returns the index number of the newly added version.

        parents
            List or set of direct parent version numbers.

        text
            Sequence of lines to be added in the new version."""

        self._check_versions(parents)
        ## self._check_lines(text)
        new_version = len(self._parents)

        s = sha.new()
        map(s.update, text)
        sha1 = s.hexdigest()
        del s

        # if we abort after here the weave will be corrupt
        self._parents.append(frozenset(parents))
        self._sha1s.append(sha1)


        if not parents:
            # special case; adding with no parents revision; can do
            # this more quickly by just appending unconditionally.
            # even more specially, if we're adding an empty text we
            # need do nothing at all.
            if text:
                self._weave.append(('{', new_version))
                self._weave.extend(text)
                self._weave.append(('}', None))

            return new_version

        if len(parents) == 1:
            pv = list(parents)[0]
            if sha1 == self._sha1s[pv]:
                # special case: same as the single parent
                return new_version


        ancestors = self.inclusions(parents)

        l = self._weave

        # basis a list of (origin, lineno, line)
        basis_lineno = []
        basis_lines = []
        for origin, lineno, line in self._extract(ancestors):
            basis_lineno.append(lineno)
            basis_lines.append(line)

        # another small special case: a merge, producing the same text
        # as auto-merge
        if text == basis_lines:
            return new_version

        # add a sentinel, because we can also match against the final line
        basis_lineno.append(len(self._weave))

        # XXX: which line of the weave should we really consider
        # matches the end of the file?  the current code says it's the
        # last line of the weave?

        #print 'basis_lines:', basis_lines
        #print 'new_lines:  ', lines

        from difflib import SequenceMatcher
        s = SequenceMatcher(None, basis_lines, text)

        # offset gives the number of lines that have been inserted
        # into the weave up to the current point; if the original edit instruction
        # says to change line A then we actually change (A+offset)
        offset = 0

        for tag, i1, i2, j1, j2 in s.get_opcodes():
            # i1,i2 are given in offsets within basis_lines; we need to map them
            # back to offsets within the entire weave
            #print 'raw match', tag, i1, i2, j1, j2
            if tag == 'equal':
                continue

            i1 = basis_lineno[i1]
            i2 = basis_lineno[i2]

            assert 0 <= j1 <= j2 <= len(text)

            #print tag, i1, i2, j1, j2

            # the deletion and insertion are handled separately.
            # first delete the region.
            if i1 != i2:
                self._weave.insert(i1+offset, ('[', new_version))
                self._weave.insert(i2+offset+1, (']', new_version))
                offset += 2

            if j1 != j2:
                # there may have been a deletion spanning up to
                # i2; we want to insert after this region to make sure
                # we don't destroy ourselves
                i = i2 + offset
                self._weave[i:i] = ([('{', new_version)]
                                    + text[j1:j2]
                                    + [('}', None)])
                offset += 2 + (j2 - j1)

        return new_version
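
# Illustration (not part of the original module): a standalone sketch of the
# offset bookkeeping used in add() above.  apply_delta and its parameters are
# hypothetical names; the function assumes the caller already knows the basis
# lines and the weave index of each one, which add() obtains from _extract().

```python
from difflib import SequenceMatcher


def apply_delta(weave, basis_lines, basis_lineno, text, new_version):
    """Splice `text` into `weave` as `new_version`, editing the list in place.

    basis_lineno[k] is the index within `weave` of basis_lines[k].
    """
    # Sentinel: an opcode that ends at the last basis line maps to the
    # end of the weave.
    positions = list(basis_lineno) + [len(weave)]
    offset = 0   # number of entries added to the weave so far
    matcher = SequenceMatcher(None, basis_lines, text)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            continue
        # map basis-line offsets back to positions in the original weave
        w1 = positions[i1]
        w2 = positions[i2]
        if w1 != w2:
            # bracket the removed region with a deletion block
            weave.insert(w1 + offset, ('[', new_version))
            weave.insert(w2 + offset + 1, (']', new_version))
            offset += 2
        if j1 != j2:
            # insert the new lines after any deletion just written
            at = w2 + offset
            weave[at:at] = [('{', new_version)] + text[j1:j2] + [('}', None)]
            offset += 2 + (j2 - j1)
    return weave


# Replace the second line of a two-line text.
replaced = [('{', 0), 'hello\n', 'world\n', ('}', None)]
apply_delta(replaced, ['hello\n', 'world\n'], [1, 2],
            ['hello\n', 'weave\n'], 1)

# Append a line to a one-line text.
appended = [('{', 0), 'a\n', ('}', None)]
apply_delta(appended, ['a\n'], [1], ['a\n', 'b\n'], 1)
```

# Every position is looked up in the weave as it was before the edit, and
# offset then shifts it past whatever has already been spliced in, which is
# why the edits can be applied front to back in a single pass.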


    def inclusions(self, versions):
        """Return set of all ancestors of given version(s)."""
        i = set(versions)
        v = max(versions)
        try:
            while v >= 0:
                if v in i:
                    # include all its parents
                    i.update(self._parents[v])
                v -= 1
            return i
        except IndexError:
            raise ValueError("version %d not present in weave" % v)
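
# Illustration (not part of the original module): the descending scan in
# inclusions() works because a parent always has a smaller index than its
# child, so one pass from the highest requested version down to 0 reaches
# every ancestor.  The names ancestors and example_parents are hypothetical,
# and the empty-input guard is an addition of this sketch.

```python
def ancestors(parents, versions):
    """Return `versions` together with all of their ancestors, as a set.

    `parents` is a list indexed by version number; each entry holds the
    direct parents of that version.
    """
    included = set(versions)
    if not included:
        return included
    # Walk downwards: by the time v is visited, every child that could
    # have pulled v in has already been processed.
    for v in range(max(included), -1, -1):
        if v in included:
            included.update(parents[v])
    return included


# 0 is the root; 1 and 2 both derive from 0; 3 merges 1 and 2; 4 derives from 1.
example_parents = [
    frozenset(),
    frozenset([0]),
    frozenset([0]),
    frozenset([1, 2]),
    frozenset([1]),
]

merge_ancestry = ancestors(example_parents, [3])
branch_ancestry = ancestors(example_parents, [4])
```

# Only direct parents are stored, so the full ancestry is recomputed on
# demand; the benchmark notes at the top of the file record that memoizing
# it was not faster in the author's measurements.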


    def minimal_parents(self, version):
        """Find the minimal set of parents for the version."""
        included = self._parents[version]
        if not included:
            return []

        li = list(included)
        li.sort(reverse=True)

        mininc = []
        gotit = set()

        for pv in li:
            if pv not in gotit:
                mininc.append(pv)
                # inclusions() expects a sequence of versions, not a bare int
                gotit.update(self.inclusions([pv]))

        assert mininc[0] >= 0
        assert mininc[-1] < version
        return mininc
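
# Illustration (not part of the original module): a standalone sketch of the
# parent-minimization idea in minimal_parents() above.  Parents are visited
# from newest to oldest, and a parent is kept only if no already-kept parent
# has it as an ancestor.  The helper names and the sample data are
# hypothetical.

```python
def ancestors(parents, versions):
    """Return `versions` plus all their ancestors (parents have lower indexes)."""
    included = set(versions)
    if not included:
        return included
    for v in range(max(included), -1, -1):
        if v in included:
            included.update(parents[v])
    return included


def minimal_parents(parents, version):
    """Return the parents of `version` that are not implied by other parents."""
    minimal = []
    covered = set()   # everything reachable from the parents kept so far
    for pv in sorted(parents[version], reverse=True):
        if pv not in covered:
            minimal.append(pv)
            covered.update(ancestors(parents, [pv]))
    return minimal


# Version 3 lists 0, 1 and 2 as parents.
linear = [set(), set([0]), set([0, 1]), set([0, 1, 2])]
forked = [set(), set([0]), set([0]), set([0, 1, 2])]

linear_minimal = minimal_parents(linear, 3)
forked_minimal = minimal_parents(forked, 3)
```

# In the linear history version 2 already implies 1 and 0, so it is the only
# parent kept; in the forked history 1 and 2 are siblings, so both are kept
# and only the shared root 0 is dropped.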



    def _check_lines(self, text):
        if not isinstance(text, list):
            raise ValueError("text should be a list, not %s" % type(text))

        for l in text:
            if not isinstance(l, basestring):
                raise ValueError("text line should be a string or unicode, not %s"
                                 % type(l))



    def _check_versions(self, indexes):
        """Check everything in the sequence of indexes is valid"""
        for i in indexes:
            try:
                self._parents[i]
            except IndexError:
                raise IndexError("invalid version number %r" % i)


    def annotate(self, index):
        return list(self.annotate_iter(index))


    def annotate_iter(self, version):
        """Yield list of (index-id, line) pairs for the specified version.

        The index indicates when the line originated in the weave."""
        for origin, lineno, text in self._extract([version]):
            yield origin, text


    def _walk(self):
        """Walk the weave.

        Yields sequence of
        (lineno, insert, deletes, text)
        for each literal line.
        """

        istack = []
        dset = set()

        lineno = 0         # line of weave, 0-based

        for l in self._weave:
            if isinstance(l, tuple):
                c, v = l
                isactive = None
                if c == '{':
                    istack.append(v)
                elif c == '}':
                    istack.pop()
                elif c == '[':
                    assert v not in dset
                    dset.add(v)
                elif c == ']':
                    dset.remove(v)
                else:
                    # report the offending instruction, not its version
                    raise WeaveFormatError('unexpected instruction %r'
                                           % c)
            else:
                assert isinstance(l, basestring)
                assert istack
                yield lineno, istack[-1], dset, l
            lineno += 1



    def _extract(self, versions):
        """Return annotation of lines in included set.

        Returns a list of tuples (origin, lineno, text), where
        origin is the origin version, lineno the index in the weave,
        and text the text of the line.

        The set typically but not necessarily corresponds to a version.
        """
        included = self.inclusions(versions)

        istack = []
        dset = set()

        lineno = 0         # line of weave, 0-based

        isactive = None

        result = []

        WFE = WeaveFormatError

        for l in self._weave:
            if isinstance(l, tuple):
                c, v = l
                isactive = None
                if c == '{':
                    assert v not in istack
                    istack.append(v)
                elif c == '}':
                    istack.pop()
                elif c == '[':
                    if v in included:
                        assert v not in dset
                        dset.add(v)
                else:
                    assert c == ']'
                    if v in included:
                        assert v in dset
                        dset.remove(v)
            else:
                assert isinstance(l, basestring)
                if isactive is None:
                    isactive = (not dset) and istack and (istack[-1] in included)
                if isactive:
                    result.append((istack[-1], lineno, l))
            lineno += 1

        if istack:
            raise WFE("unclosed insertion blocks at end of weave",
                      istack)
        if dset:
            raise WFE("unclosed deletion blocks at end of weave",
                      dset)

        return result



    def get_iter(self, version):
        """Yield lines for the specified version."""
        for origin, lineno, line in self._extract([version]):
            yield line


    def get(self, index):
        return list(self.get_iter(index))


    def mash_iter(self, included):
        """Return composed version of multiple included versions."""
        for origin, lineno, text in self._extract(included):
            yield text


    def dump(self, to_file):
        from pprint import pprint
        print >>to_file, "Weave._weave = ",
        pprint(self._weave, to_file)
        print >>to_file, "Weave._parents = ",
        pprint(self._parents, to_file)



    def numversions(self):
        l = len(self._parents)
        assert l == len(self._sha1s)
        return l


    def __len__(self):
        return self.numversions()
509
510
894 by Martin Pool
- small optimization for weave extract
511
    def check(self, progress_bar=None):
0.1.91 by Martin Pool
Update Weave.check
512
        # check no circular inclusions
513
        for version in range(self.numversions()):
944 by Martin Pool
- refactor member names in Weave code
514
            inclusions = list(self._parents[version])
0.1.91 by Martin Pool
Update Weave.check
515
            if inclusions:
516
                inclusions.sort()
517
                if inclusions[-1] >= version:
0.1.47 by Martin Pool
New WeaveError and WeaveFormatError rather than assertions.
518
                    raise WeaveFormatError("invalid included version %d for index %d"
0.1.91 by Martin Pool
Update Weave.check
519
                                           % (inclusions[-1], version))
520
521
        # try extracting all versions; this is a bit slow and parallel
522
        # extraction could be used
894 by Martin Pool
- small optimization for weave extract
523
        nv = self.numversions()
524
        for version in range(nv):
525
            if progress_bar:
526
                progress_bar.update('checking text', version, nv)
0.1.91 by Martin Pool
Update Weave.check
527
            s = sha.new()
528
            for l in self.get_iter(version):
529
                s.update(l)
530
            hd = s.hexdigest()
531
            expected = self._sha1s[version]
532
            if hd != expected:
533
                raise WeaveError("mismatched sha1 for version %d; "
534
                                 "got %s, expected %s"
535
                                 % (version, hd, expected))
0.1.18 by Martin Pool
Better Knit.dump method
536
881 by Martin Pool
- faster weave extraction
537
        # TODO: check insertions are properly nested, that there are
538
        # no lines outside of insertion blocks, that deletions are
539
        # properly paired, etc.
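
The two passes of check() can be illustrated outside the Weave class. The following is a minimal sketch, not the bzrlib implementation: `check_versions` and its three list arguments are hypothetical stand-ins for `_parents`, `_sha1s` and the extracted texts, it raises ValueError instead of WeaveError/WeaveFormatError, and it uses `hashlib` in place of the older `sha` module.

```python
import hashlib


def check_versions(parents, sha1s, texts):
    """Hypothetical stand-in for Weave.check working on plain lists.

    parents[i] -- parent version numbers of version i
    sha1s[i]   -- expected hex SHA-1 of version i
    texts[i]   -- list of byte-string lines making up version i
    """
    if not (len(parents) == len(sha1s) == len(texts)):
        raise ValueError("inconsistent number of versions")

    # Pass 1: a version may only include strictly earlier versions,
    # which rules out circular inclusions.
    for version, included in enumerate(parents):
        if included and max(included) >= version:
            raise ValueError("invalid included version %d for index %d"
                             % (max(included), version))

    # Pass 2: hash every text line by line and compare with the record.
    for version, lines in enumerate(texts):
        s = hashlib.sha1()
        for line in lines:
            s.update(line)
        if s.hexdigest() != sha1s[version]:
            raise ValueError("mismatched sha1 for version %d" % version)

    return len(texts)
```

Feeding the lines to the hash one at a time gives the same digest as hashing the joined text, so the sketch never needs to build the whole file in memory.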
540
0.1.13 by Martin Pool
Knit structure now allows for versions to include the lines present in other
541
542
0.1.95 by Martin Pool
- preliminary merge conflict detection
543
    def merge(self, merge_versions):
544
        """Automerge and mark conflicts between versions.
545
546
        This returns a sequence, each entry describing alternatives
547
        for a chunk of the file.  Each of the alternatives is given as
548
        a list of lines.
549
550
        If there is a chunk of the file where there's no disagreement,
551
        only one alternative is given.
552
        """
553
554
        # approach: find the included versions common to all the
555
        # merged versions
556
        raise NotImplementedError()
557
558
559
0.1.21 by Martin Pool
Start computing a delta to insert a new revision
560
    def _delta(self, included, lines):
561
        """Return changes from basis to new revision.
562
563
        The old text for comparison is the union of included revisions.
564
565
        This is used in inserting a new text.
0.1.22 by Martin Pool
Calculate delta for new versions relative to a set of parent versions.
566
0.1.55 by Martin Pool
doc
567
        Delta is returned as a sequence of
568
        (weave1, weave2, newlines).
569
570
        This indicates that weave1:weave2 of the old weave should be
0.1.22 by Martin Pool
Calculate delta for new versions relative to a set of parent versions.
571
        replaced by the sequence of lines in newlines.  Note that
572
        these line numbers are positions in the total weave and don't
573
        correspond to the lines in any extracted version, or even the
574
        extracted union of included versions.
575
576
        If weave1=weave2, this is a pure insert; if newlines=[] this is a
577
        pure delete.  (Similar to difflib.)
0.1.21 by Martin Pool
Start computing a delta to insert a new revision
578
        """
579
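
The (weave1, weave2, newlines) convention described in the _delta docstring can be tried on an ordinary list. This is only a sketch under a simplifying assumption: `apply_delta` is a hypothetical helper, and it treats the positions as indexes into a plain list of lines, whereas the real weave also holds control instructions between the text lines.

```python
def apply_delta(old_lines, delta):
    """Apply (start, end, newlines) replacements to a plain list of lines.

    start == end means a pure insert; newlines == [] means a pure delete.
    All positions refer to the original list.
    """
    result = list(old_lines)
    # Apply hunks from the end backwards so that earlier positions
    # are not shifted by replacements made after them.
    for start, end, newlines in sorted(delta,
                                       key=lambda d: (d[0], d[1]),
                                       reverse=True):
        result[start:end] = newlines
    return result
```

For example, `apply_delta(['a\n', 'b\n', 'c\n'], [(1, 1, ['x\n']), (2, 3, [])])` inserts a line before `'b\n'` and deletes `'c\n'`.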
0.1.1 by Martin Pool
Check in old existing knit code.
580
918 by Martin Pool
- start doing new weave-merge algorithm
581
            
582
    def plan_merge(self, ver_a, ver_b):
583
        """Return pseudo-annotation indicating how the two versions merge.
584
585
        This is computed between versions a and b and their common
586
        base.
587
588
        Weave lines present in none of them are skipped entirely.
589
        """
926 by Martin Pool
- update more weave code to use intsets
590
        inc_a = self.inclusions([ver_a])
591
        inc_b = self.inclusions([ver_b])
918 by Martin Pool
- start doing new weave-merge algorithm
592
        inc_c = inc_a & inc_b
593
594
        for lineno, insert, deleteset, line in self._walk():
595
            if deleteset & inc_c:
596
                # killed in parent; can't be in either a or b
597
                # not relevant to our work
598
                yield 'killed-base', line
926 by Martin Pool
- update more weave code to use intsets
599
            elif insert in inc_c:
918 by Martin Pool
- start doing new weave-merge algorithm
600
                # was inserted in base
601
                killed_a = bool(deleteset & inc_a)
602
                killed_b = bool(deleteset & inc_b)
603
                if killed_a and killed_b:
604
                    yield 'killed-both', line
605
                elif killed_a:
606
                    yield 'killed-a', line
607
                elif killed_b:
608
                    yield 'killed-b', line
609
                else:
610
                    yield 'unchanged', line
926 by Martin Pool
- update more weave code to use intsets
611
            elif insert in inc_a:
918 by Martin Pool
- start doing new weave-merge algorithm
612
                if deleteset & inc_a:
613
                    yield 'ghost-a', line
614
                else:
615
                    # new in A; not in B
616
                    yield 'new-a', line
926 by Martin Pool
- update more weave code to use intsets
617
            elif insert in inc_b:
918 by Martin Pool
- start doing new weave-merge algorithm
618
                if deleteset & inc_b:
619
                    yield 'ghost-b', line
620
                else:
621
                    yield 'new-b', line
622
            else:
623
                # not in either revision
624
                yield 'irrelevant', line
625
919 by Martin Pool
- more development of weave-merge
626
        yield 'unchanged', ''           # terminator
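
The per-line decision made inside plan_merge depends only on the version that inserted the line, the set of versions that deleted it, and the two inclusion sets. A minimal sketch of that decision as a standalone function follows; `classify` is a hypothetical name, and the inclusion sets are passed in as plain Python sets rather than computed by Weave.inclusions.

```python
def classify(insert, deleteset, inc_a, inc_b):
    """Return the plan_merge state for one weave line.

    insert    -- version number that inserted the line
    deleteset -- set of version numbers that deleted the line
    inc_a     -- set of versions included in version A
    inc_b     -- set of versions included in version B
    """
    inc_c = inc_a & inc_b            # history common to both sides
    if deleteset & inc_c:
        return 'killed-base'
    elif insert in inc_c:
        killed_a = bool(deleteset & inc_a)
        killed_b = bool(deleteset & inc_b)
        if killed_a and killed_b:
            return 'killed-both'
        elif killed_a:
            return 'killed-a'
        elif killed_b:
            return 'killed-b'
        else:
            return 'unchanged'
    elif insert in inc_a:
        return 'ghost-a' if deleteset & inc_a else 'new-a'
    elif insert in inc_b:
        return 'ghost-b' if deleteset & inc_b else 'new-b'
    else:
        return 'irrelevant'
```

With `inc_a = {0, 1}` and `inc_b = {0, 2}`, a line inserted in version 0 and deleted in version 1 is classified as `'killed-a'`, while a line inserted in version 3 is `'irrelevant'`.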
627
628
629
630
    def weave_merge(self, plan):
631
        lines_a = []
632
        lines_b = []
633
        ch_a = ch_b = False
634
635
        for state, line in plan:
636
            if state == 'unchanged' or state == 'killed-both':
637
                # resync and flush queued conflicting changes, if any
638
                if not lines_a and not lines_b:
639
                    pass
640
                elif ch_a and not ch_b:
641
                    # one-sided change:                    
642
                    for l in lines_a: yield l
643
                elif ch_b and not ch_a:
644
                    for l in lines_b: yield l
645
                elif lines_a == lines_b:
646
                    for l in lines_a: yield l
647
                else:
648
                    yield '<<<<\n'
649
                    for l in lines_a: yield l
650
                    yield '====\n'
651
                    for l in lines_b: yield l
652
                    yield '>>>>\n'
653
654
                del lines_a[:]
655
                del lines_b[:]
656
                ch_a = ch_b = False
657
                
658
            if state == 'unchanged':
659
                if line:
660
                    yield line
661
            elif state == 'killed-a':
662
                ch_a = True
663
                lines_b.append(line)
664
            elif state == 'killed-b':
665
                ch_b = True
666
                lines_a.append(line)
667
            elif state == 'new-a':
668
                ch_a = True
669
                lines_a.append(line)
670
            elif state == 'new-b':
671
                ch_b = True
672
                lines_b.append(line)
673
            else:
920 by Martin Pool
- add more test cases for weave_merge
674
                assert state in ('irrelevant', 'ghost-a', 'ghost-b', 'killed-base',
675
                                 'killed-both'), \
919 by Martin Pool
- more development of weave-merge
676
                       state
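
To see how a plan turns into merged text, the flush logic of weave_merge can be run on a hand-written plan. The sketch below mirrors that logic in a standalone function that returns a list; `merge_from_plan` is a hypothetical name and the sample plan in the usage note is made up for illustration, not produced by a real weave.

```python
def merge_from_plan(plan):
    """Turn (state, line) pairs into merged lines with conflict markers."""
    out = []
    lines_a, lines_b = [], []
    ch_a = ch_b = False

    for state, line in plan:
        if state in ('unchanged', 'killed-both'):
            # Both sides agree again: flush whatever was queued.
            if not lines_a and not lines_b:
                pass
            elif ch_a and not ch_b:
                out.extend(lines_a)          # only A changed
            elif ch_b and not ch_a:
                out.extend(lines_b)          # only B changed
            elif lines_a == lines_b:
                out.extend(lines_a)          # identical changes
            else:
                out.append('<<<<\n')
                out.extend(lines_a)
                out.append('====\n')
                out.extend(lines_b)
                out.append('>>>>\n')
            lines_a, lines_b = [], []
            ch_a = ch_b = False

        if state == 'unchanged':
            if line:
                out.append(line)
        elif state == 'killed-a':
            ch_a = True
            lines_b.append(line)             # B still has the line
        elif state == 'killed-b':
            ch_b = True
            lines_a.append(line)             # A still has the line
        elif state == 'new-a':
            ch_a = True
            lines_a.append(line)
        elif state == 'new-b':
            ch_b = True
            lines_b.append(line)
        # all other states contribute nothing to the output
    return out
```

A plan such as `[('unchanged', 'a\n'), ('new-a', 'x\n'), ('new-b', 'y\n'), ('unchanged', 'z\n'), ('killed-a', 'd\n'), ('unchanged', '')]` produces a conflict block for the competing insertions and silently drops the line that only A deleted. The final empty 'unchanged' entry plays the role of the terminator that plan_merge emits, which forces the last flush.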
677
678
                
679
680
918 by Martin Pool
- start doing new weave-merge algorithm
681
682
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
683
1081 by Martin Pool
- if weave tool is invoked with no arguments, show help
684
def weave_toc(w):
685
    """Show the weave's table-of-contents"""
946 by Martin Pool
- weave info only shows the weave headers, doesn't extract every version:
686
    print '%6s %40s %20s' % ('ver', 'sha1', 'parents')
687
    for i in (6, 40, 20):
870 by Martin Pool
- better weave info display
688
        print '-' * i,
689
    print
946 by Martin Pool
- weave info only shows the weave headers, doesn't extract every version:
690
    for i in range(w.numversions()):
0.1.91 by Martin Pool
Update Weave.check
691
        sha1 = w._sha1s[i]
946 by Martin Pool
- weave info only shows the weave headers, doesn't extract every version:
692
        print '%6d %40s %s' % (i, sha1, ' '.join(map(str, w._parents[i])))
0.1.88 by Martin Pool
Add weave info command.
693
869 by Martin Pool
- more weave.py command line options
694
695
947 by Martin Pool
- new 'weave stats' command
696
def weave_stats(weave_file):
697
    from bzrlib.progress import ProgressBar
698
    from bzrlib.weavefile import read_weave
699
700
    pb = ProgressBar()
701
702
    wf = file(weave_file, 'rb')
703
    w = read_weave(wf)
704
    # FIXME: doesn't work on pipes
705
    weave_size = wf.tell()
706
707
    total = 0
708
    vers = len(w)
709
    for i in range(vers):
710
        pb.update('checking sizes', i, vers)
711
        for line in w.get_iter(i):
712
            total += len(line)
713
714
    pb.clear()
715
716
    print 'versions          %9d' % vers
717
    print 'weave file        %9d bytes' % weave_size
718
    print 'total contents    %9d bytes' % total
719
    print 'compression ratio %9.2fx' % (float(total) / float(weave_size))
1042 by Martin Pool
- more statistics output from 'weave stats' command
720
    if vers:
721
        avg = total/vers
722
        print 'average size      %9d bytes' % avg
723
        if avg:
            print 'relative size     %9.2fx' % (float(weave_size) / float(avg))
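
The three figures printed by weave_stats are simple ratios, and each has an input for which it is undefined. A minimal sketch of the arithmetic with those cases made explicit follows; `size_ratios` is a hypothetical helper, and returning None for an undefined ratio is an assumption of this sketch rather than behaviour of the tool.

```python
def size_ratios(total, weave_size, vers):
    """Return (compression_ratio, average_size, relative_size).

    total      -- summed size in bytes of every extracted version
    weave_size -- size in bytes of the weave file
    vers       -- number of versions in the weave
    A value is None when its divisor would be zero.
    """
    compression = float(total) / weave_size if weave_size else None
    avg = total // vers if vers else None        # whole bytes per version
    relative = float(weave_size) / avg if avg else None
    return compression, avg, relative
```

For instance, four versions totalling 1000 bytes stored in a 250-byte weave give a compression ratio of 4.0, an average size of 250 bytes and a relative size of 1.0.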
947 by Martin Pool
- new 'weave stats' command
724
725
869 by Martin Pool
- more weave.py command line options
726
def usage():
871 by Martin Pool
- add command for merge-based weave
727
    print """bzr weave tool
728
729
Experimental tool for weave algorithm.
730
869 by Martin Pool
- more weave.py command line options
731
usage:
732
    weave init WEAVEFILE
733
        Create an empty weave file
734
    weave get WEAVEFILE VERSION
735
        Write out specified version.
736
    weave check WEAVEFILE
737
        Check consistency of all versions.
1081 by Martin Pool
- if weave tool is invoked with no arguments, show help
738
    weave toc WEAVEFILE
869 by Martin Pool
- more weave.py command line options
739
        Display table of contents.
740
    weave add WEAVEFILE [BASE...] < NEWTEXT
741
        Add NEWTEXT, with specified parent versions.
742
    weave annotate WEAVEFILE VERSION
743
        Display origin of each line.
744
    weave mash WEAVEFILE VERSION...
745
        Display composite of all selected versions.
746
    weave merge WEAVEFILE VERSION1 VERSION2 > OUT
747
        Auto-merge two versions and display conflicts.
871 by Martin Pool
- add command for merge-based weave
748
749
example:
750
751
    % weave init foo.weave
752
    % vi foo.txt
753
    % weave add foo.weave < foo.txt
754
    added version 0
755
756
    (create updated version)
757
    % vi foo.txt
758
    % weave get foo.weave 0 | diff -u - foo.txt
759
    % weave add foo.weave 0 < foo.txt
760
    added version 1
761
762
    % weave get foo.weave 0 > foo.txt       (create forked version)
763
    % vi foo.txt
764
    % weave add foo.weave 0 < foo.txt
765
    added version 2
766
767
    % weave merge foo.weave 1 2 > foo.txt   (merge them)
768
    % vi foo.txt                            (resolve conflicts)
769
    % weave add foo.weave 1 2 < foo.txt     (commit merged version)     
770
    
869 by Martin Pool
- more weave.py command line options
771
"""
0.1.88 by Martin Pool
Add weave info command.
772
    
773
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
774
775
def main(argv):
776
    import sys
777
    import os
869 by Martin Pool
- more weave.py command line options
778
    from bzrlib.weavefile import write_weave, read_weave
894 by Martin Pool
- small optimization for weave extract
779
    from bzrlib.progress import ProgressBar
780
1078 by Martin Pool
- use psyco for weave if possible
781
    try:
782
        import psyco
783
        psyco.full()
784
    except ImportError:
785
        pass
894 by Martin Pool
- small optimization for weave extract
786
1081 by Martin Pool
- if weave tool is invoked with no arguments, show help
787
    if len(argv) < 2:
788
        usage()
789
        return 0
790
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
791
    cmd = argv[1]
869 by Martin Pool
- more weave.py command line options
792
793
    def readit():
794
        return read_weave(file(argv[2], 'rb'))
795
    
796
    if cmd == 'help':
797
        usage()
798
    elif cmd == 'add':
799
        w = readit()
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
800
        # parents are the versions named on the command line
869 by Martin Pool
- more weave.py command line options
801
        parents = map(int, argv[3:])
0.1.72 by Martin Pool
Go back to weave lines normally having newlines at the end.
802
        lines = sys.stdin.readlines()
0.1.69 by Martin Pool
Simple text-based format for storing weaves, cleaner than
803
        ver = w.add(parents, lines)
869 by Martin Pool
- more weave.py command line options
804
        write_weave(w, file(argv[2], 'wb'))
805
        print 'added version %d' % ver
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
806
    elif cmd == 'init':
807
        fn = argv[2]
808
        if os.path.exists(fn):
809
            raise IOError("file exists")
810
        w = Weave()
869 by Martin Pool
- more weave.py command line options
811
        write_weave(w, file(fn, 'wb'))
812
    elif cmd == 'get': # get one version
813
        w = readit()
0.1.94 by Martin Pool
Fix get_iter call
814
        sys.stdout.writelines(w.get_iter(int(argv[3])))
869 by Martin Pool
- more weave.py command line options
815
        
816
    elif cmd == 'mash': # get composite
817
        w = readit()
818
        sys.stdout.writelines(w.mash_iter(map(int, argv[3:])))
819
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
820
    elif cmd == 'annotate':
869 by Martin Pool
- more weave.py command line options
821
        w = readit()
0.1.72 by Martin Pool
Go back to weave lines normally having newlines at the end.
822
        # newline is added to all lines regardless; too hard to get
823
        # reasonable formatting otherwise
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
824
        lasto = None
825
        for origin, text in w.annotate(int(argv[3])):
0.1.72 by Martin Pool
Go back to weave lines normally having newlines at the end.
826
            text = text.rstrip('\r\n')
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
827
            if origin == lasto:
828
                print '      | %s' % (text)
829
            else:
830
                print '%5d | %s' % (origin, text)
831
                lasto = origin
871 by Martin Pool
- add command for merge-based weave
832
                
1081 by Martin Pool
- if weave tool is invoked with no arguments, show help
833
    elif cmd == 'toc':
834
        weave_toc(readit())
947 by Martin Pool
- new 'weave stats' command
835
836
    elif cmd == 'stats':
837
        weave_stats(argv[2])
871 by Martin Pool
- add command for merge-based weave
838
        
0.1.91 by Martin Pool
Update Weave.check
839
    elif cmd == 'check':
869 by Martin Pool
- more weave.py command line options
840
        w = readit()
894 by Martin Pool
- small optimization for weave extract
841
        pb = ProgressBar()
842
        w.check(pb)
843
        pb.clear()
938 by Martin Pool
- various optimizations to weave add code
844
        print '%d versions ok' % w.numversions()
871 by Martin Pool
- add command for merge-based weave
845
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
846
    elif cmd == 'inclusions':
847
        w = readit()
848
        print ' '.join(map(str, w.inclusions([int(argv[3])])))
849
850
    elif cmd == 'parents':
851
        w = readit()
944 by Martin Pool
- refactor member names in Weave code
852
        print ' '.join(map(str, w._parents[int(argv[3])]))
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
853
918 by Martin Pool
- start doing new weave-merge algorithm
854
    elif cmd == 'plan-merge':
855
        w = readit()
856
        for state, line in w.plan_merge(int(argv[3]), int(argv[4])):
919 by Martin Pool
- more development of weave-merge
857
            if line:
858
                print '%14s | %s' % (state, line),
918 by Martin Pool
- start doing new weave-merge algorithm
859
871 by Martin Pool
- add command for merge-based weave
860
    elif cmd == 'merge':
919 by Martin Pool
- more development of weave-merge
861
        w = readit()
862
        p = w.plan_merge(int(argv[3]), int(argv[4]))
863
        sys.stdout.writelines(w.weave_merge(p))
864
            
865
    elif cmd == 'mash-merge':
871 by Martin Pool
- add command for merge-based weave
866
        if len(argv) != 5:
867
            usage()
868
            return 1
869
870
        w = readit()
871
        v1, v2 = map(int, argv[3:5])
872
873
        basis = w.inclusions([v1]).intersection(w.inclusions([v2]))
874
875
        base_lines = list(w.mash_iter(basis))
876
        a_lines = list(w.get(v1))
877
        b_lines = list(w.get(v2))
878
879
        from bzrlib.merge3 import Merge3
880
        m3 = Merge3(base_lines, a_lines, b_lines)
881
882
        name_a = 'version %d' % v1
883
        name_b = 'version %d' % v2
884
        sys.stdout.writelines(m3.merge_lines(name_a=name_a, name_b=name_b))
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
885
    else:
886
        raise ValueError('unknown command %r' % cmd)
887
    
888
1076 by Martin Pool
- add code to run weave utility under profiler
889
890
def profile_main(argv): 
891
    import tempfile, hotshot, hotshot.stats
892
893
    prof_f = tempfile.NamedTemporaryFile()
894
895
    prof = hotshot.Profile(prof_f.name)
896
897
    ret = prof.runcall(main, argv)
898
    prof.close()
899
900
    stats = hotshot.stats.load(prof_f.name)
901
    #stats.strip_dirs()
1079 by Martin Pool
- weavefile can just use lists for read-in ancestry, not frozensets
902
    stats.sort_stats('cumulative')
1076 by Martin Pool
- add code to run weave utility under profiler
903
    ## XXX: Might like to write to stderr or the trace file instead but
904
    ## print_stats seems hardcoded to stdout
905
    stats.print_stats(20)
906
            
907
    return ret
908
909
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
910
if __name__ == '__main__':
911
    import sys
1081 by Martin Pool
- if weave tool is invoked with no arguments, show help
912
    if '--profile' in sys.argv:
1078 by Martin Pool
- use psyco for weave if possible
913
        args = sys.argv[:]
1081 by Martin Pool
- if weave tool is invoked with no arguments, show help
914
        args.remove('--profile')
1078 by Martin Pool
- use psyco for weave if possible
915
        sys.exit(profile_main(args))
916
    else:
917
        sys.exit(main(sys.argv))
1076 by Martin Pool
- add code to run weave utility under profiler
918