~bzr-pqm/bzr/bzr.dev

3640.2.4 by John Arbash Meinel
Copyright updates
1
# Copyright (C) 2007, 2008 Canonical Ltd
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
2
#
3
# This program is free software; you can redistribute it and/or modify
4
# it under the terms of the GNU General Public License as published by
5
# the Free Software Foundation; either version 2 of the License, or
6
# (at your option) any later version.
7
#
8
# This program is distributed in the hope that it will be useful,
9
# but WITHOUT ANY WARRANTY; without even the implied warranty of
10
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
11
# GNU General Public License for more details.
12
#
13
# You should have received a copy of the GNU General Public License
14
# along with this program; if not, write to the Free Software
15
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
16
17
"""Helper functions for DirState.
18
19
This is the python implementation for DirState functions.
20
"""
21
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
22
from bzrlib import errors
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
23
from bzrlib.dirstate import DirState
24
25
2474.1.72 by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments
26
# Give Pyrex some function definitions for it to understand.
27
# All of these are just hints to Pyrex, so that it can try to convert python
28
# objects into similar C objects. (such as PyInt => int).
29
# In anything defined 'cdef extern from XXX' the real C header will be
30
# imported, and the real definition will be used from there. So these are just
31
# hints, and do not need to match exactly to the C definitions.
32
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
33
cdef extern from *:
2474.1.72 by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments
34
    ctypedef unsigned long size_t
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
35
2668.1.1 by John Arbash Meinel
(Lukáš Lalinský) Add a special header for intptr_t for MSVC which doesn't have it in the standard place
36
cdef extern from "_dirstate_helpers_c.h":
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
37
    ctypedef int intptr_t
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
38
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
39
40
cdef extern from "stdlib.h":
41
    unsigned long int strtoul(char *nptr, char **endptr, int base)
42
43
2474.1.72 by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments
44
# These functions allow us access to a bit of the 'bare metal' of python
45
# objects, rather than going through the object abstraction. (For example,
46
# PyList_Append, rather than getting the 'append' attribute of the object, and
47
# creating a tuple, and then using PyCallObject).
48
# Functions that return (or take) a void* are meant to grab a C PyObject*. This
49
# differs from the Pyrex 'object'. If you declare a variable as 'object' Pyrex
50
# will automatically Py_INCREF and Py_DECREF when appropriate. But for some
51
# inner loops, we don't need to do that at all, as the reference only lasts for
52
# a very short time.
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
53
cdef extern from "Python.h":
3640.2.6 by John Arbash Meinel
PQM's pyrex needs a Py_ssize_t typedef.
54
    ctypedef int Py_ssize_t
2474.1.23 by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree
55
    int PyList_Append(object lst, object item) except -1
2474.1.12 by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables.
56
    void *PyList_GetItem_object_void "PyList_GET_ITEM" (object lst, int index)
2474.1.10 by John Arbash Meinel
Explicitly calling Py_INCREF makes things happier again.
57
    int PyList_CheckExact(object)
2474.1.23 by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree
58
59
    void *PyTuple_GetItem_void_void "PyTuple_GET_ITEM" (void* tpl, int index)
2474.1.10 by John Arbash Meinel
Explicitly calling Py_INCREF makes things happier again.
60
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
61
    char *PyString_AsString(object p)
2474.1.16 by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions.
62
    char *PyString_AS_STRING_void "PyString_AS_STRING" (void *p)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
63
    object PyString_FromString(char *)
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
64
    object PyString_FromStringAndSize(char *, Py_ssize_t)
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
65
    int PyString_Size(object p)
2474.1.16 by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions.
66
    int PyString_GET_SIZE_void "PyString_GET_SIZE" (void *p)
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
67
    int PyString_CheckExact(object p)
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
68
2474.1.23 by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree
69
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
70
cdef extern from "string.h":
2474.1.25 by John Arbash Meinel
Refactor into a helper function to make implementation clearer
71
    int strncmp(char *s1, char *s2, int len)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
72
    void *memchr(void *s, int c, size_t len)
73
    int memcmp(void *b1, void *b2, size_t len)
74
    # ??? memrchr is a GNU extension :(
75
    # void *memrchr(void *s, int c, size_t len)
76
77
78
cdef void* _my_memrchr(void *s, int c, size_t n):
79
    # memrchr seems to be a GNU extension, so we have to implement it ourselves
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
80
    cdef char *pos
81
    cdef char *start
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
82
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
83
    start = <char*>s
84
    pos = start + n - 1
85
    while pos >= start:
86
        if pos[0] == c:
87
            return <void*>pos
88
        pos = pos - 1
89
    return NULL
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
90
91
92
def _py_memrchr(s, c):
93
    """Just to expose _my_memrchr for testing.
94
95
    :param s: The Python string to search
96
    :param c: The character to search for
97
    :return: The offset to the last instance of 'c' in s
98
    """
99
    cdef void *_s
100
    cdef void *found
101
    cdef int length
102
    cdef char *_c
103
104
    _s = PyString_AsString(s)
105
    length = PyString_Size(s)
106
107
    _c = PyString_AsString(c)
108
    assert PyString_Size(c) == 1,\
109
        'Must be a single character string, not %s' % (c,)
110
    found = _my_memrchr(_s, _c[0], length)
111
    if found == NULL:
112
        return None
2668.1.1 by John Arbash Meinel
(Lukáš Lalinský) Add a special header for intptr_t for MSVC which doesn't have it in the standard place
113
    return <char*>found - <char*>_s
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
114
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
115
cdef object safe_string_from_size(char *s, Py_ssize_t size):
116
    if size < 0:
117
        raise AssertionError(
118
            'tried to create a string with an invalid size: %d @0x%x'
119
            % (size, <int>s))
120
    return PyString_FromStringAndSize(s, size)
121
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
122
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
123
cdef int _is_aligned(void *ptr):
124
    """Is this pointer aligned to an integer size offset?
125
126
    :return: 1 if this pointer is aligned, 0 otherwise.
127
    """
128
    return ((<intptr_t>ptr) & ((sizeof(int))-1)) == 0
129
130
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
131
cdef int _cmp_by_dirs(char *path1, int size1, char *path2, int size2):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
132
    cdef unsigned char *cur1
133
    cdef unsigned char *cur2
134
    cdef unsigned char *end1
135
    cdef unsigned char *end2
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
136
    cdef int *cur_int1
137
    cdef int *cur_int2
138
    cdef int *end_int1
139
    cdef int *end_int2
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
140
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
141
    if path1 == path2 and size1 == size2:
2474.1.54 by John Arbash Meinel
Optimize the simple case that the strings are the same object.
142
        return 0
143
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
144
    end1 = <unsigned char*>path1+size1
145
    end2 = <unsigned char*>path2+size2
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
146
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
147
    # Use 32-bit comparisons for the matching portion of the string.
148
    # Almost all CPU's are faster at loading and comparing 32-bit integers,
149
    # than they are at 8-bit integers.
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
150
    # 99% of the time, these will be aligned, but in case they aren't just skip
151
    # this loop
152
    if _is_aligned(path1) and _is_aligned(path2):
153
        cur_int1 = <int*>path1
154
        cur_int2 = <int*>path2
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
155
        end_int1 = <int*>(path1 + size1 - (size1 % sizeof(int)))
156
        end_int2 = <int*>(path2 + size2 - (size2 % sizeof(int)))
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
157
158
        while cur_int1 < end_int1 and cur_int2 < end_int2:
159
            if cur_int1[0] != cur_int2[0]:
160
                break
161
            cur_int1 = cur_int1 + 1
162
            cur_int2 = cur_int2 + 1
163
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
164
        cur1 = <unsigned char*>cur_int1
165
        cur2 = <unsigned char*>cur_int2
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
166
    else:
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
167
        cur1 = <unsigned char*>path1
168
        cur2 = <unsigned char*>path2
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
169
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
170
    while cur1 < end1 and cur2 < end2:
171
        if cur1[0] == cur2[0]:
172
            # This character matches, just go to the next one
173
            cur1 = cur1 + 1
174
            cur2 = cur2 + 1
175
            continue
176
        # The current characters do not match
177
        if cur1[0] == c'/':
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
178
            return -1 # Reached the end of path1 segment first
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
179
        elif cur2[0] == c'/':
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
180
            return 1 # Reached the end of path2 segment first
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
181
        elif cur1[0] < cur2[0]:
182
            return -1
183
        else:
184
            return 1
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
185
186
    # We reached the end of at least one of the strings
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
187
    if cur1 < end1:
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
188
        return 1 # Not at the end of cur1, must be at the end of cur2
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
189
    if cur2 < end2:
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
190
        return -1 # At the end of cur1, but not at cur2
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
191
    # We reached the end of both strings
192
    return 0
193
194
2474.1.47 by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py
195
def cmp_by_dirs_c(path1, path2):
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
196
    """Compare two paths directory by directory.
197
198
    This is equivalent to doing::
199
200
       cmp(path1.split('/'), path2.split('/'))
201
202
    The idea is that you should compare path components separately. This
203
    differs from plain ``cmp(path1, path2)`` for paths like ``'a-b'`` and
204
    ``a/b``. "a-b" comes after "a" but would come before "a/b" lexically.
205
206
    :param path1: first path
207
    :param path2: second path
2872.4.10 by Martin Pool
docstrings for cmp_ functions seem to be backwards
208
    :return: negative number if ``path1`` comes first,
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
209
        0 if paths are equal,
2872.4.10 by Martin Pool
docstrings for cmp_ functions seem to be backwards
210
        and positive number if ``path2`` sorts first
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
211
    """
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
212
    if not PyString_CheckExact(path1):
213
        raise TypeError("'path1' must be a plain string, not %s: %r"
214
                        % (type(path1), path1))
215
    if not PyString_CheckExact(path2):
216
        raise TypeError("'path2' must be a plain string, not %s: %r"
217
                        % (type(path2), path2))
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
218
    return _cmp_by_dirs(PyString_AsString(path1),
219
                        PyString_Size(path1),
220
                        PyString_AsString(path2),
221
                        PyString_Size(path2))
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
222
223
2474.1.66 by John Arbash Meinel
Some restructuring.
224
def _cmp_path_by_dirblock_c(path1, path2):
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
225
    """Compare two paths based on what directory they are in.
226
227
    This generates a sort order, such that all children of a directory are
228
    sorted together, and grandchildren are in the same order as the
229
    children appear. But all grandchildren come after all children.
230
2474.1.66 by John Arbash Meinel
Some restructuring.
231
    In other words, all entries in a directory are sorted together, and
232
    directorys are sorted in cmp_by_dirs order.
233
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
234
    :param path1: first path
235
    :param path2: the second path
2872.4.10 by Martin Pool
docstrings for cmp_ functions seem to be backwards
236
    :return: negative number if ``path1`` comes first,
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
237
        0 if paths are equal
2872.4.10 by Martin Pool
docstrings for cmp_ functions seem to be backwards
238
        and a positive number if ``path2`` sorts first
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
239
    """
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
240
    if not PyString_CheckExact(path1):
241
        raise TypeError("'path1' must be a plain string, not %s: %r"
242
                        % (type(path1), path1))
243
    if not PyString_CheckExact(path2):
244
        raise TypeError("'path2' must be a plain string, not %s: %r"
245
                        % (type(path2), path2))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
246
    return _cmp_path_by_dirblock(PyString_AsString(path1),
247
                                 PyString_Size(path1),
248
                                 PyString_AsString(path2),
249
                                 PyString_Size(path2))
250
251
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
252
cdef int _cmp_path_by_dirblock(char *path1, int path1_len,
253
                               char *path2, int path2_len):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
254
    """Compare two paths by what directory they are in.
255
256
    see ``_cmp_path_by_dirblock_c`` for details.
257
    """
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
258
    cdef char *dirname1
259
    cdef int dirname1_len
260
    cdef char *dirname2
261
    cdef int dirname2_len
262
    cdef char *basename1
263
    cdef int basename1_len
264
    cdef char *basename2
265
    cdef int basename2_len
266
    cdef int cur_len
267
    cdef int cmp_val
268
269
    if path1_len == 0 and path2_len == 0:
270
        return 0
271
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
272
    if path1 == path2 and path1_len == path2_len:
273
        return 0
274
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
275
    if path1_len == 0:
276
        return -1
277
278
    if path2_len == 0:
279
        return 1
280
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
281
    basename1 = <char*>_my_memrchr(path1, c'/', path1_len)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
282
283
    if basename1 == NULL:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
284
        basename1 = path1
285
        basename1_len = path1_len
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
286
        dirname1 = ''
287
        dirname1_len = 0
288
    else:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
289
        dirname1 = path1
290
        dirname1_len = basename1 - path1
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
291
        basename1 = basename1 + 1
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
292
        basename1_len = path1_len - dirname1_len - 1
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
293
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
294
    basename2 = <char*>_my_memrchr(path2, c'/', path2_len)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
295
296
    if basename2 == NULL:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
297
        basename2 = path2
298
        basename2_len = path2_len
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
299
        dirname2 = ''
300
        dirname2_len = 0
301
    else:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
302
        dirname2 = path2
303
        dirname2_len = basename2 - path2
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
304
        basename2 = basename2 + 1
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
305
        basename2_len = path2_len - dirname2_len - 1
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
306
307
    cmp_val = _cmp_by_dirs(dirname1, dirname1_len,
308
                           dirname2, dirname2_len)
309
    if cmp_val != 0:
310
        return cmp_val
311
312
    cur_len = basename1_len
313
    if basename2_len < basename1_len:
314
        cur_len = basename2_len
315
316
    cmp_val = memcmp(basename1, basename2, cur_len)
317
    if cmp_val != 0:
318
        return cmp_val
319
    if basename1_len == basename2_len:
320
        return 0
321
    if basename1_len < basename2_len:
322
        return -1
323
    return 1
324
325
2474.1.66 by John Arbash Meinel
Some restructuring.
326
def _bisect_path_left_c(paths, path):
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
327
    """Return the index where to insert path into paths.
328
329
    This uses a path-wise comparison so we get::
330
        a
331
        a-b
332
        a=b
333
        a/b
334
    Rather than::
335
        a
336
        a-b
337
        a/b
338
        a=b
339
    :param paths: A list of paths to search through
340
    :param path: A single path to insert
341
    :return: An offset where 'path' can be inserted.
342
    :seealso: bisect.bisect_left
343
    """
344
    cdef int _lo
345
    cdef int _hi
346
    cdef int _mid
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
347
    cdef char *path_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
348
    cdef int path_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
349
    cdef char *cur_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
350
    cdef int cur_size
351
    cdef void *cur
352
353
    if not PyList_CheckExact(paths):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
354
        raise TypeError("you must pass a python list for 'paths' not: %s %r"
355
                        % (type(paths), paths))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
356
    if not PyString_CheckExact(path):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
357
        raise TypeError("you must pass a string for 'path' not: %s %r"
358
                        % (type(path), path))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
359
360
    _hi = len(paths)
361
    _lo = 0
362
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
363
    path_cstr = PyString_AsString(path)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
364
    path_size = PyString_Size(path)
365
366
    while _lo < _hi:
367
        _mid = (_lo + _hi) / 2
368
        cur = PyList_GetItem_object_void(paths, _mid)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
369
        cur_cstr = PyString_AS_STRING_void(cur)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
370
        cur_size = PyString_GET_SIZE_void(cur)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
371
        if _cmp_path_by_dirblock(cur_cstr, cur_size, path_cstr, path_size) < 0:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
372
            _lo = _mid + 1
373
        else:
374
            _hi = _mid
375
    return _lo
376
377
2474.1.66 by John Arbash Meinel
Some restructuring.
378
def _bisect_path_right_c(paths, path):
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
379
    """Return the index where to insert path into paths.
380
381
    This uses a path-wise comparison so we get::
382
        a
383
        a-b
384
        a=b
385
        a/b
386
    Rather than::
387
        a
388
        a-b
389
        a/b
390
        a=b
391
    :param paths: A list of paths to search through
392
    :param path: A single path to insert
393
    :return: An offset where 'path' can be inserted.
394
    :seealso: bisect.bisect_right
395
    """
396
    cdef int _lo
397
    cdef int _hi
398
    cdef int _mid
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
399
    cdef char *path_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
400
    cdef int path_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
401
    cdef char *cur_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
402
    cdef int cur_size
403
    cdef void *cur
404
405
    if not PyList_CheckExact(paths):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
406
        raise TypeError("you must pass a python list for 'paths' not: %s %r"
407
                        % (type(paths), paths))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
408
    if not PyString_CheckExact(path):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
409
        raise TypeError("you must pass a string for 'path' not: %s %r"
410
                        % (type(path), path))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
411
412
    _hi = len(paths)
413
    _lo = 0
414
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
415
    path_cstr = PyString_AsString(path)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
416
    path_size = PyString_Size(path)
417
418
    while _lo < _hi:
419
        _mid = (_lo + _hi) / 2
420
        cur = PyList_GetItem_object_void(paths, _mid)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
421
        cur_cstr = PyString_AS_STRING_void(cur)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
422
        cur_size = PyString_GET_SIZE_void(cur)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
423
        if _cmp_path_by_dirblock(path_cstr, path_size, cur_cstr, cur_size) < 0:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
424
            _hi = _mid
425
        else:
426
            _lo = _mid + 1
427
    return _lo
428
429
2474.1.47 by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py
430
def bisect_dirblock_c(dirblocks, dirname, lo=0, hi=None, cache=None):
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
431
    """Return the index where to insert dirname into the dirblocks.
432
433
    The return value idx is such that all directories blocks in dirblock[:idx]
434
    have names < dirname, and all blocks in dirblock[idx:] have names >=
435
    dirname.
436
437
    Optional args lo (default 0) and hi (default len(dirblocks)) bound the
438
    slice of a to be searched.
439
    """
440
    cdef int _lo
441
    cdef int _hi
442
    cdef int _mid
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
443
    cdef char *dirname_cstr
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
444
    cdef int dirname_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
445
    cdef char *cur_cstr
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
446
    cdef int cur_size
2474.1.12 by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables.
447
    cdef void *cur
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
448
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
449
    if not PyList_CheckExact(dirblocks):
450
        raise TypeError("you must pass a python list for 'dirblocks' not: %s %r"
451
                        % (type(dirblocks), dirblocks))
452
    if not PyString_CheckExact(dirname):
453
        raise TypeError("you must pass a string for dirname not: %s %r"
454
                        % (type(dirname), dirname))
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
455
    if hi is None:
456
        _hi = len(dirblocks)
457
    else:
458
        _hi = hi
459
460
    _lo = lo
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
461
    dirname_cstr = PyString_AsString(dirname)
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
462
    dirname_size = PyString_Size(dirname)
463
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
464
    while _lo < _hi:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
465
        _mid = (_lo + _hi) / 2
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
466
        # Grab the dirname for the current dirblock
2474.1.12 by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables.
467
        # cur = dirblocks[_mid][0]
468
        cur = PyTuple_GetItem_void_void(
469
                PyList_GetItem_object_void(dirblocks, _mid), 0)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
470
        cur_cstr = PyString_AS_STRING_void(cur)
2474.1.16 by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions.
471
        cur_size = PyString_GET_SIZE_void(cur)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
472
        if _cmp_by_dirs(cur_cstr, cur_size, dirname_cstr, dirname_size) < 0:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
473
            _lo = _mid + 1
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
474
        else:
475
            _hi = _mid
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
476
    return _lo
477
478
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
479
cdef class Reader:
480
    """Maintain the current location, and return fields as you parse them."""
481
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
482
    cdef object state # The DirState object
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
483
    cdef object text # The overall string object
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
484
    cdef char *text_cstr # Pointer to the beginning of text
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
485
    cdef int text_size # Length of text
486
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
487
    cdef char *end_cstr # End of text
488
    cdef char *cur_cstr # Pointer to the current record
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
489
    cdef char *next # Pointer to the end of this record
490
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
491
    def __init__(self, text, state):
492
        self.state = state
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
493
        self.text = text
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
494
        self.text_cstr = PyString_AsString(text)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
495
        self.text_size = PyString_Size(text)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
496
        self.end_cstr = self.text_cstr + self.text_size
497
        self.cur_cstr = self.text_cstr
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
498
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
499
    cdef char *get_next(self, int *size) except NULL:
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
500
        """Return a pointer to the start of the next field."""
501
        cdef char *next
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
502
        cdef Py_ssize_t extra_len
503
504
        if self.cur_cstr == NULL:
505
            raise AssertionError('get_next() called when cur_str is NULL')
506
        elif self.cur_cstr >= self.end_cstr:
507
            raise AssertionError('get_next() called when there are no chars'
508
                                 ' left')
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
509
        next = self.cur_cstr
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
510
        self.cur_cstr = <char*>memchr(next, c'\0', self.end_cstr - next)
511
        if self.cur_cstr == NULL:
512
            extra_len = self.end_cstr - next
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
513
            raise errors.DirstateCorrupt(self.state,
514
                'failed to find trailing NULL (\\0).'
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
515
                ' Trailing garbage: %r'
516
                % safe_string_from_size(next, extra_len))
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
517
        size[0] = self.cur_cstr - next
518
        self.cur_cstr = self.cur_cstr + 1
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
519
        return next
520
2474.1.53 by John Arbash Meinel
Changing Reader.get_next_str (which returns a Python String)
521
    cdef object get_next_str(self):
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
522
        """Get the next field as a Python string."""
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
523
        cdef int size
524
        cdef char *next
525
        next = self.get_next(&size)
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
526
        return safe_string_from_size(next, size)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
527
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
528
    cdef int _init(self) except -1:
529
        """Get the pointer ready.
530
531
        This assumes that the dirstate header has already been read, and we
532
        already have the dirblock string loaded into memory.
533
        This just initializes our memory pointers, etc for parsing of the
534
        dirblock string.
535
        """
2474.1.32 by John Arbash Meinel
Skip past the first entry while reading,
536
        cdef char *first
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
537
        cdef int size
2474.1.32 by John Arbash Meinel
Skip past the first entry while reading,
538
        # The first field should be an empty string left over from the Header
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
539
        first = self.get_next(&size)
540
        if first[0] != c'\0' and size == 0:
2474.1.32 by John Arbash Meinel
Skip past the first entry while reading,
541
            raise AssertionError('First character should be null not: %s'
542
                                 % (first,))
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
543
        return 0
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
544
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
545
    cdef object _get_entry(self, int num_trees, void **p_current_dirname,
546
                           int *new_block):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
547
        """Extract the next entry.
548
549
        This parses the next entry based on the current location in
550
        ``self.cur_cstr``.
551
        Each entry can be considered a "row" in the total table. And each row
552
        has a fixed number of columns. It is generally broken up into "key"
553
        columns, then "current" columns, and then "parent" columns.
554
555
        :param num_trees: How many parent trees need to be parsed
556
        :param p_current_dirname: A pointer to the current PyString
557
            representing the directory name.
558
            We pass this in as a void * so that pyrex doesn't have to
559
            increment/decrement the PyObject reference counter for each
560
            _get_entry call.
561
            We use a pointer so that _get_entry can update it with the new
562
            value.
563
        :param new_block: This is to let the caller know that it needs to
564
            create a new directory block to store the next entry.
565
        """
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
566
        cdef object path_name_file_id_key
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
567
        cdef char *entry_size_cstr
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
568
        cdef unsigned long int entry_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
569
        cdef char* executable_cstr
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
570
        cdef int is_executable
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
571
        cdef char* dirname_cstr
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
572
        cdef char* trailing
573
        cdef int cur_size
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
574
        cdef int i
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
575
        cdef object minikind
576
        cdef object fingerprint
577
        cdef object info
578
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
579
        # Read the 'key' information (dirname, name, file_id)
580
        dirname_cstr = self.get_next(&cur_size)
581
        # Check to see if we have started a new directory block.
582
        # If so, then we need to create a new dirname PyString, so that it can
583
        # be used in all of the tuples. This saves time and memory, by re-using
584
        # the same object repeatedly.
585
586
        # Do the cheap 'length of string' check first. If the string is a
587
        # different length, then we *have* to be a different directory.
588
        if (cur_size != PyString_GET_SIZE_void(p_current_dirname[0])
589
            or strncmp(dirname_cstr,
590
                       # Extract the char* from our current dirname string.  We
591
                       # know it is a PyString, so we can use
592
                       # PyString_AS_STRING, we use the _void version because
593
                       # we are tricking Pyrex by using a void* rather than an
594
                       # <object>
595
                       PyString_AS_STRING_void(p_current_dirname[0]),
596
                       cur_size+1) != 0):
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
597
            dirname = safe_string_from_size(dirname_cstr, cur_size)
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
598
            p_current_dirname[0] = <void*>dirname
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
599
            new_block[0] = 1
600
        else:
601
            new_block[0] = 0
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
602
603
        # Build up the key that will be used.
604
        # By using <object>(void *) Pyrex will automatically handle the
605
        # Py_INCREF that we need.
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
606
        path_name_file_id_key = (<object>p_current_dirname[0],
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
607
                                 self.get_next_str(),
608
                                 self.get_next_str(),
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
609
                                )
610
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
611
        # Parse all of the per-tree information. current has the information in
612
        # the same location as parent trees. The only difference is that 'info'
613
        # is a 'packed_stat' for current, while it is a 'revision_id' for
614
        # parent trees.
615
        # minikind, fingerprint, and info will be returned as regular python
616
        # strings
617
        # entry_size and is_executable will be parsed into a python Long and
618
        # python Boolean, respectively.
619
        # TODO: jam 20070718 Consider changin the entry_size conversion to
620
        #       prefer python Int when possible. They are generally faster to
621
        #       work with, and it will be rare that we have a file >2GB.
622
        #       Especially since this code is pretty much fixed at a max of
623
        #       4GB.
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
624
        trees = []
625
        for i from 0 <= i < num_trees:
626
            minikind = self.get_next_str()
627
            fingerprint = self.get_next_str()
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
628
            entry_size_cstr = self.get_next(&cur_size)
629
            entry_size = strtoul(entry_size_cstr, NULL, 10)
630
            executable_cstr = self.get_next(&cur_size)
631
            is_executable = (executable_cstr[0] == c'y')
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
632
            info = self.get_next_str()
633
            PyList_Append(trees, (
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
634
                minikind,     # minikind
635
                fingerprint,  # fingerprint
636
                entry_size,   # size
637
                is_executable,# executable
638
                info,         # packed_stat or revision_id
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
639
            ))
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
640
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
641
        # The returned tuple is (key, [trees])
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
642
        ret = (path_name_file_id_key, trees)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
643
        # Ignore the trailing newline, but assert that it does exist, this
644
        # ensures that we always finish parsing a line on an end-of-entry
645
        # marker.
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
646
        trailing = self.get_next(&cur_size)
647
        if cur_size != 1 or trailing[0] != c'\n':
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
648
            raise errors.DirstateCorrupt(self.state,
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
649
                'Bad parse, we expected to end on \\n, not: %d %s: %s'
3640.2.1 by John Arbash Meinel
More safety checks around PyString_FromStringAndSize,
650
                % (cur_size, safe_string_from_size(trailing, cur_size),
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
651
                   ret))
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
652
        return ret
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
653
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
654
    def _parse_dirblocks(self):
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
655
        """Parse all dirblocks in the state file."""
656
        cdef int num_trees
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
657
        cdef object current_block
658
        cdef object entry
659
        cdef void * current_dirname
660
        cdef int new_block
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
661
        cdef int expected_entry_count
662
        cdef int entry_count
663
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
664
        num_trees = self.state._num_present_parents() + 1
665
        expected_entry_count = self.state._num_entries
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
666
667
        # Ignore the first record
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
668
        self._init()
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
669
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
670
        current_block = []
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
671
        dirblocks = [('', current_block), ('', [])]
672
        self.state._dirblocks = dirblocks
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
673
        obj = ''
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
674
        current_dirname = <void*>obj
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
675
        new_block = 0
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
676
        entry_count = 0
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
677
2474.1.54 by John Arbash Meinel
Optimize the simple case that the strings are the same object.
678
        # TODO: jam 2007-05-07 Consider pre-allocating some space for the
679
        #       members, and then growing and shrinking from there. If most
680
        #       directories have close to 10 entries in them, it would save a
681
        #       few mallocs if we default our list size to something
682
        #       reasonable. Or we could malloc it to something large (100 or
683
        #       so), and then truncate. That would give us a malloc + realloc,
684
        #       rather than lots of reallocs.
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
685
        while self.cur_cstr < self.end_cstr:
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
686
            entry = self._get_entry(num_trees, &current_dirname, &new_block)
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
687
            if new_block:
688
                # new block - different dirname
689
                current_block = []
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
690
                PyList_Append(dirblocks,
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
691
                              (<object>current_dirname, current_block))
692
            PyList_Append(current_block, entry)
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
693
            entry_count = entry_count + 1
694
        if entry_count != expected_entry_count:
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
695
            raise errors.DirstateCorrupt(self.state,
696
                    'We read the wrong number of entries.'
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
697
                    ' We expected to read %s, but read %s'
698
                    % (expected_entry_count, entry_count))
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
699
        self.state._split_root_dirblock_into_contents()
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
700
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
701
2474.1.47 by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py
702
def _read_dirblocks_c(state):
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
703
    """Read in the dirblocks for the given DirState object.
704
705
    This is tightly bound to the DirState internal representation. It should be
706
    thought of as a member function, which is only separated out so that we can
707
    re-write it in pyrex.
708
709
    :param state: A DirState object.
710
    :return: None
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
711
    :postcondition: The dirblocks will be loaded into the appropriate fields in
712
        the DirState object.
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
713
    """
714
    state._state_file.seek(state._end_of_header)
715
    text = state._state_file.read()
716
    # TODO: check the crc checksums. crc_measured = zlib.crc32(text)
717
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
718
    reader = Reader(text, state)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
719
3640.2.5 by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places
720
    reader._parse_dirblocks()
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
721
    state._dirblock_state = DirState.IN_MEMORY_UNMODIFIED