~bzr-pqm/bzr/bzr.dev

2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
1
# Copyright (C) 2007 Canonical Ltd
2
#
3
# This program is free software; you can redistribute it and/or modify
4
# it under the terms of the GNU General Public License as published by
5
# the Free Software Foundation; either version 2 of the License, or
6
# (at your option) any later version.
7
#
8
# This program is distributed in the hope that it will be useful,
9
# but WITHOUT ANY WARRANTY; without even the implied warranty of
10
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
11
# GNU General Public License for more details.
12
#
13
# You should have received a copy of the GNU General Public License
14
# along with this program; if not, write to the Free Software
15
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
16
17
"""Helper functions for DirState.
18
19
This is the python implementation for DirState functions.
20
"""
21
22
from bzrlib.dirstate import DirState
23
24
2474.1.72 by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments
25
# Give Pyrex some function definitions for it to understand.
26
# All of these are just hints to Pyrex, so that it can try to convert python
27
# objects into similar C objects. (such as PyInt => int).
28
# In anything defined 'cdef extern from XXX' the real C header will be
29
# imported, and the real definition will be used from there. So these are just
30
# hints, and do not need to match exactly to the C definitions.
31
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
32
cdef extern from *:
2474.1.72 by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments
33
    ctypedef unsigned long size_t
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
34
35
cdef extern from "stdint.h":
36
    ctypedef int intptr_t
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
37
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
38
39
cdef extern from "stdlib.h":
40
    unsigned long int strtoul(char *nptr, char **endptr, int base)
41
42
2474.1.72 by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments
43
# These functions allow us access to a bit of the 'bare metal' of python
44
# objects, rather than going through the object abstraction. (For example,
45
# PyList_Append, rather than getting the 'append' attribute of the object, and
46
# creating a tuple, and then using PyCallObject).
47
# Functions that return (or take) a void* are meant to grab a C PyObject*. This
48
# differs from the Pyrex 'object'. If you declare a variable as 'object' Pyrex
49
# will automatically Py_INCREF and Py_DECREF when appropriate. But for some
50
# inner loops, we don't need to do that at all, as the reference only lasts for
51
# a very short time.
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
52
cdef extern from "Python.h":
2474.1.23 by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree
53
    int PyList_Append(object lst, object item) except -1
2474.1.12 by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables.
54
    void *PyList_GetItem_object_void "PyList_GET_ITEM" (object lst, int index)
2474.1.10 by John Arbash Meinel
Explicitly calling Py_INCREF makes things happier again.
55
    int PyList_CheckExact(object)
2474.1.23 by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree
56
57
    void *PyTuple_GetItem_void_void "PyTuple_GET_ITEM" (void* tpl, int index)
2474.1.10 by John Arbash Meinel
Explicitly calling Py_INCREF makes things happier again.
58
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
59
    char *PyString_AsString(object p)
2474.1.16 by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions.
60
    char *PyString_AS_STRING_void "PyString_AS_STRING" (void *p)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
61
    object PyString_FromString(char *)
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
62
    object PyString_FromStringAndSize(char *, int)
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
63
    int PyString_Size(object p)
2474.1.16 by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions.
64
    int PyString_GET_SIZE_void "PyString_GET_SIZE" (void *p)
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
65
    int PyString_CheckExact(object p)
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
66
2474.1.23 by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree
67
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
68
cdef extern from "string.h":
2474.1.25 by John Arbash Meinel
Refactor into a helper function to make implementation clearer
69
    int strncmp(char *s1, char *s2, int len)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
70
    void *memchr(void *s, int c, size_t len)
71
    int memcmp(void *b1, void *b2, size_t len)
72
    # ??? memrchr is a GNU extension :(
73
    # void *memrchr(void *s, int c, size_t len)
74
75
76
cdef void* _my_memrchr(void *s, int c, size_t n):
77
    # memrchr seems to be a GNU extension, so we have to implement it ourselves
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
78
    cdef char *pos
79
    cdef char *start
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
80
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
81
    start = <char*>s
82
    pos = start + n - 1
83
    while pos >= start:
84
        if pos[0] == c:
85
            return <void*>pos
86
        pos = pos - 1
87
    return NULL
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
88
89
90
def _py_memrchr(s, c):
91
    """Just to expose _my_memrchr for testing.
92
93
    :param s: The Python string to search
94
    :param c: The character to search for
95
    :return: The offset to the last instance of 'c' in s
96
    """
97
    cdef void *_s
98
    cdef void *found
99
    cdef int length
100
    cdef char *_c
101
102
    _s = PyString_AsString(s)
103
    length = PyString_Size(s)
104
105
    _c = PyString_AsString(c)
106
    assert PyString_Size(c) == 1,\
107
        'Must be a single character string, not %s' % (c,)
108
    found = _my_memrchr(_s, _c[0], length)
109
    if found == NULL:
110
        return None
111
    return found - _s
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
112
113
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
114
cdef int _is_aligned(void *ptr):
115
    """Is this pointer aligned to an integer size offset?
116
117
    :return: 1 if this pointer is aligned, 0 otherwise.
118
    """
119
    return ((<intptr_t>ptr) & ((sizeof(int))-1)) == 0
120
121
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
122
cdef int _cmp_by_dirs(char *path1, int size1, char *path2, int size2):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
123
    cdef unsigned char *cur1
124
    cdef unsigned char *cur2
125
    cdef unsigned char *end1
126
    cdef unsigned char *end2
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
127
    cdef int *cur_int1
128
    cdef int *cur_int2
129
    cdef int *end_int1
130
    cdef int *end_int2
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
131
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
132
    if path1 == path2 and size1 == size2:
2474.1.54 by John Arbash Meinel
Optimize the simple case that the strings are the same object.
133
        return 0
134
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
135
    end1 = <unsigned char*>path1+size1
136
    end2 = <unsigned char*>path2+size2
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
137
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
138
    # Use 32-bit comparisons for the matching portion of the string.
139
    # Almost all CPU's are faster at loading and comparing 32-bit integers,
140
    # than they are at 8-bit integers.
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
141
    # 99% of the time, these will be aligned, but in case they aren't just skip
142
    # this loop
143
    if _is_aligned(path1) and _is_aligned(path2):
144
        cur_int1 = <int*>path1
145
        cur_int2 = <int*>path2
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
146
        end_int1 = <int*>(path1 + size1 - (size1 % sizeof(int)))
147
        end_int2 = <int*>(path2 + size2 - (size2 % sizeof(int)))
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
148
149
        while cur_int1 < end_int1 and cur_int2 < end_int2:
150
            if cur_int1[0] != cur_int2[0]:
151
                break
152
            cur_int1 = cur_int1 + 1
153
            cur_int2 = cur_int2 + 1
154
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
155
        cur1 = <unsigned char*>cur_int1
156
        cur2 = <unsigned char*>cur_int2
2474.1.69 by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups
157
    else:
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
158
        cur1 = <unsigned char*>path1
159
        cur2 = <unsigned char*>path2
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
160
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
161
    while cur1 < end1 and cur2 < end2:
162
        if cur1[0] == cur2[0]:
163
            # This character matches, just go to the next one
164
            cur1 = cur1 + 1
165
            cur2 = cur2 + 1
166
            continue
167
        # The current characters do not match
168
        if cur1[0] == c'/':
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
169
            return -1 # Reached the end of path1 segment first
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
170
        elif cur2[0] == c'/':
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
171
            return 1 # Reached the end of path2 segment first
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
172
        elif cur1[0] < cur2[0]:
173
            return -1
174
        else:
175
            return 1
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
176
177
    # We reached the end of at least one of the strings
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
178
    if cur1 < end1:
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
179
        return 1 # Not at the end of cur1, must be at the end of cur2
2474.1.18 by John Arbash Meinel
Add an integer-size comparison loop at the begining, and
180
    if cur2 < end2:
2474.1.19 by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default.
181
        return -1 # At the end of cur1, but not at cur2
2474.1.17 by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably
182
    # We reached the end of both strings
183
    return 0
184
185
2474.1.47 by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py
186
def cmp_by_dirs_c(path1, path2):
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
187
    """Compare two paths directory by directory.
188
189
    This is equivalent to doing::
190
191
       cmp(path1.split('/'), path2.split('/'))
192
193
    The idea is that you should compare path components separately. This
194
    differs from plain ``cmp(path1, path2)`` for paths like ``'a-b'`` and
195
    ``a/b``. "a-b" comes after "a" but would come before "a/b" lexically.
196
197
    :param path1: first path
198
    :param path2: second path
199
    :return: positive number if ``path1`` comes first,
200
        0 if paths are equal,
201
        and negative number if ``path2`` sorts first
202
    """
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
203
    if not PyString_CheckExact(path1):
204
        raise TypeError("'path1' must be a plain string, not %s: %r"
205
                        % (type(path1), path1))
206
    if not PyString_CheckExact(path2):
207
        raise TypeError("'path2' must be a plain string, not %s: %r"
208
                        % (type(path2), path2))
2474.1.41 by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs
209
    return _cmp_by_dirs(PyString_AsString(path1),
210
                        PyString_Size(path1),
211
                        PyString_AsString(path2),
212
                        PyString_Size(path2))
2474.1.13 by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
213
214
2474.1.66 by John Arbash Meinel
Some restructuring.
215
def _cmp_path_by_dirblock_c(path1, path2):
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
216
    """Compare two paths based on what directory they are in.
217
218
    This generates a sort order, such that all children of a directory are
219
    sorted together, and grandchildren are in the same order as the
220
    children appear. But all grandchildren come after all children.
221
2474.1.66 by John Arbash Meinel
Some restructuring.
222
    In other words, all entries in a directory are sorted together, and
223
    directorys are sorted in cmp_by_dirs order.
224
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
225
    :param path1: first path
226
    :param path2: the second path
227
    :return: positive number if ``path1`` comes first,
228
        0 if paths are equal
229
        and a negative number if ``path2`` sorts first
230
    """
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
231
    if not PyString_CheckExact(path1):
232
        raise TypeError("'path1' must be a plain string, not %s: %r"
233
                        % (type(path1), path1))
234
    if not PyString_CheckExact(path2):
235
        raise TypeError("'path2' must be a plain string, not %s: %r"
236
                        % (type(path2), path2))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
237
    return _cmp_path_by_dirblock(PyString_AsString(path1),
238
                                 PyString_Size(path1),
239
                                 PyString_AsString(path2),
240
                                 PyString_Size(path2))
241
242
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
243
cdef int _cmp_path_by_dirblock(char *path1, int path1_len,
244
                               char *path2, int path2_len):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
245
    """Compare two paths by what directory they are in.
246
247
    see ``_cmp_path_by_dirblock_c`` for details.
248
    """
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
249
    cdef char *dirname1
250
    cdef int dirname1_len
251
    cdef char *dirname2
252
    cdef int dirname2_len
253
    cdef char *basename1
254
    cdef int basename1_len
255
    cdef char *basename2
256
    cdef int basename2_len
257
    cdef int cur_len
258
    cdef int cmp_val
259
260
    if path1_len == 0 and path2_len == 0:
261
        return 0
262
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
263
    if path1 == path2 and path1_len == path2_len:
264
        return 0
265
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
266
    if path1_len == 0:
267
        return -1
268
269
    if path2_len == 0:
270
        return 1
271
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
272
    basename1 = <char*>_my_memrchr(path1, c'/', path1_len)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
273
274
    if basename1 == NULL:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
275
        basename1 = path1
276
        basename1_len = path1_len
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
277
        dirname1 = ''
278
        dirname1_len = 0
279
    else:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
280
        dirname1 = path1
281
        dirname1_len = basename1 - path1
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
282
        basename1 = basename1 + 1
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
283
        basename1_len = path1_len - dirname1_len - 1
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
284
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
285
    basename2 = <char*>_my_memrchr(path2, c'/', path2_len)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
286
287
    if basename2 == NULL:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
288
        basename2 = path2
289
        basename2_len = path2_len
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
290
        dirname2 = ''
291
        dirname2_len = 0
292
    else:
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
293
        dirname2 = path2
294
        dirname2_len = basename2 - path2
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
295
        basename2 = basename2 + 1
2474.1.59 by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass.
296
        basename2_len = path2_len - dirname2_len - 1
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
297
298
    cmp_val = _cmp_by_dirs(dirname1, dirname1_len,
299
                           dirname2, dirname2_len)
300
    if cmp_val != 0:
301
        return cmp_val
302
303
    cur_len = basename1_len
304
    if basename2_len < basename1_len:
305
        cur_len = basename2_len
306
307
    cmp_val = memcmp(basename1, basename2, cur_len)
308
    if cmp_val != 0:
309
        return cmp_val
310
    if basename1_len == basename2_len:
311
        return 0
312
    if basename1_len < basename2_len:
313
        return -1
314
    return 1
315
316
2474.1.66 by John Arbash Meinel
Some restructuring.
317
def _bisect_path_left_c(paths, path):
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
318
    """Return the index where to insert path into paths.
319
320
    This uses a path-wise comparison so we get::
321
        a
322
        a-b
323
        a=b
324
        a/b
325
    Rather than::
326
        a
327
        a-b
328
        a/b
329
        a=b
330
    :param paths: A list of paths to search through
331
    :param path: A single path to insert
332
    :return: An offset where 'path' can be inserted.
333
    :seealso: bisect.bisect_left
334
    """
335
    cdef int _lo
336
    cdef int _hi
337
    cdef int _mid
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
338
    cdef char *path_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
339
    cdef int path_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
340
    cdef char *cur_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
341
    cdef int cur_size
342
    cdef void *cur
343
344
    if not PyList_CheckExact(paths):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
345
        raise TypeError("you must pass a python list for 'paths' not: %s %r"
346
                        % (type(paths), paths))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
347
    if not PyString_CheckExact(path):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
348
        raise TypeError("you must pass a string for 'path' not: %s %r"
349
                        % (type(path), path))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
350
351
    _hi = len(paths)
352
    _lo = 0
353
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
354
    path_cstr = PyString_AsString(path)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
355
    path_size = PyString_Size(path)
356
357
    while _lo < _hi:
358
        _mid = (_lo + _hi) / 2
359
        cur = PyList_GetItem_object_void(paths, _mid)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
360
        cur_cstr = PyString_AS_STRING_void(cur)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
361
        cur_size = PyString_GET_SIZE_void(cur)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
362
        if _cmp_path_by_dirblock(cur_cstr, cur_size, path_cstr, path_size) < 0:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
363
            _lo = _mid + 1
364
        else:
365
            _hi = _mid
366
    return _lo
367
368
2474.1.66 by John Arbash Meinel
Some restructuring.
369
def _bisect_path_right_c(paths, path):
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
370
    """Return the index where to insert path into paths.
371
372
    This uses a path-wise comparison so we get::
373
        a
374
        a-b
375
        a=b
376
        a/b
377
    Rather than::
378
        a
379
        a-b
380
        a/b
381
        a=b
382
    :param paths: A list of paths to search through
383
    :param path: A single path to insert
384
    :return: An offset where 'path' can be inserted.
385
    :seealso: bisect.bisect_right
386
    """
387
    cdef int _lo
388
    cdef int _hi
389
    cdef int _mid
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
390
    cdef char *path_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
391
    cdef int path_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
392
    cdef char *cur_cstr
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
393
    cdef int cur_size
394
    cdef void *cur
395
396
    if not PyList_CheckExact(paths):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
397
        raise TypeError("you must pass a python list for 'paths' not: %s %r"
398
                        % (type(paths), paths))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
399
    if not PyString_CheckExact(path):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
400
        raise TypeError("you must pass a string for 'path' not: %s %r"
401
                        % (type(path), path))
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
402
403
    _hi = len(paths)
404
    _lo = 0
405
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
406
    path_cstr = PyString_AsString(path)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
407
    path_size = PyString_Size(path)
408
409
    while _lo < _hi:
410
        _mid = (_lo + _hi) / 2
411
        cur = PyList_GetItem_object_void(paths, _mid)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
412
        cur_cstr = PyString_AS_STRING_void(cur)
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
413
        cur_size = PyString_GET_SIZE_void(cur)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
414
        if _cmp_path_by_dirblock(path_cstr, path_size, cur_cstr, cur_size) < 0:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
415
            _hi = _mid
416
        else:
417
            _lo = _mid + 1
418
    return _lo
419
420
2474.1.47 by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py
421
def bisect_dirblock_c(dirblocks, dirname, lo=0, hi=None, cache=None):
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
422
    """Return the index where to insert dirname into the dirblocks.
423
424
    The return value idx is such that all directories blocks in dirblock[:idx]
425
    have names < dirname, and all blocks in dirblock[idx:] have names >=
426
    dirname.
427
428
    Optional args lo (default 0) and hi (default len(dirblocks)) bound the
429
    slice of a to be searched.
430
    """
431
    cdef int _lo
432
    cdef int _hi
433
    cdef int _mid
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
434
    cdef char *dirname_cstr
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
435
    cdef int dirname_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
436
    cdef char *cur_cstr
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
437
    cdef int cur_size
2474.1.12 by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables.
438
    cdef void *cur
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
439
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
440
    if not PyList_CheckExact(dirblocks):
441
        raise TypeError("you must pass a python list for 'dirblocks' not: %s %r"
442
                        % (type(dirblocks), dirblocks))
443
    if not PyString_CheckExact(dirname):
444
        raise TypeError("you must pass a string for dirname not: %s %r"
445
                        % (type(dirname), dirname))
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
446
    if hi is None:
447
        _hi = len(dirblocks)
448
    else:
449
        _hi = hi
450
451
    _lo = lo
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
452
    dirname_cstr = PyString_AsString(dirname)
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
453
    dirname_size = PyString_Size(dirname)
454
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
455
    while _lo < _hi:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
456
        _mid = (_lo + _hi) / 2
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
457
        # Grab the dirname for the current dirblock
2474.1.12 by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables.
458
        # cur = dirblocks[_mid][0]
459
        cur = PyTuple_GetItem_void_void(
460
                PyList_GetItem_object_void(dirblocks, _mid), 0)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
461
        cur_cstr = PyString_AS_STRING_void(cur)
2474.1.16 by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions.
462
        cur_size = PyString_GET_SIZE_void(cur)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
463
        if _cmp_by_dirs(cur_cstr, cur_size, dirname_cstr, dirname_size) < 0:
2474.1.58 by John Arbash Meinel
(broken) Try to properly implement DirState._bisect*
464
            _lo = _mid + 1
2474.1.14 by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/')
465
        else:
466
            _hi = _mid
2474.1.4 by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
467
    return _lo
468
469
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
470
cdef class Reader:
471
    """Maintain the current location, and return fields as you parse them."""
472
473
    cdef object text # The overall string object
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
474
    cdef char *text_cstr # Pointer to the beginning of text
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
475
    cdef int text_size # Length of text
476
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
477
    cdef char *end_cstr # End of text
478
    cdef char *cur_cstr # Pointer to the current record
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
479
    cdef char *next # Pointer to the end of this record
480
481
    def __new__(self, text):
482
        self.text = text
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
483
        self.text_cstr = PyString_AsString(text)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
484
        self.text_size = PyString_Size(text)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
485
        self.end_cstr = self.text_cstr + self.text_size
486
        self.cur_cstr = self.text_cstr
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
487
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
488
    cdef char *get_next(self, int *size):
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
489
        """Return a pointer to the start of the next field."""
490
        cdef char *next
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
491
        next = self.cur_cstr
492
        self.cur_cstr = <char*>memchr(next, c'\0', self.end_cstr-next)
493
        size[0] = self.cur_cstr - next
494
        self.cur_cstr = self.cur_cstr + 1
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
495
        return next
496
2474.1.53 by John Arbash Meinel
Changing Reader.get_next_str (which returns a Python String)
497
    cdef object get_next_str(self):
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
498
        """Get the next field as a Python string."""
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
499
        cdef int size
500
        cdef char *next
501
        next = self.get_next(&size)
502
        return PyString_FromStringAndSize(next, size)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
503
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
504
    cdef int _init(self) except -1:
505
        """Get the pointer ready.
506
507
        This assumes that the dirstate header has already been read, and we
508
        already have the dirblock string loaded into memory.
509
        This just initializes our memory pointers, etc for parsing of the
510
        dirblock string.
511
        """
2474.1.32 by John Arbash Meinel
Skip past the first entry while reading,
512
        cdef char *first
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
513
        cdef int size
2474.1.32 by John Arbash Meinel
Skip past the first entry while reading,
514
        # The first field should be an empty string left over from the Header
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
515
        first = self.get_next(&size)
516
        if first[0] != c'\0' and size == 0:
2474.1.32 by John Arbash Meinel
Skip past the first entry while reading,
517
            raise AssertionError('First character should be null not: %s'
518
                                 % (first,))
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
519
        return 0
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
520
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
521
    cdef object _get_entry(self, int num_trees, void **p_current_dirname,
522
                           int *new_block):
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
523
        """Extract the next entry.
524
525
        This parses the next entry based on the current location in
526
        ``self.cur_cstr``.
527
        Each entry can be considered a "row" in the total table. And each row
528
        has a fixed number of columns. It is generally broken up into "key"
529
        columns, then "current" columns, and then "parent" columns.
530
531
        :param num_trees: How many parent trees need to be parsed
532
        :param p_current_dirname: A pointer to the current PyString
533
            representing the directory name.
534
            We pass this in as a void * so that pyrex doesn't have to
535
            increment/decrement the PyObject reference counter for each
536
            _get_entry call.
537
            We use a pointer so that _get_entry can update it with the new
538
            value.
539
        :param new_block: This is to let the caller know that it needs to
540
            create a new directory block to store the next entry.
541
        """
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
542
        cdef object path_name_file_id_key
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
543
        cdef char *entry_size_cstr
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
544
        cdef unsigned long int entry_size
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
545
        cdef char* executable_cstr
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
546
        cdef int is_executable
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
547
        cdef char* dirname_cstr
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
548
        cdef char* trailing
549
        cdef int cur_size
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
550
        cdef int i
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
551
        cdef object minikind
552
        cdef object fingerprint
553
        cdef object info
554
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
555
        # Read the 'key' information (dirname, name, file_id)
556
        dirname_cstr = self.get_next(&cur_size)
557
        # Check to see if we have started a new directory block.
558
        # If so, then we need to create a new dirname PyString, so that it can
559
        # be used in all of the tuples. This saves time and memory, by re-using
560
        # the same object repeatedly.
561
562
        # Do the cheap 'length of string' check first. If the string is a
563
        # different length, then we *have* to be a different directory.
564
        if (cur_size != PyString_GET_SIZE_void(p_current_dirname[0])
565
            or strncmp(dirname_cstr,
566
                       # Extract the char* from our current dirname string.  We
567
                       # know it is a PyString, so we can use
568
                       # PyString_AS_STRING, we use the _void version because
569
                       # we are tricking Pyrex by using a void* rather than an
570
                       # <object>
571
                       PyString_AS_STRING_void(p_current_dirname[0]),
572
                       cur_size+1) != 0):
573
            dirname = PyString_FromStringAndSize(dirname_cstr, cur_size)
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
574
            p_current_dirname[0] = <void*>dirname
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
575
            new_block[0] = 1
576
        else:
577
            new_block[0] = 0
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
578
579
        # Build up the key that will be used.
580
        # By using <object>(void *) Pyrex will automatically handle the
581
        # Py_INCREF that we need.
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
582
        path_name_file_id_key = (<object>p_current_dirname[0],
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
583
                                 self.get_next_str(),
584
                                 self.get_next_str(),
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
585
                                )
586
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
587
        # Parse all of the per-tree information. current has the information in
588
        # the same location as parent trees. The only difference is that 'info'
589
        # is a 'packed_stat' for current, while it is a 'revision_id' for
590
        # parent trees.
591
        # minikind, fingerprint, and info will be returned as regular python
592
        # strings
593
        # entry_size and is_executable will be parsed into a python Long and
594
        # python Boolean, respectively.
595
        # TODO: jam 20070718 Consider changin the entry_size conversion to
596
        #       prefer python Int when possible. They are generally faster to
597
        #       work with, and it will be rare that we have a file >2GB.
598
        #       Especially since this code is pretty much fixed at a max of
599
        #       4GB.
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
600
        trees = []
601
        for i from 0 <= i < num_trees:
602
            minikind = self.get_next_str()
603
            fingerprint = self.get_next_str()
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
604
            entry_size_cstr = self.get_next(&cur_size)
605
            entry_size = strtoul(entry_size_cstr, NULL, 10)
606
            executable_cstr = self.get_next(&cur_size)
607
            is_executable = (executable_cstr[0] == c'y')
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
608
            info = self.get_next_str()
609
            PyList_Append(trees, (
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
610
                minikind,     # minikind
611
                fingerprint,  # fingerprint
612
                entry_size,   # size
613
                is_executable,# executable
614
                info,         # packed_stat or revision_id
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
615
            ))
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
616
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
617
        # The returned tuple is (key, [trees])
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
618
        ret = (path_name_file_id_key, trees)
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
619
        # Ignore the trailing newline, but assert that it does exist, this
620
        # ensures that we always finish parsing a line on an end-of-entry
621
        # marker.
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
622
        trailing = self.get_next(&cur_size)
623
        if cur_size != 1 or trailing[0] != c'\n':
624
            raise AssertionError(
625
                'Bad parse, we expected to end on \\n, not: %d %s: %s'
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
626
                % (cur_size, PyString_FromStringAndSize(trailing, cur_size),
627
                   ret))
2474.1.38 by John Arbash Meinel
Finally, faster than text.split() (156ms)
628
        return ret
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
629
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
630
    def _parse_dirblocks(self, state):
631
        """Parse all dirblocks in the state file."""
632
        cdef int num_trees
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
633
        cdef object current_block
634
        cdef object entry
635
        cdef void * current_dirname
636
        cdef int new_block
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
637
        cdef int expected_entry_count
638
        cdef int entry_count
639
640
        num_trees = state._num_present_parents() + 1
641
        expected_entry_count = state._num_entries
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
642
643
        # Ignore the first record
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
644
        self._init()
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
645
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
646
        current_block = []
647
        state._dirblocks = [('', current_block), ('', [])]
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
648
        obj = ''
2474.1.37 by John Arbash Meinel
get_next() returns the length of the string,
649
        current_dirname = <void*>obj
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
650
        new_block = 0
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
651
        entry_count = 0
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
652
2474.1.54 by John Arbash Meinel
Optimize the simple case that the strings are the same object.
653
        # TODO: jam 2007-05-07 Consider pre-allocating some space for the
654
        #       members, and then growing and shrinking from there. If most
655
        #       directories have close to 10 entries in them, it would save a
656
        #       few mallocs if we default our list size to something
657
        #       reasonable. Or we could malloc it to something large (100 or
658
        #       so), and then truncate. That would give us a malloc + realloc,
659
        #       rather than lots of reallocs.
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
660
        while self.cur_cstr < self.end_cstr:
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
661
            entry = self._get_entry(num_trees, &current_dirname, &new_block)
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
662
            if new_block:
663
                # new block - different dirname
664
                current_block = []
665
                PyList_Append(state._dirblocks,
666
                              (<object>current_dirname, current_block))
667
            PyList_Append(current_block, entry)
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
668
            entry_count = entry_count + 1
669
        if entry_count != expected_entry_count:
670
            raise AssertionError('We read the wrong number of entries.'
671
                    ' We expected to read %s, but read %s'
672
                    % (expected_entry_count, entry_count))
2474.1.36 by John Arbash Meinel
Move functions into member functions on reader() class.
673
        state._split_root_dirblock_into_contents()
674
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
675
2474.1.47 by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py
676
def _read_dirblocks_c(state):
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
677
    """Read in the dirblocks for the given DirState object.
678
679
    This is tightly bound to the DirState internal representation. It should be
680
    thought of as a member function, which is only separated out so that we can
681
    re-write it in pyrex.
682
683
    :param state: A DirState object.
684
    :return: None
2474.1.70 by John Arbash Meinel
Lot's of fixes from Martin's comments.
685
    :postcondition: The dirblocks will be loaded into the appropriate fields in
686
        the DirState object.
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
687
    """
688
    state._state_file.seek(state._end_of_header)
689
    text = state._state_file.read()
690
    # TODO: check the crc checksums. crc_measured = zlib.crc32(text)
691
2474.1.34 by John Arbash Meinel
Delay reading fields until in parse loop
692
    reader = Reader(text)
2474.1.30 by John Arbash Meinel
Start working towards a parser which uses a Reader (producer)
693
2474.1.46 by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents.
694
    reader._parse_dirblocks(state)
2474.1.1 by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file.
695
    state._dirblock_state = DirState.IN_MEMORY_UNMODIFIED