2474.1.1
by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file. |
1 |
# Copyright (C) 2007 Canonical Ltd
|
2 |
#
|
|
3 |
# This program is free software; you can redistribute it and/or modify
|
|
4 |
# it under the terms of the GNU General Public License as published by
|
|
5 |
# the Free Software Foundation; either version 2 of the License, or
|
|
6 |
# (at your option) any later version.
|
|
7 |
#
|
|
8 |
# This program is distributed in the hope that it will be useful,
|
|
9 |
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
10 |
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
11 |
# GNU General Public License for more details.
|
|
12 |
#
|
|
13 |
# You should have received a copy of the GNU General Public License
|
|
14 |
# along with this program; if not, write to the Free Software
|
|
15 |
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
|
16 |
||
17 |
"""Helper functions for DirState.
|
|
18 |
||
19 |
This is the python implementation for DirState functions.
|
|
20 |
"""
|
|
21 |
||
22 |
from bzrlib.dirstate import DirState |
|
23 |
||
24 |
||
2474.1.72
by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments |
25 |
# Give Pyrex some function definitions for it to understand.
|
26 |
# All of these are just hints to Pyrex, so that it can try to convert python
|
|
27 |
# objects into similar C objects. (such as PyInt => int).
|
|
28 |
# In anything defined 'cdef extern from XXX' the real C header will be
|
|
29 |
# imported, and the real definition will be used from there. So these are just
|
|
30 |
# hints, and do not need to match exactly to the C definitions.
|
|
31 |
||
2474.1.13
by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings. |
32 |
cdef extern from *: |
2474.1.72
by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments |
33 |
ctypedef unsigned long size_t |
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
34 |
|
2668.1.1
by John Arbash Meinel
(Lukáš Lalinský) Add a special header for intptr_t for MSVC which doesn't have it in the standard place |
35 |
cdef extern from "_dirstate_helpers_c.h": |
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
36 |
ctypedef int intptr_t |
2474.1.13
by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings. |
37 |
|
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
38 |
|
39 |
cdef extern from "stdlib.h": |
|
40 |
unsigned long int strtoul(char *nptr, char **endptr, int base) |
|
41 |
||
42 |
||
2474.1.72
by John Arbash Meinel
Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments |
43 |
# These functions allow us access to a bit of the 'bare metal' of python
|
44 |
# objects, rather than going through the object abstraction. (For example,
|
|
45 |
# PyList_Append, rather than getting the 'append' attribute of the object, and
|
|
46 |
# creating a tuple, and then using PyCallObject).
|
|
47 |
# Functions that return (or take) a void* are meant to grab a C PyObject*. This
|
|
48 |
# differs from the Pyrex 'object'. If you declare a variable as 'object' Pyrex
|
|
49 |
# will automatically Py_INCREF and Py_DECREF when appropriate. But for some
|
|
50 |
# inner loops, we don't need to do that at all, as the reference only lasts for
|
|
51 |
# a very short time.
|
|
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
52 |
cdef extern from "Python.h": |
2474.1.23
by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree |
53 |
int PyList_Append(object lst, object item) except -1 |
2474.1.12
by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables. |
54 |
void *PyList_GetItem_object_void "PyList_GET_ITEM" (object lst, int index) |
2474.1.10
by John Arbash Meinel
Explicitly calling Py_INCREF makes things happier again. |
55 |
int PyList_CheckExact(object) |
2474.1.23
by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree |
56 |
|
57 |
void *PyTuple_GetItem_void_void "PyTuple_GET_ITEM" (void* tpl, int index) |
|
2474.1.10
by John Arbash Meinel
Explicitly calling Py_INCREF makes things happier again. |
58 |
|
2474.1.13
by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings. |
59 |
char *PyString_AsString(object p) |
2474.1.16
by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions. |
60 |
char *PyString_AS_STRING_void "PyString_AS_STRING" (void *p) |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
61 |
object PyString_FromString(char *) |
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
62 |
object PyString_FromStringAndSize(char *, int) |
2474.1.13
by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings. |
63 |
int PyString_Size(object p) |
2474.1.16
by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions. |
64 |
int PyString_GET_SIZE_void "PyString_GET_SIZE" (void *p) |
2474.1.14
by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/') |
65 |
int PyString_CheckExact(object p) |
2474.1.13
by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings. |
66 |
|
2474.1.23
by John Arbash Meinel
A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree |
67 |
|
2474.1.13
by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings. |
68 |
cdef extern from "string.h": |
2474.1.25
by John Arbash Meinel
Refactor into a helper function to make implementation clearer |
69 |
int strncmp(char *s1, char *s2, int len) |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
70 |
void *memchr(void *s, int c, size_t len) |
71 |
int memcmp(void *b1, void *b2, size_t len) |
|
72 |
# ??? memrchr is a GNU extension :(
|
|
73 |
# void *memrchr(void *s, int c, size_t len)
|
|
74 |
||
75 |
||
76 |
cdef void* _my_memrchr(void *s, int c, size_t n): |
|
77 |
# memrchr seems to be a GNU extension, so we have to implement it ourselves
|
|
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
78 |
cdef char *pos |
79 |
cdef char *start |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
80 |
|
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
81 |
start = <char*>s |
82 |
pos = start + n - 1 |
|
83 |
while pos >= start: |
|
84 |
if pos[0] == c: |
|
85 |
return <void*>pos |
|
86 |
pos = pos - 1 |
|
87 |
return NULL |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
88 |
|
89 |
||
90 |
def _py_memrchr(s, c): |
|
91 |
"""Just to expose _my_memrchr for testing.
|
|
92 |
||
93 |
:param s: The Python string to search
|
|
94 |
:param c: The character to search for
|
|
95 |
:return: The offset to the last instance of 'c' in s
|
|
96 |
"""
|
|
97 |
cdef void *_s |
|
98 |
cdef void *found |
|
99 |
cdef int length |
|
100 |
cdef char *_c |
|
101 |
||
102 |
_s = PyString_AsString(s) |
|
103 |
length = PyString_Size(s) |
|
104 |
||
105 |
_c = PyString_AsString(c) |
|
106 |
assert PyString_Size(c) == 1,\ |
|
107 |
'Must be a single character string, not %s' % (c,) |
|
108 |
found = _my_memrchr(_s, _c[0], length) |
|
109 |
if found == NULL: |
|
110 |
return None |
|
2668.1.1
by John Arbash Meinel
(Lukáš Lalinský) Add a special header for intptr_t for MSVC which doesn't have it in the standard place |
111 |
return <char*>found - <char*>_s |
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
112 |
|
113 |
||
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
114 |
cdef int _is_aligned(void *ptr): |
115 |
"""Is this pointer aligned to an integer size offset?
|
|
116 |
||
117 |
:return: 1 if this pointer is aligned, 0 otherwise.
|
|
118 |
"""
|
|
119 |
return ((<intptr_t>ptr) & ((sizeof(int))-1)) == 0 |
|
120 |
||
121 |
||
2474.1.41
by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs |
122 |
cdef int _cmp_by_dirs(char *path1, int size1, char *path2, int size2): |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
123 |
cdef unsigned char *cur1 |
124 |
cdef unsigned char *cur2 |
|
125 |
cdef unsigned char *end1 |
|
126 |
cdef unsigned char *end2 |
|
2474.1.18
by John Arbash Meinel
Add an integer-size comparison loop at the begining, and |
127 |
cdef int *cur_int1 |
128 |
cdef int *cur_int2 |
|
129 |
cdef int *end_int1 |
|
130 |
cdef int *end_int2 |
|
2474.1.17
by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably |
131 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
132 |
if path1 == path2 and size1 == size2: |
2474.1.54
by John Arbash Meinel
Optimize the simple case that the strings are the same object. |
133 |
return 0 |
134 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
135 |
end1 = <unsigned char*>path1+size1 |
136 |
end2 = <unsigned char*>path2+size2 |
|
2474.1.17
by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably |
137 |
|
2474.1.18
by John Arbash Meinel
Add an integer-size comparison loop at the begining, and |
138 |
# Use 32-bit comparisons for the matching portion of the string.
|
139 |
# Almost all CPU's are faster at loading and comparing 32-bit integers,
|
|
140 |
# than they are at 8-bit integers.
|
|
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
141 |
# 99% of the time, these will be aligned, but in case they aren't just skip
|
142 |
# this loop
|
|
143 |
if _is_aligned(path1) and _is_aligned(path2): |
|
144 |
cur_int1 = <int*>path1 |
|
145 |
cur_int2 = <int*>path2 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
146 |
end_int1 = <int*>(path1 + size1 - (size1 % sizeof(int))) |
147 |
end_int2 = <int*>(path2 + size2 - (size2 % sizeof(int))) |
|
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
148 |
|
149 |
while cur_int1 < end_int1 and cur_int2 < end_int2: |
|
150 |
if cur_int1[0] != cur_int2[0]: |
|
151 |
break
|
|
152 |
cur_int1 = cur_int1 + 1 |
|
153 |
cur_int2 = cur_int2 + 1 |
|
154 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
155 |
cur1 = <unsigned char*>cur_int1 |
156 |
cur2 = <unsigned char*>cur_int2 |
|
2474.1.69
by John Arbash Meinel
Thanks to Jan 'RedBully' Seiffert, some review cleanups |
157 |
else: |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
158 |
cur1 = <unsigned char*>path1 |
159 |
cur2 = <unsigned char*>path2 |
|
2474.1.18
by John Arbash Meinel
Add an integer-size comparison loop at the begining, and |
160 |
|
2474.1.17
by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably |
161 |
while cur1 < end1 and cur2 < end2: |
162 |
if cur1[0] == cur2[0]: |
|
163 |
# This character matches, just go to the next one
|
|
164 |
cur1 = cur1 + 1 |
|
165 |
cur2 = cur2 + 1 |
|
166 |
continue
|
|
167 |
# The current characters do not match
|
|
168 |
if cur1[0] == c'/': |
|
2474.1.19
by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default. |
169 |
return -1 # Reached the end of path1 segment first |
2474.1.17
by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably |
170 |
elif cur2[0] == c'/': |
2474.1.19
by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default. |
171 |
return 1 # Reached the end of path2 segment first |
2474.1.17
by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably |
172 |
elif cur1[0] < cur2[0]: |
173 |
return -1 |
|
174 |
else: |
|
175 |
return 1 |
|
2474.1.19
by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default. |
176 |
|
177 |
# We reached the end of at least one of the strings
|
|
2474.1.17
by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably |
178 |
if cur1 < end1: |
2474.1.19
by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default. |
179 |
return 1 # Not at the end of cur1, must be at the end of cur2 |
2474.1.18
by John Arbash Meinel
Add an integer-size comparison loop at the begining, and |
180 |
if cur2 < end2: |
2474.1.19
by John Arbash Meinel
Clean up _cmp_dirblock_strings_alt to make it the default. |
181 |
return -1 # At the end of cur1, but not at cur2 |
2474.1.17
by John Arbash Meinel
Using a custom loop seems to be the same speed, but is probably |
182 |
# We reached the end of both strings
|
183 |
return 0 |
|
184 |
||
185 |
||
2474.1.47
by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py |
186 |
def cmp_by_dirs_c(path1, path2): |
2474.1.41
by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs |
187 |
"""Compare two paths directory by directory.
|
188 |
||
189 |
This is equivalent to doing::
|
|
190 |
||
191 |
cmp(path1.split('/'), path2.split('/'))
|
|
192 |
||
193 |
The idea is that you should compare path components separately. This
|
|
194 |
differs from plain ``cmp(path1, path2)`` for paths like ``'a-b'`` and
|
|
195 |
``a/b``. "a-b" comes after "a" but would come before "a/b" lexically.
|
|
196 |
||
197 |
:param path1: first path
|
|
198 |
:param path2: second path
|
|
2872.4.10
by Martin Pool
docstrings for cmp_ functions seem to be backwards |
199 |
:return: negative number if ``path1`` comes first,
|
2474.1.41
by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs |
200 |
0 if paths are equal,
|
2872.4.10
by Martin Pool
docstrings for cmp_ functions seem to be backwards |
201 |
and positive number if ``path2`` sorts first
|
2474.1.41
by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs |
202 |
"""
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
203 |
if not PyString_CheckExact(path1): |
204 |
raise TypeError("'path1' must be a plain string, not %s: %r" |
|
205 |
% (type(path1), path1)) |
|
206 |
if not PyString_CheckExact(path2): |
|
207 |
raise TypeError("'path2' must be a plain string, not %s: %r" |
|
208 |
% (type(path2), path2)) |
|
2474.1.41
by John Arbash Meinel
Change the name of cmp_dirblock_strings to cmp_by_dirs |
209 |
return _cmp_by_dirs(PyString_AsString(path1), |
210 |
PyString_Size(path1), |
|
211 |
PyString_AsString(path2), |
|
212 |
PyString_Size(path2)) |
|
2474.1.13
by John Arbash Meinel
Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings. |
213 |
|
214 |
||
2474.1.66
by John Arbash Meinel
Some restructuring. |
215 |
def _cmp_path_by_dirblock_c(path1, path2): |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
216 |
"""Compare two paths based on what directory they are in.
|
217 |
||
218 |
This generates a sort order, such that all children of a directory are
|
|
219 |
sorted together, and grandchildren are in the same order as the
|
|
220 |
children appear. But all grandchildren come after all children.
|
|
221 |
||
2474.1.66
by John Arbash Meinel
Some restructuring. |
222 |
In other words, all entries in a directory are sorted together, and
|
223 |
directorys are sorted in cmp_by_dirs order.
|
|
224 |
||
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
225 |
:param path1: first path
|
226 |
:param path2: the second path
|
|
2872.4.10
by Martin Pool
docstrings for cmp_ functions seem to be backwards |
227 |
:return: negative number if ``path1`` comes first,
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
228 |
0 if paths are equal
|
2872.4.10
by Martin Pool
docstrings for cmp_ functions seem to be backwards |
229 |
and a positive number if ``path2`` sorts first
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
230 |
"""
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
231 |
if not PyString_CheckExact(path1): |
232 |
raise TypeError("'path1' must be a plain string, not %s: %r" |
|
233 |
% (type(path1), path1)) |
|
234 |
if not PyString_CheckExact(path2): |
|
235 |
raise TypeError("'path2' must be a plain string, not %s: %r" |
|
236 |
% (type(path2), path2)) |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
237 |
return _cmp_path_by_dirblock(PyString_AsString(path1), |
238 |
PyString_Size(path1), |
|
239 |
PyString_AsString(path2), |
|
240 |
PyString_Size(path2)) |
|
241 |
||
242 |
||
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
243 |
cdef int _cmp_path_by_dirblock(char *path1, int path1_len, |
244 |
char *path2, int path2_len): |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
245 |
"""Compare two paths by what directory they are in.
|
246 |
||
247 |
see ``_cmp_path_by_dirblock_c`` for details.
|
|
248 |
"""
|
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
249 |
cdef char *dirname1 |
250 |
cdef int dirname1_len |
|
251 |
cdef char *dirname2 |
|
252 |
cdef int dirname2_len |
|
253 |
cdef char *basename1 |
|
254 |
cdef int basename1_len |
|
255 |
cdef char *basename2 |
|
256 |
cdef int basename2_len |
|
257 |
cdef int cur_len |
|
258 |
cdef int cmp_val |
|
259 |
||
260 |
if path1_len == 0 and path2_len == 0: |
|
261 |
return 0 |
|
262 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
263 |
if path1 == path2 and path1_len == path2_len: |
264 |
return 0 |
|
265 |
||
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
266 |
if path1_len == 0: |
267 |
return -1 |
|
268 |
||
269 |
if path2_len == 0: |
|
270 |
return 1 |
|
271 |
||
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
272 |
basename1 = <char*>_my_memrchr(path1, c'/', path1_len) |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
273 |
|
274 |
if basename1 == NULL: |
|
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
275 |
basename1 = path1 |
276 |
basename1_len = path1_len |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
277 |
dirname1 = '' |
278 |
dirname1_len = 0 |
|
279 |
else: |
|
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
280 |
dirname1 = path1 |
281 |
dirname1_len = basename1 - path1 |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
282 |
basename1 = basename1 + 1 |
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
283 |
basename1_len = path1_len - dirname1_len - 1 |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
284 |
|
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
285 |
basename2 = <char*>_my_memrchr(path2, c'/', path2_len) |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
286 |
|
287 |
if basename2 == NULL: |
|
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
288 |
basename2 = path2 |
289 |
basename2_len = path2_len |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
290 |
dirname2 = '' |
291 |
dirname2_len = 0 |
|
292 |
else: |
|
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
293 |
dirname2 = path2 |
294 |
dirname2_len = basename2 - path2 |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
295 |
basename2 = basename2 + 1 |
2474.1.59
by John Arbash Meinel
Make sure to set basename_len. With that patch, the tests pass. |
296 |
basename2_len = path2_len - dirname2_len - 1 |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
297 |
|
298 |
cmp_val = _cmp_by_dirs(dirname1, dirname1_len, |
|
299 |
dirname2, dirname2_len) |
|
300 |
if cmp_val != 0: |
|
301 |
return cmp_val |
|
302 |
||
303 |
cur_len = basename1_len |
|
304 |
if basename2_len < basename1_len: |
|
305 |
cur_len = basename2_len |
|
306 |
||
307 |
cmp_val = memcmp(basename1, basename2, cur_len) |
|
308 |
if cmp_val != 0: |
|
309 |
return cmp_val |
|
310 |
if basename1_len == basename2_len: |
|
311 |
return 0 |
|
312 |
if basename1_len < basename2_len: |
|
313 |
return -1 |
|
314 |
return 1 |
|
315 |
||
316 |
||
2474.1.66
by John Arbash Meinel
Some restructuring. |
317 |
def _bisect_path_left_c(paths, path): |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
318 |
"""Return the index where to insert path into paths.
|
319 |
||
320 |
This uses a path-wise comparison so we get::
|
|
321 |
a
|
|
322 |
a-b
|
|
323 |
a=b
|
|
324 |
a/b
|
|
325 |
Rather than::
|
|
326 |
a
|
|
327 |
a-b
|
|
328 |
a/b
|
|
329 |
a=b
|
|
330 |
:param paths: A list of paths to search through
|
|
331 |
:param path: A single path to insert
|
|
332 |
:return: An offset where 'path' can be inserted.
|
|
333 |
:seealso: bisect.bisect_left
|
|
334 |
"""
|
|
335 |
cdef int _lo |
|
336 |
cdef int _hi |
|
337 |
cdef int _mid |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
338 |
cdef char *path_cstr |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
339 |
cdef int path_size |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
340 |
cdef char *cur_cstr |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
341 |
cdef int cur_size |
342 |
cdef void *cur |
|
343 |
||
344 |
if not PyList_CheckExact(paths): |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
345 |
raise TypeError("you must pass a python list for 'paths' not: %s %r" |
346 |
% (type(paths), paths)) |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
347 |
if not PyString_CheckExact(path): |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
348 |
raise TypeError("you must pass a string for 'path' not: %s %r" |
349 |
% (type(path), path)) |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
350 |
|
351 |
_hi = len(paths) |
|
352 |
_lo = 0 |
|
353 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
354 |
path_cstr = PyString_AsString(path) |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
355 |
path_size = PyString_Size(path) |
356 |
||
357 |
while _lo < _hi: |
|
358 |
_mid = (_lo + _hi) / 2 |
|
359 |
cur = PyList_GetItem_object_void(paths, _mid) |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
360 |
cur_cstr = PyString_AS_STRING_void(cur) |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
361 |
cur_size = PyString_GET_SIZE_void(cur) |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
362 |
if _cmp_path_by_dirblock(cur_cstr, cur_size, path_cstr, path_size) < 0: |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
363 |
_lo = _mid + 1 |
364 |
else: |
|
365 |
_hi = _mid |
|
366 |
return _lo |
|
367 |
||
368 |
||
2474.1.66
by John Arbash Meinel
Some restructuring. |
369 |
def _bisect_path_right_c(paths, path): |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
370 |
"""Return the index where to insert path into paths.
|
371 |
||
372 |
This uses a path-wise comparison so we get::
|
|
373 |
a
|
|
374 |
a-b
|
|
375 |
a=b
|
|
376 |
a/b
|
|
377 |
Rather than::
|
|
378 |
a
|
|
379 |
a-b
|
|
380 |
a/b
|
|
381 |
a=b
|
|
382 |
:param paths: A list of paths to search through
|
|
383 |
:param path: A single path to insert
|
|
384 |
:return: An offset where 'path' can be inserted.
|
|
385 |
:seealso: bisect.bisect_right
|
|
386 |
"""
|
|
387 |
cdef int _lo |
|
388 |
cdef int _hi |
|
389 |
cdef int _mid |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
390 |
cdef char *path_cstr |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
391 |
cdef int path_size |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
392 |
cdef char *cur_cstr |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
393 |
cdef int cur_size |
394 |
cdef void *cur |
|
395 |
||
396 |
if not PyList_CheckExact(paths): |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
397 |
raise TypeError("you must pass a python list for 'paths' not: %s %r" |
398 |
% (type(paths), paths)) |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
399 |
if not PyString_CheckExact(path): |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
400 |
raise TypeError("you must pass a string for 'path' not: %s %r" |
401 |
% (type(path), path)) |
|
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
402 |
|
403 |
_hi = len(paths) |
|
404 |
_lo = 0 |
|
405 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
406 |
path_cstr = PyString_AsString(path) |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
407 |
path_size = PyString_Size(path) |
408 |
||
409 |
while _lo < _hi: |
|
410 |
_mid = (_lo + _hi) / 2 |
|
411 |
cur = PyList_GetItem_object_void(paths, _mid) |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
412 |
cur_cstr = PyString_AS_STRING_void(cur) |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
413 |
cur_size = PyString_GET_SIZE_void(cur) |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
414 |
if _cmp_path_by_dirblock(path_cstr, path_size, cur_cstr, cur_size) < 0: |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
415 |
_hi = _mid |
416 |
else: |
|
417 |
_lo = _mid + 1 |
|
418 |
return _lo |
|
419 |
||
420 |
||
2474.1.47
by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py |
421 |
def bisect_dirblock_c(dirblocks, dirname, lo=0, hi=None, cache=None): |
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
422 |
"""Return the index where to insert dirname into the dirblocks.
|
423 |
||
424 |
The return value idx is such that all directories blocks in dirblock[:idx]
|
|
425 |
have names < dirname, and all blocks in dirblock[idx:] have names >=
|
|
426 |
dirname.
|
|
427 |
||
428 |
Optional args lo (default 0) and hi (default len(dirblocks)) bound the
|
|
429 |
slice of a to be searched.
|
|
430 |
"""
|
|
431 |
cdef int _lo |
|
432 |
cdef int _hi |
|
433 |
cdef int _mid |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
434 |
cdef char *dirname_cstr |
2474.1.14
by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/') |
435 |
cdef int dirname_size |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
436 |
cdef char *cur_cstr |
2474.1.14
by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/') |
437 |
cdef int cur_size |
2474.1.12
by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables. |
438 |
cdef void *cur |
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
439 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
440 |
if not PyList_CheckExact(dirblocks): |
441 |
raise TypeError("you must pass a python list for 'dirblocks' not: %s %r" |
|
442 |
% (type(dirblocks), dirblocks)) |
|
443 |
if not PyString_CheckExact(dirname): |
|
444 |
raise TypeError("you must pass a string for dirname not: %s %r" |
|
445 |
% (type(dirname), dirname)) |
|
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
446 |
if hi is None: |
447 |
_hi = len(dirblocks) |
|
448 |
else: |
|
449 |
_hi = hi |
|
450 |
||
451 |
_lo = lo |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
452 |
dirname_cstr = PyString_AsString(dirname) |
2474.1.14
by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/') |
453 |
dirname_size = PyString_Size(dirname) |
454 |
||
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
455 |
while _lo < _hi: |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
456 |
_mid = (_lo + _hi) / 2 |
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
457 |
# Grab the dirname for the current dirblock
|
2474.1.12
by John Arbash Meinel
Clean up bisect_dirstate to not use temporary variables. |
458 |
# cur = dirblocks[_mid][0]
|
459 |
cur = PyTuple_GetItem_void_void( |
|
460 |
PyList_GetItem_object_void(dirblocks, _mid), 0) |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
461 |
cur_cstr = PyString_AS_STRING_void(cur) |
2474.1.16
by John Arbash Meinel
Shave off maybe 10% by using the PyString_* macros instead of functions. |
462 |
cur_size = PyString_GET_SIZE_void(cur) |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
463 |
if _cmp_by_dirs(cur_cstr, cur_size, dirname_cstr, dirname_size) < 0: |
2474.1.58
by John Arbash Meinel
(broken) Try to properly implement DirState._bisect* |
464 |
_lo = _mid + 1 |
2474.1.14
by John Arbash Meinel
Switching bisect_dirblocks remove the extra .split('/') |
465 |
else: |
466 |
_hi = _mid |
|
2474.1.4
by John Arbash Meinel
Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex. |
467 |
return _lo |
468 |
||
469 |
||
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
470 |
cdef class Reader: |
471 |
"""Maintain the current location, and return fields as you parse them."""
|
|
472 |
||
473 |
cdef object text # The overall string object |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
474 |
cdef char *text_cstr # Pointer to the beginning of text |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
475 |
cdef int text_size # Length of text |
476 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
477 |
cdef char *end_cstr # End of text |
478 |
cdef char *cur_cstr # Pointer to the current record |
|
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
479 |
cdef char *next # Pointer to the end of this record |
480 |
||
2904.2.1
by John Arbash Meinel
Switch from __new__ to __init__ to avoid potential pyrex upgrade problems. |
481 |
def __init__(self, text): |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
482 |
self.text = text |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
483 |
self.text_cstr = PyString_AsString(text) |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
484 |
self.text_size = PyString_Size(text) |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
485 |
self.end_cstr = self.text_cstr + self.text_size |
486 |
self.cur_cstr = self.text_cstr |
|
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
487 |
|
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
488 |
cdef char *get_next(self, int *size): |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
489 |
"""Return a pointer to the start of the next field."""
|
490 |
cdef char *next |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
491 |
next = self.cur_cstr |
492 |
self.cur_cstr = <char*>memchr(next, c'\0', self.end_cstr-next) |
|
493 |
size[0] = self.cur_cstr - next |
|
494 |
self.cur_cstr = self.cur_cstr + 1 |
|
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
495 |
return next |
496 |
||
2474.1.53
by John Arbash Meinel
Changing Reader.get_next_str (which returns a Python String) |
497 |
cdef object get_next_str(self): |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
498 |
"""Get the next field as a Python string."""
|
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
499 |
cdef int size |
500 |
cdef char *next |
|
501 |
next = self.get_next(&size) |
|
502 |
return PyString_FromStringAndSize(next, size) |
|
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
503 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
504 |
cdef int _init(self) except -1: |
505 |
"""Get the pointer ready.
|
|
506 |
||
507 |
This assumes that the dirstate header has already been read, and we
|
|
508 |
already have the dirblock string loaded into memory.
|
|
509 |
This just initializes our memory pointers, etc for parsing of the
|
|
510 |
dirblock string.
|
|
511 |
"""
|
|
2474.1.32
by John Arbash Meinel
Skip past the first entry while reading, |
512 |
cdef char *first |
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
513 |
cdef int size |
2474.1.32
by John Arbash Meinel
Skip past the first entry while reading, |
514 |
# The first field should be an empty string left over from the Header
|
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
515 |
first = self.get_next(&size) |
516 |
if first[0] != c'\0' and size == 0: |
|
2474.1.32
by John Arbash Meinel
Skip past the first entry while reading, |
517 |
raise AssertionError('First character should be null not: %s' |
518 |
% (first,)) |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
519 |
return 0 |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
520 |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
521 |
cdef object _get_entry(self, int num_trees, void **p_current_dirname, |
522 |
int *new_block): |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
523 |
"""Extract the next entry.
|
524 |
||
525 |
This parses the next entry based on the current location in
|
|
526 |
``self.cur_cstr``.
|
|
527 |
Each entry can be considered a "row" in the total table. And each row
|
|
528 |
has a fixed number of columns. It is generally broken up into "key"
|
|
529 |
columns, then "current" columns, and then "parent" columns.
|
|
530 |
||
531 |
:param num_trees: How many parent trees need to be parsed
|
|
532 |
:param p_current_dirname: A pointer to the current PyString
|
|
533 |
representing the directory name.
|
|
534 |
We pass this in as a void * so that pyrex doesn't have to
|
|
535 |
increment/decrement the PyObject reference counter for each
|
|
536 |
_get_entry call.
|
|
537 |
We use a pointer so that _get_entry can update it with the new
|
|
538 |
value.
|
|
539 |
:param new_block: This is to let the caller know that it needs to
|
|
540 |
create a new directory block to store the next entry.
|
|
541 |
"""
|
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
542 |
cdef object path_name_file_id_key |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
543 |
cdef char *entry_size_cstr |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
544 |
cdef unsigned long int entry_size |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
545 |
cdef char* executable_cstr |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
546 |
cdef int is_executable |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
547 |
cdef char* dirname_cstr |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
548 |
cdef char* trailing |
549 |
cdef int cur_size |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
550 |
cdef int i |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
551 |
cdef object minikind |
552 |
cdef object fingerprint |
|
553 |
cdef object info |
|
554 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
555 |
# Read the 'key' information (dirname, name, file_id)
|
556 |
dirname_cstr = self.get_next(&cur_size) |
|
557 |
# Check to see if we have started a new directory block.
|
|
558 |
# If so, then we need to create a new dirname PyString, so that it can
|
|
559 |
# be used in all of the tuples. This saves time and memory, by re-using
|
|
560 |
# the same object repeatedly.
|
|
561 |
||
562 |
# Do the cheap 'length of string' check first. If the string is a
|
|
563 |
# different length, then we *have* to be a different directory.
|
|
564 |
if (cur_size != PyString_GET_SIZE_void(p_current_dirname[0]) |
|
565 |
or strncmp(dirname_cstr, |
|
566 |
# Extract the char* from our current dirname string. We
|
|
567 |
# know it is a PyString, so we can use
|
|
568 |
# PyString_AS_STRING, we use the _void version because
|
|
569 |
# we are tricking Pyrex by using a void* rather than an
|
|
570 |
# <object>
|
|
571 |
PyString_AS_STRING_void(p_current_dirname[0]), |
|
572 |
cur_size+1) != 0): |
|
573 |
dirname = PyString_FromStringAndSize(dirname_cstr, cur_size) |
|
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
574 |
p_current_dirname[0] = <void*>dirname |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
575 |
new_block[0] = 1 |
576 |
else: |
|
577 |
new_block[0] = 0 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
578 |
|
579 |
# Build up the key that will be used.
|
|
580 |
# By using <object>(void *) Pyrex will automatically handle the
|
|
581 |
# Py_INCREF that we need.
|
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
582 |
path_name_file_id_key = (<object>p_current_dirname[0], |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
583 |
self.get_next_str(), |
584 |
self.get_next_str(), |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
585 |
)
|
586 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
587 |
# Parse all of the per-tree information. current has the information in
|
588 |
# the same location as parent trees. The only difference is that 'info'
|
|
589 |
# is a 'packed_stat' for current, while it is a 'revision_id' for
|
|
590 |
# parent trees.
|
|
591 |
# minikind, fingerprint, and info will be returned as regular python
|
|
592 |
# strings
|
|
593 |
# entry_size and is_executable will be parsed into a python Long and
|
|
594 |
# python Boolean, respectively.
|
|
595 |
# TODO: jam 20070718 Consider changin the entry_size conversion to
|
|
596 |
# prefer python Int when possible. They are generally faster to
|
|
597 |
# work with, and it will be rare that we have a file >2GB.
|
|
598 |
# Especially since this code is pretty much fixed at a max of
|
|
599 |
# 4GB.
|
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
600 |
trees = [] |
601 |
for i from 0 <= i < num_trees: |
|
602 |
minikind = self.get_next_str() |
|
603 |
fingerprint = self.get_next_str() |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
604 |
entry_size_cstr = self.get_next(&cur_size) |
605 |
entry_size = strtoul(entry_size_cstr, NULL, 10) |
|
606 |
executable_cstr = self.get_next(&cur_size) |
|
607 |
is_executable = (executable_cstr[0] == c'y') |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
608 |
info = self.get_next_str() |
609 |
PyList_Append(trees, ( |
|
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
610 |
minikind, # minikind |
611 |
fingerprint, # fingerprint |
|
612 |
entry_size, # size |
|
613 |
is_executable,# executable |
|
614 |
info, # packed_stat or revision_id |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
615 |
))
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
616 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
617 |
# The returned tuple is (key, [trees])
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
618 |
ret = (path_name_file_id_key, trees) |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
619 |
# Ignore the trailing newline, but assert that it does exist, this
|
620 |
# ensures that we always finish parsing a line on an end-of-entry
|
|
621 |
# marker.
|
|
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
622 |
trailing = self.get_next(&cur_size) |
623 |
if cur_size != 1 or trailing[0] != c'\n': |
|
624 |
raise AssertionError( |
|
625 |
'Bad parse, we expected to end on \\n, not: %d %s: %s' |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
626 |
% (cur_size, PyString_FromStringAndSize(trailing, cur_size), |
627 |
ret)) |
|
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
628 |
return ret |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
629 |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
630 |
def _parse_dirblocks(self, state): |
631 |
"""Parse all dirblocks in the state file."""
|
|
632 |
cdef int num_trees |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
633 |
cdef object current_block |
634 |
cdef object entry |
|
635 |
cdef void * current_dirname |
|
636 |
cdef int new_block |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
637 |
cdef int expected_entry_count |
638 |
cdef int entry_count |
|
639 |
||
640 |
num_trees = state._num_present_parents() + 1 |
|
641 |
expected_entry_count = state._num_entries |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
642 |
|
643 |
# Ignore the first record
|
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
644 |
self._init() |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
645 |
|
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
646 |
current_block = [] |
647 |
state._dirblocks = [('', current_block), ('', [])] |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
648 |
obj = '' |
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
649 |
current_dirname = <void*>obj |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
650 |
new_block = 0 |
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
651 |
entry_count = 0 |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
652 |
|
2474.1.54
by John Arbash Meinel
Optimize the simple case that the strings are the same object. |
653 |
# TODO: jam 2007-05-07 Consider pre-allocating some space for the
|
654 |
# members, and then growing and shrinking from there. If most
|
|
655 |
# directories have close to 10 entries in them, it would save a
|
|
656 |
# few mallocs if we default our list size to something
|
|
657 |
# reasonable. Or we could malloc it to something large (100 or
|
|
658 |
# so), and then truncate. That would give us a malloc + realloc,
|
|
659 |
# rather than lots of reallocs.
|
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
660 |
while self.cur_cstr < self.end_cstr: |
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
661 |
entry = self._get_entry(num_trees, ¤t_dirname, &new_block) |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
662 |
if new_block: |
663 |
# new block - different dirname
|
|
664 |
current_block = [] |
|
665 |
PyList_Append(state._dirblocks, |
|
666 |
(<object>current_dirname, current_block)) |
|
667 |
PyList_Append(current_block, entry) |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
668 |
entry_count = entry_count + 1 |
669 |
if entry_count != expected_entry_count: |
|
670 |
raise AssertionError('We read the wrong number of entries.' |
|
671 |
' We expected to read %s, but read %s' |
|
672 |
% (expected_entry_count, entry_count)) |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
673 |
state._split_root_dirblock_into_contents() |
674 |
||
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
675 |
|
2474.1.47
by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py |
676 |
def _read_dirblocks_c(state): |
2474.1.1
by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file. |
677 |
"""Read in the dirblocks for the given DirState object.
|
678 |
||
679 |
This is tightly bound to the DirState internal representation. It should be
|
|
680 |
thought of as a member function, which is only separated out so that we can
|
|
681 |
re-write it in pyrex.
|
|
682 |
||
683 |
:param state: A DirState object.
|
|
684 |
:return: None
|
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
685 |
:postcondition: The dirblocks will be loaded into the appropriate fields in
|
686 |
the DirState object.
|
|
2474.1.1
by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file. |
687 |
"""
|
688 |
state._state_file.seek(state._end_of_header) |
|
689 |
text = state._state_file.read() |
|
690 |
# TODO: check the crc checksums. crc_measured = zlib.crc32(text)
|
|
691 |
||
2474.1.34
by John Arbash Meinel
Delay reading fields until in parse loop |
692 |
reader = Reader(text) |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
693 |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
694 |
reader._parse_dirblocks(state) |
2474.1.1
by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file. |
695 |
state._dirblock_state = DirState.IN_MEMORY_UNMODIFIED |