# Copyright (C) 2007, 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

"""Helper functions for DirState.

This is the Pyrex (compiled) implementation of the DirState helper functions.
"""

from bzrlib import errors
from bzrlib.dirstate import DirState


# Give Pyrex some function definitions for it to understand.
# All of these are just hints to Pyrex, so that it can try to convert python
# objects into similar C objects. (such as PyInt => int).
# In anything defined 'cdef extern from XXX' the real C header will be
# imported, and the real definition will be used from there. So these are just
# hints, and do not need to match exactly to the C definitions.

cdef extern from *:
    ctypedef unsigned long size_t

cdef extern from "_dirstate_helpers_c.h":
    ctypedef int intptr_t


cdef extern from "stdlib.h":
    unsigned long int strtoul(char *nptr, char **endptr, int base)


# These functions allow us access to a bit of the 'bare metal' of python
# objects, rather than going through the object abstraction. (For example,
# PyList_Append, rather than getting the 'append' attribute of the object, and
# creating a tuple, and then using PyObject_CallObject).
# Functions that return (or take) a void* are meant to grab a C PyObject*. This
# differs from the Pyrex 'object'. If you declare a variable as 'object' Pyrex
# will automatically Py_INCREF and Py_DECREF when appropriate. But for some
# inner loops, we don't need to do that at all, as the reference only lasts for
# a very short time.
cdef extern from "Python.h":
    ctypedef int Py_ssize_t
    int PyList_Append(object lst, object item) except -1
    void *PyList_GetItem_object_void "PyList_GET_ITEM" (object lst, int index)
    int PyList_CheckExact(object)

    void *PyTuple_GetItem_void_void "PyTuple_GET_ITEM" (void* tpl, int index)

    char *PyString_AsString(object p)
    char *PyString_AS_STRING_void "PyString_AS_STRING" (void *p)
    object PyString_FromString(char *)
    object PyString_FromStringAndSize(char *, Py_ssize_t)
    int PyString_Size(object p)
    int PyString_GET_SIZE_void "PyString_GET_SIZE" (void *p)
    int PyString_CheckExact(object p)


cdef extern from "string.h":
    int strncmp(char *s1, char *s2, int len)
    void *memchr(void *s, int c, size_t len)
    int memcmp(void *b1, void *b2, size_t len)
    # ??? memrchr is a GNU extension :(
    # void *memrchr(void *s, int c, size_t len)


cdef void* _my_memrchr(void *s, int c, size_t n):
    # memrchr seems to be a GNU extension, so we have to implement it ourselves
    cdef char *pos
    cdef char *start

    start = <char*>s
    pos = start + n - 1
    while pos >= start:
        if pos[0] == c:
            return <void*>pos
        pos = pos - 1
    return NULL


def _py_memrchr(s, c):
    """Just to expose _my_memrchr for testing.

    :param s: The Python string to search
    :param c: The character to search for
    :return: The offset to the last instance of 'c' in s, or None if 'c'
        does not occur in s
    """
    cdef void *_s
    cdef void *found
    cdef int length
    cdef char *_c

    _s = PyString_AsString(s)
    length = PyString_Size(s)

    _c = PyString_AsString(c)
    assert PyString_Size(c) == 1,\
        'Must be a single character string, not %s' % (c,)
    found = _my_memrchr(_s, _c[0], length)
    if found == NULL:
        return None
    return <char*>found - <char*>_s

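# Illustrative sketch only, not part of the original module: a pure-Python
# rendering of the backwards scan that _my_memrchr performs and of the offset
# that _py_memrchr reports. The name memrchr_offset is hypothetical, and
# Python 3 bytes are assumed as a stand-in for the Python 2 str used here.

```python
def memrchr_offset(data: bytes, char: bytes):
    """Return the offset of the last occurrence of char in data, or None."""
    if len(char) != 1:
        raise ValueError("Must be a single byte, not %r" % (char,))
    # Walk backwards from the final byte, like the pointer loop in the
    # Pyrex helper, and stop at the first match.
    pos = len(data) - 1
    while pos >= 0:
        if data[pos] == char[0]:
            return pos
        pos -= 1
    return None
```

# For non-empty input this should agree with bytes.rfind, except that a miss
# is reported as None rather than -1.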
cdef object safe_string_from_size(char *s, Py_ssize_t size):
    if size < 0:
        raise AssertionError(
            'tried to create a string with an invalid size: %d @0x%x'
            % (size, <int>s))
    return PyString_FromStringAndSize(s, size)


cdef int _is_aligned(void *ptr):
    """Is this pointer aligned to an integer size offset?

    :return: 1 if this pointer is aligned, 0 otherwise.
    """
    return ((<intptr_t>ptr) & ((sizeof(int))-1)) == 0


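# Illustrative sketch only, not part of the original module: the same
# mask-based alignment test applied to a plain integer address. The name
# is_aligned and the INT_SIZE constant are hypothetical; ctypes is used only
# to obtain the platform's sizeof(int).

```python
import ctypes

# Size of a C int on this platform (commonly 4 bytes).
INT_SIZE = ctypes.sizeof(ctypes.c_int)


def is_aligned(address: int, size: int = INT_SIZE) -> bool:
    """Return True if address is a multiple of size."""
    # The mask trick (address & (size - 1)) only works for powers of two.
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return (address & (size - 1)) == 0
```

# With a 4-byte int, addresses 0, 4, 8, ... pass the check and everything in
# between fails it.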
cdef int _cmp_by_dirs(char *path1, int size1, char *path2, int size2):
    cdef unsigned char *cur1
    cdef unsigned char *cur2
    cdef unsigned char *end1
    cdef unsigned char *end2
    cdef int *cur_int1
    cdef int *cur_int2
    cdef int *end_int1
    cdef int *end_int2

    if path1 == path2 and size1 == size2:
        return 0

    end1 = <unsigned char*>path1+size1
    end2 = <unsigned char*>path2+size2

    # Compare the matching prefix one int (typically 32 bits) at a time.
    # Almost all CPUs are faster at loading and comparing int-sized words
    # than they are at comparing single bytes.
    # The pointers are almost always aligned, but in case they aren't, just
    # skip this loop and fall through to the byte-wise comparison.
    if _is_aligned(path1) and _is_aligned(path2):
        cur_int1 = <int*>path1
        cur_int2 = <int*>path2
        end_int1 = <int*>(path1 + size1 - (size1 % sizeof(int)))
        end_int2 = <int*>(path2 + size2 - (size2 % sizeof(int)))

        while cur_int1 < end_int1 and cur_int2 < end_int2:
            if cur_int1[0] != cur_int2[0]:
                break
            cur_int1 = cur_int1 + 1
            cur_int2 = cur_int2 + 1

        cur1 = <unsigned char*>cur_int1
        cur2 = <unsigned char*>cur_int2
    else:
        cur1 = <unsigned char*>path1
        cur2 = <unsigned char*>path2

    while cur1 < end1 and cur2 < end2:
        if cur1[0] == cur2[0]:
            # This character matches, just go to the next one
            cur1 = cur1 + 1
            cur2 = cur2 + 1
            continue
        # The current characters do not match
        if cur1[0] == c'/':
            return -1 # Reached the end of path1 segment first
        elif cur2[0] == c'/':
            return 1 # Reached the end of path2 segment first
        elif cur1[0] < cur2[0]:
            return -1
        else:
            return 1

    # We reached the end of at least one of the strings
    if cur1 < end1:
        return 1 # Not at the end of path1, must be at the end of path2
    if cur2 < end2:
        return -1 # At the end of path1, but not at the end of path2
    # We reached the end of both strings
    return 0


def cmp_by_dirs_c(path1, path2):
    """Compare two paths directory by directory.

    This is equivalent to doing::

       cmp(path1.split('/'), path2.split('/'))

    The idea is that you should compare path components separately. This
    differs from plain ``cmp(path1, path2)`` for paths like ``'a-b'`` and
    ``'a/b'``. Compared by components, "a/b" sorts before "a-b" (because
    "a" sorts before "a-b"), whereas a plain string comparison puts "a-b"
    first, because '-' sorts before '/'.

    :param path1: first path
    :param path2: second path
    :return: negative number if ``path1`` comes first,
        0 if paths are equal,
        and positive number if ``path2`` sorts first
    """
    if not PyString_CheckExact(path1):
        raise TypeError("'path1' must be a plain string, not %s: %r"
                        % (type(path1), path1))
    if not PyString_CheckExact(path2):
        raise TypeError("'path2' must be a plain string, not %s: %r"
                        % (type(path2), path2))
    return _cmp_by_dirs(PyString_AsString(path1),
                        PyString_Size(path1),
                        PyString_AsString(path2),
                        PyString_Size(path2))


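# Illustrative sketch only, not part of the original module: the "equivalent
# to" expression from the docstring above, written for Python 3, which has no
# cmp() builtin. The name cmp_by_dirs_sketch is hypothetical, and bytes are
# assumed as a stand-in for the Python 2 str paths used here.

```python
def cmp_by_dirs_sketch(path1: bytes, path2: bytes) -> int:
    """Compare two paths component by component; return -1, 0, or 1."""
    parts1 = path1.split(b"/")
    parts2 = path2.split(b"/")
    # (a > b) - (a < b) reproduces the sign that cmp(a, b) returned.
    return (parts1 > parts2) - (parts1 < parts2)


# Component-wise order puts b"a/b" first; plain byte order puts b"a-b" first.
by_dirs = cmp_by_dirs_sketch(b"a/b", b"a-b")
plain = (b"a/b" > b"a-b") - (b"a/b" < b"a-b")
```

# The compiled helper reaches the same ordering without building the
# intermediate lists, which is the point of implementing it in Pyrex.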
def _cmp_path_by_dirblock_c(path1, path2):
    """Compare two paths based on what directory they are in.

    This generates a sort order, such that all children of a directory are
    sorted together, and grandchildren are in the same order as the
    children appear. But all grandchildren come after all children.

    In other words, all entries in a directory are sorted together, and
    directories are sorted in cmp_by_dirs order.

    :param path1: first path
    :param path2: the second path
    :return: negative number if ``path1`` comes first,
        0 if paths are equal
        and a positive number if ``path2`` sorts first
    """
    if not PyString_CheckExact(path1):
        raise TypeError("'path1' must be a plain string, not %s: %r"
                        % (type(path1), path1))
    if not PyString_CheckExact(path2):
        raise TypeError("'path2' must be a plain string, not %s: %r"
                        % (type(path2), path2))
    return _cmp_path_by_dirblock(PyString_AsString(path1),
                                 PyString_Size(path1),
                                 PyString_AsString(path2),
                                 PyString_Size(path2))


cdef int _cmp_path_by_dirblock(char *path1, int path1_len,
                               char *path2, int path2_len):
    """Compare two paths by what directory they are in.

    see ``_cmp_path_by_dirblock_c`` for details.
    """
    cdef char *dirname1
    cdef int dirname1_len
    cdef char *dirname2
    cdef int dirname2_len
    cdef char *basename1
    cdef int basename1_len
    cdef char *basename2
    cdef int basename2_len
    cdef int cur_len
    cdef int cmp_val

    if path1_len == 0 and path2_len == 0:
        return 0

    if path1 == path2 and path1_len == path2_len:
        return 0

    if path1_len == 0:
        return -1

    if path2_len == 0:
        return 1

    basename1 = <char*>_my_memrchr(path1, c'/', path1_len)

    if basename1 == NULL:
        basename1 = path1
        basename1_len = path1_len
        dirname1 = ''
        dirname1_len = 0
    else:
        dirname1 = path1
        dirname1_len = basename1 - path1
        basename1 = basename1 + 1
        basename1_len = path1_len - dirname1_len - 1

    basename2 = <char*>_my_memrchr(path2, c'/', path2_len)

    if basename2 == NULL:
        basename2 = path2
        basename2_len = path2_len
        dirname2 = ''
        dirname2_len = 0
    else:
        dirname2 = path2
        dirname2_len = basename2 - path2
        basename2 = basename2 + 1
        basename2_len = path2_len - dirname2_len - 1

    cmp_val = _cmp_by_dirs(dirname1, dirname1_len,
                           dirname2, dirname2_len)
    if cmp_val != 0:
        return cmp_val

    cur_len = basename1_len
    if basename2_len < basename1_len:
        cur_len = basename2_len

    cmp_val = memcmp(basename1, basename2, cur_len)
    if cmp_val != 0:
        return cmp_val
    if basename1_len == basename2_len:
        return 0
    if basename1_len < basename2_len:
        return -1
    return 1


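# Illustrative sketch only, not part of the original module: the dirblock
# ordering expressed as a Python 3 sort key. The path is split at its last
# '/', the directory part is compared component by component, and the
# basename breaks ties. The name dirblock_sort_key is hypothetical, and bytes
# are assumed as a stand-in for the Python 2 str paths used here.

```python
def dirblock_sort_key(path: bytes):
    """Return a key that sorts paths by directory first, then by basename."""
    # rpartition yields (b"", b"", path) when there is no b"/", so a
    # top-level entry gets the empty directory name.
    dirname, _, basename = path.rpartition(b"/")
    return (dirname.split(b"/"), basename)


paths = [b"a/b", b"a=b", b"a-b", b"a", b"a/b/c", b""]
ordered = sorted(paths, key=dirblock_sort_key)
```

# Here ordered is expected to list every top-level entry (b"", b"a", b"a-b",
# b"a=b") before b"a/b", and b"a/b" before b"a/b/c".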
def _bisect_path_left_c(paths, path):
    """Return the index where to insert path into paths.

    This uses a path-wise comparison so we get::

        a
        a-b
        a=b
        a/b

    Rather than::

        a
        a-b
        a/b
        a=b

    :param paths: A list of paths to search through
    :param path: A single path to insert
    :return: An offset where 'path' can be inserted.
    :seealso: bisect.bisect_left
    """
    cdef int _lo
    cdef int _hi
    cdef int _mid
    cdef char *path_cstr
    cdef int path_size
    cdef char *cur_cstr
    cdef int cur_size
    cdef void *cur

    if not PyList_CheckExact(paths):
        raise TypeError("you must pass a python list for 'paths' not: %s %r"
                        % (type(paths), paths))
    if not PyString_CheckExact(path):
        raise TypeError("you must pass a string for 'path' not: %s %r"
                        % (type(path), path))

    _hi = len(paths)
    _lo = 0

    path_cstr = PyString_AsString(path)
    path_size = PyString_Size(path)

    while _lo < _hi:
        _mid = (_lo + _hi) / 2
        cur = PyList_GetItem_object_void(paths, _mid)
        cur_cstr = PyString_AS_STRING_void(cur)
        cur_size = PyString_GET_SIZE_void(cur)
        if _cmp_path_by_dirblock(cur_cstr, cur_size, path_cstr, path_size) < 0:
            _lo = _mid + 1
        else:
            _hi = _mid
    return _lo


def _bisect_path_right_c(paths, path):
    """Return the index where to insert path into paths.

    This uses a path-wise comparison so we get::

        a
        a-b
        a=b
        a/b

    Rather than::

        a
        a-b
        a/b
        a=b

    :param paths: A list of paths to search through
    :param path: A single path to insert
    :return: An offset where 'path' can be inserted.
    :seealso: bisect.bisect_right
    """
    cdef int _lo
    cdef int _hi
    cdef int _mid
    cdef char *path_cstr
    cdef int path_size
    cdef char *cur_cstr
    cdef int cur_size
    cdef void *cur

    if not PyList_CheckExact(paths):
        raise TypeError("you must pass a python list for 'paths' not: %s %r"
                        % (type(paths), paths))
    if not PyString_CheckExact(path):
        raise TypeError("you must pass a string for 'path' not: %s %r"
                        % (type(path), path))

    _hi = len(paths)
    _lo = 0

    path_cstr = PyString_AsString(path)
    path_size = PyString_Size(path)

    while _lo < _hi:
        _mid = (_lo + _hi) / 2
        cur = PyList_GetItem_object_void(paths, _mid)
        cur_cstr = PyString_AS_STRING_void(cur)
        cur_size = PyString_GET_SIZE_void(cur)
        if _cmp_path_by_dirblock(path_cstr, path_size, cur_cstr, cur_size) < 0:
            _hi = _mid
        else:
            _lo = _mid + 1
    return _lo


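# Illustrative sketch only, not part of the original module: pure-Python 3
# counterparts of the two bisect helpers, using a sort key instead of the
# compiled comparison function. The names path_key, bisect_path_left and
# bisect_path_right are hypothetical, and bytes are assumed as a stand-in for
# the Python 2 str paths used here.

```python
def path_key(path: bytes):
    """Key that orders by directory (component-wise), then by basename."""
    dirname, _, basename = path.rpartition(b"/")
    return (dirname.split(b"/"), basename)


def bisect_path_left(paths, path):
    """Return the leftmost index at which path can be inserted."""
    lo, hi = 0, len(paths)
    target = path_key(path)
    while lo < hi:
        mid = (lo + hi) // 2
        if path_key(paths[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def bisect_path_right(paths, path):
    """Return the rightmost index at which path can be inserted."""
    lo, hi = 0, len(paths)
    target = path_key(path)
    while lo < hi:
        mid = (lo + hi) // 2
        if target < path_key(paths[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

# The two variants differ only when 'path' is already present: the left form
# returns the index of the existing entry, the right form the index just past
# it, mirroring bisect.bisect_left and bisect.bisect_right.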
def bisect_dirblock_c(dirblocks, dirname, lo=0, hi=None, cache=None):
    """Return the index where to insert dirname into the dirblocks.

    The return value idx is such that all directory blocks in dirblocks[:idx]
    have names < dirname, and all blocks in dirblocks[idx:] have names >=
    dirname.

    Optional args lo (default 0) and hi (default len(dirblocks)) bound the
    slice of dirblocks to be searched.
    """
    cdef int _lo
    cdef int _hi
    cdef int _mid
    cdef char *dirname_cstr
    cdef int dirname_size
    cdef char *cur_cstr
    cdef int cur_size
    cdef void *cur

    if not PyList_CheckExact(dirblocks):
        raise TypeError("you must pass a python list for 'dirblocks' not: %s %r"
                        % (type(dirblocks), dirblocks))
    if not PyString_CheckExact(dirname):
        raise TypeError("you must pass a string for dirname not: %s %r"
                        % (type(dirname), dirname))
    if hi is None:
        _hi = len(dirblocks)
    else:
        _hi = hi

    _lo = lo
    dirname_cstr = PyString_AsString(dirname)
    dirname_size = PyString_Size(dirname)

    while _lo < _hi:
        _mid = (_lo + _hi) / 2
        # Grab the dirname for the current dirblock
        # cur = dirblocks[_mid][0]
        cur = PyTuple_GetItem_void_void(
                PyList_GetItem_object_void(dirblocks, _mid), 0)
        cur_cstr = PyString_AS_STRING_void(cur)
        cur_size = PyString_GET_SIZE_void(cur)
        if _cmp_by_dirs(cur_cstr, cur_size, dirname_cstr, dirname_size) < 0:
            _lo = _mid + 1
        else:
            _hi = _mid
    return _lo


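# Illustrative sketch only, not part of the original module: a pure-Python 3
# version of the dirblock bisection. Each dirblock is assumed to be a tuple
# whose first item is the directory name, as the 'dirblocks[_mid][0]' comment
# above suggests. The name bisect_dirblock_sketch is hypothetical, and bytes
# are assumed as a stand-in for the Python 2 str names used here.

```python
def bisect_dirblock_sketch(dirblocks, dirname, lo=0, hi=None):
    """Return the index where a block named dirname would be inserted."""
    if hi is None:
        hi = len(dirblocks)
    # Compare directory names component by component, as cmp_by_dirs does.
    target = dirname.split(b"/")
    while lo < hi:
        mid = (lo + hi) // 2
        if dirblocks[mid][0].split(b"/") < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


blocks = [(b"", []), (b"a", []), (b"a/b", []), (b"a-b", [])]
index = bisect_dirblock_sketch(blocks, b"a/b")
```

# Note that the blocks must already be in cmp_by_dirs order, so b"a/b" comes
# before b"a-b" in the example list.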
cdef class Reader:
    """Maintain the current location, and return fields as you parse them."""

    cdef object state # The DirState object
    cdef object text # The overall string object
    cdef char *text_cstr # Pointer to the beginning of text
    cdef int text_size # Length of text

    cdef char *end_cstr # End of text
    cdef char *cur_cstr # Pointer to the current record
    cdef char *next # Pointer to the end of this record

    def __init__(self, text, state):
        self.state = state
        self.text = text
        self.text_cstr = PyString_AsString(text)
        self.text_size = PyString_Size(text)
        self.end_cstr = self.text_cstr + self.text_size
        self.cur_cstr = self.text_cstr

    cdef char *get_next(self, int *size) except NULL:
        """Return a pointer to the start of the next field."""
|
501 |
cdef char *next |
|
3640.2.1
by John Arbash Meinel
More safety checks around PyString_FromStringAndSize, |
502 |
cdef Py_ssize_t extra_len |
503 |
||
504 |
if self.cur_cstr == NULL: |
|
505 |
raise AssertionError('get_next() called when cur_cstr is NULL') |
|
506 |
elif self.cur_cstr >= self.end_cstr: |
|
507 |
raise AssertionError('get_next() called when there are no chars' |
|
508 |
' left') |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
509 |
next = self.cur_cstr |
3640.2.1
by John Arbash Meinel
More safety checks around PyString_FromStringAndSize, |
510 |
self.cur_cstr = <char*>memchr(next, c'\0', self.end_cstr - next) |
511 |
if self.cur_cstr == NULL: |
|
512 |
extra_len = self.end_cstr - next |
|
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
513 |
raise errors.DirstateCorrupt(self.state, |
514 |
'failed to find trailing NULL (\\0).' |
|
3640.2.1
by John Arbash Meinel
More safety checks around PyString_FromStringAndSize, |
515 |
' Trailing garbage: %r' |
516 |
% safe_string_from_size(next, extra_len)) |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
517 |
size[0] = self.cur_cstr - next |
518 |
self.cur_cstr = self.cur_cstr + 1 |
|
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
519 |
return next |
520 |
||
2474.1.53
by John Arbash Meinel
Changing Reader.get_next_str (which returns a Python String) |
521 |
cdef object get_next_str(self): |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
522 |
"""Get the next field as a Python string."""
|
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
523 |
cdef int size |
524 |
cdef char *next |
|
525 |
next = self.get_next(&size) |
|
3640.2.1
by John Arbash Meinel
More safety checks around PyString_FromStringAndSize, |
526 |
return safe_string_from_size(next, size) |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
527 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
528 |
cdef int _init(self) except -1: |
529 |
"""Get the pointer ready.
|
|
530 |
||
531 |
This assumes that the dirstate header has already been read, and we
|
|
532 |
already have the dirblock string loaded into memory.
|
|
533 |
This just initializes our memory pointers, etc for parsing of the
|
|
534 |
dirblock string.
|
|
535 |
"""
|
|
2474.1.32
by John Arbash Meinel
Skip past the first entry while reading, |
536 |
cdef char *first |
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
537 |
cdef int size |
2474.1.32
by John Arbash Meinel
Skip past the first entry while reading, |
538 |
# The first field should be an empty string left over from the Header
|
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
539 |
first = self.get_next(&size) |
540 |
if first[0] != c'\0' or size != 0: |
|
2474.1.32
by John Arbash Meinel
Skip past the first entry while reading, |
541 |
raise AssertionError('First character should be null not: %s' |
542 |
% (first,)) |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
543 |
return 0 |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
544 |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
545 |
cdef object _get_entry(self, int num_trees, void **p_current_dirname, |
546 |
int *new_block): |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
547 |
"""Extract the next entry.
|
548 |
||
549 |
This parses the next entry based on the current location in
|
|
550 |
``self.cur_cstr``.
|
|
551 |
Each entry can be considered a "row" in the total table. And each row
|
|
552 |
has a fixed number of columns. It is generally broken up into "key"
|
|
553 |
columns, then "current" columns, and then "parent" columns.
|
|
554 |
||
555 |
:param num_trees: How many trees (current plus parents) need to be parsed
|
|
556 |
:param p_current_dirname: A pointer to the current PyString
|
|
557 |
representing the directory name.
|
|
558 |
We pass this in as a void * so that pyrex doesn't have to
|
|
559 |
increment/decrement the PyObject reference counter for each
|
|
560 |
_get_entry call.
|
|
561 |
We use a pointer so that _get_entry can update it with the new
|
|
562 |
value.
|
|
563 |
:param new_block: This is to let the caller know that it needs to
|
|
564 |
create a new directory block to store the next entry.
|
|
565 |
"""
|
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
566 |
cdef object path_name_file_id_key |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
567 |
cdef char *entry_size_cstr |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
568 |
cdef unsigned long int entry_size |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
569 |
cdef char* executable_cstr |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
570 |
cdef int is_executable |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
571 |
cdef char* dirname_cstr |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
572 |
cdef char* trailing |
573 |
cdef int cur_size |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
574 |
cdef int i |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
575 |
cdef object minikind |
576 |
cdef object fingerprint |
|
577 |
cdef object info |
|
578 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
579 |
# Read the 'key' information (dirname, name, file_id)
|
580 |
dirname_cstr = self.get_next(&cur_size) |
|
581 |
# Check to see if we have started a new directory block.
|
|
582 |
# If so, then we need to create a new dirname PyString, so that it can
|
|
583 |
# be used in all of the tuples. This saves time and memory, by re-using
|
|
584 |
# the same object repeatedly.
|
|
585 |
||
586 |
# Do the cheap 'length of string' check first. If the string is a
|
|
587 |
# different length, then we *have* to be a different directory.
|
|
588 |
if (cur_size != PyString_GET_SIZE_void(p_current_dirname[0]) |
|
589 |
or strncmp(dirname_cstr, |
|
590 |
# Extract the char* from our current dirname string. We
|
|
591 |
# know it is a PyString, so we can use
|
|
592 |
# PyString_AS_STRING, we use the _void version because
|
|
593 |
# we are tricking Pyrex by using a void* rather than an
|
|
594 |
# <object>
|
|
595 |
PyString_AS_STRING_void(p_current_dirname[0]), |
|
596 |
cur_size+1) != 0): |
|
3640.2.1
by John Arbash Meinel
More safety checks around PyString_FromStringAndSize, |
597 |
dirname = safe_string_from_size(dirname_cstr, cur_size) |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
598 |
p_current_dirname[0] = <void*>dirname |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
599 |
new_block[0] = 1 |
600 |
else: |
|
601 |
new_block[0] = 0 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
602 |
|
603 |
# Build up the key that will be used.
|
|
604 |
# By using <object>(void *) Pyrex will automatically handle the
|
|
605 |
# Py_INCREF that we need.
|
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
606 |
path_name_file_id_key = (<object>p_current_dirname[0], |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
607 |
self.get_next_str(), |
608 |
self.get_next_str(), |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
609 |
)
|
610 |
||
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
611 |
# Parse all of the per-tree information. current has the information in
|
612 |
# the same location as parent trees. The only difference is that 'info'
|
|
613 |
# is a 'packed_stat' for current, while it is a 'revision_id' for
|
|
614 |
# parent trees.
|
|
615 |
# minikind, fingerprint, and info will be returned as regular python
|
|
616 |
# strings
|
|
617 |
# entry_size and is_executable will be parsed into a python Long and
|
|
618 |
# python Boolean, respectively.
|
|
619 |
# TODO: jam 20070718 Consider changing the entry_size conversion to
|
|
620 |
# prefer python Int when possible. They are generally faster to
|
|
621 |
# work with, and it will be rare that we have a file >2GB.
|
|
622 |
# Especially since this code is pretty much fixed at a max of
|
|
623 |
# 4GB.
|
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
624 |
trees = [] |
625 |
for i from 0 <= i < num_trees: |
|
626 |
minikind = self.get_next_str() |
|
627 |
fingerprint = self.get_next_str() |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
628 |
entry_size_cstr = self.get_next(&cur_size) |
629 |
entry_size = strtoul(entry_size_cstr, NULL, 10) |
|
630 |
executable_cstr = self.get_next(&cur_size) |
|
631 |
is_executable = (executable_cstr[0] == c'y') |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
632 |
info = self.get_next_str() |
633 |
PyList_Append(trees, ( |
|
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
634 |
minikind, # minikind |
635 |
fingerprint, # fingerprint |
|
636 |
entry_size, # size |
|
637 |
is_executable,# executable |
|
638 |
info, # packed_stat or revision_id |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
639 |
))
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
640 |
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
641 |
# The returned tuple is (key, [trees])
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
642 |
ret = (path_name_file_id_key, trees) |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
643 |
# Ignore the trailing newline, but assert that it does exist; this
|
644 |
# ensures that we always finish parsing a line on an end-of-entry
|
|
645 |
# marker.
|
|
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
646 |
trailing = self.get_next(&cur_size) |
647 |
if cur_size != 1 or trailing[0] != c'\n': |
|
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
648 |
raise errors.DirstateCorrupt(self.state, |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
649 |
'Bad parse, we expected to end on \\n, not: %d %s: %s' |
3640.2.1
by John Arbash Meinel
More safety checks around PyString_FromStringAndSize, |
650 |
% (cur_size, safe_string_from_size(trailing, cur_size), |
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
651 |
ret)) |
2474.1.38
by John Arbash Meinel
Finally, faster than text.split() (156ms) |
652 |
return ret |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
653 |
|
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
654 |
def _parse_dirblocks(self): |
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
655 |
"""Parse all dirblocks in the state file."""
|
656 |
cdef int num_trees |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
657 |
cdef object current_block |
658 |
cdef object entry |
|
659 |
cdef void * current_dirname |
|
660 |
cdef int new_block |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
661 |
cdef int expected_entry_count |
662 |
cdef int entry_count |
|
663 |
||
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
664 |
num_trees = self.state._num_present_parents() + 1 |
665 |
expected_entry_count = self.state._num_entries |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
666 |
|
667 |
# Ignore the first record
|
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
668 |
self._init() |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
669 |
|
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
670 |
current_block = [] |
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
671 |
dirblocks = [('', current_block), ('', [])] |
672 |
self.state._dirblocks = dirblocks |
|
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
673 |
obj = '' |
2474.1.37
by John Arbash Meinel
get_next() returns the length of the string, |
674 |
current_dirname = <void*>obj |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
675 |
new_block = 0 |
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
676 |
entry_count = 0 |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
677 |
|
2474.1.54
by John Arbash Meinel
Optimize the simple case that the strings are the same object. |
678 |
# TODO: jam 2007-05-07 Consider pre-allocating some space for the
|
679 |
# members, and then growing and shrinking from there. If most
|
|
680 |
# directories have close to 10 entries in them, it would save a
|
|
681 |
# few mallocs if we default our list size to something
|
|
682 |
# reasonable. Or we could malloc it to something large (100 or
|
|
683 |
# so), and then truncate. That would give us a malloc + realloc,
|
|
684 |
# rather than lots of reallocs.
|
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
685 |
while self.cur_cstr < self.end_cstr: |
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
686 |
entry = self._get_entry(num_trees, &current_dirname, &new_block) |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
687 |
if new_block: |
688 |
# new block - different dirname
|
|
689 |
current_block = [] |
|
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
690 |
PyList_Append(dirblocks, |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
691 |
(<object>current_dirname, current_block)) |
692 |
PyList_Append(current_block, entry) |
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
693 |
entry_count = entry_count + 1 |
694 |
if entry_count != expected_entry_count: |
|
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
695 |
raise errors.DirstateCorrupt(self.state, |
696 |
'We read the wrong number of entries.'
|
|
2474.1.46
by John Arbash Meinel
Finish implementing _c_read_dirblocks for any number of parents. |
697 |
' We expected to read %s, but read %s' |
698 |
% (expected_entry_count, entry_count)) |
|
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
699 |
self.state._split_root_dirblock_into_contents() |
2474.1.36
by John Arbash Meinel
Move functions into member functions on reader() class. |
700 |
|
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
701 |
|
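The field-by-field parsing done by the Reader class can be sketched in plain Python as follows. Everything here is illustrative: `ReaderSketch` and its method names are hypothetical, `ValueError` stands in for `errors.DirstateCorrupt`, and the sample record is made up. Unlike `strtoul`, `int()` raises on a non-numeric size field.

```python
class ReaderSketch(object):
    """Hypothetical pure-Python analogue of the Pyrex Reader."""

    def __init__(self, text):
        self.text = text
        self.cur = 0              # offset of the current field
        self.end = len(text)      # offset one past the last byte

    def get_next(self):
        """Return the next NUL-terminated field as bytes."""
        if self.cur >= self.end:
            raise AssertionError('get_next() called when there are no'
                                 ' chars left')
        start = self.cur
        stop = self.text.find(b'\0', start)
        if stop == -1:
            raise ValueError('failed to find trailing NULL (\\0).'
                             ' Trailing garbage: %r' % (self.text[start:],))
        self.cur = stop + 1       # step past the terminator
        return self.text[start:stop]

    def get_entry(self, num_trees):
        """Parse one row: three key fields, then five fields per tree."""
        key = (self.get_next(), self.get_next(), self.get_next())
        trees = []
        for _ in range(num_trees):
            minikind = self.get_next()
            fingerprint = self.get_next()
            size = int(self.get_next())
            executable = self.get_next()[:1] == b'y'
            info = self.get_next()   # packed_stat or revision_id
            trees.append((minikind, fingerprint, size, executable, info))
        trailing = self.get_next()
        if trailing != b'\n':
            raise ValueError('Bad parse, we expected to end on \\n, not: %r'
                             % (trailing,))
        return (key, trees)


# One made-up record for a single tree; the leading empty field mimics
# the empty string left over from the header.
fields = [b'', b'dir', b'name', b'file-id',
          b'f', b'sha1', b'12', b'y', b'packed-stat', b'\n', b'']
reader = ReaderSketch(b'\0'.join(fields))
first = reader.get_next()         # skip the empty first field
entry = reader.get_entry(1)
```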
2474.1.47
by John Arbash Meinel
Change the names of the functions from c_foo and py_foo to foo_c and foo_py |
702 |
def _read_dirblocks_c(state): |
2474.1.1
by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file. |
703 |
"""Read in the dirblocks for the given DirState object.
|
704 |
||
705 |
This is tightly bound to the DirState internal representation. It should be
|
|
706 |
thought of as a member function, which is only separated out so that we can
|
|
707 |
re-write it in pyrex.
|
|
708 |
||
709 |
:param state: A DirState object.
|
|
710 |
:return: None
|
|
2474.1.70
by John Arbash Meinel
Lot's of fixes from Martin's comments. |
711 |
:postcondition: The dirblocks will be loaded into the appropriate fields in
|
712 |
the DirState object.
|
|
2474.1.1
by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file. |
713 |
"""
|
714 |
state._state_file.seek(state._end_of_header) |
|
715 |
text = state._state_file.read() |
|
716 |
# TODO: check the crc checksums. crc_measured = zlib.crc32(text)
|
|
717 |
||
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
718 |
reader = Reader(text, state) |
2474.1.30
by John Arbash Meinel
Start working towards a parser which uses a Reader (producer) |
719 |
|
3640.2.5
by John Arbash Meinel
Change from using AssertionError to using DirstateCorrupt in a few places |
720 |
reader._parse_dirblocks() |
2474.1.1
by John Arbash Meinel
Create a Pyrex extension for reading the dirstate file. |
721 |
state._dirblock_state = DirState.IN_MEMORY_UNMODIFIED |
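The grouping step in `_parse_dirblocks` can be sketched in plain Python as below. This is a sketch under stated assumptions: `group_into_dirblocks` is a hypothetical name, entries are taken to be already-parsed `(key, trees)` tuples, `ValueError` stands in for `errors.DirstateCorrupt`, and the final `_split_root_dirblock_into_contents()` call is not modeled.

```python
def group_into_dirblocks(entries, expected_entry_count):
    """Group (key, trees) entries into (dirname, [entries]) blocks."""
    current_block = []
    # Same starting shape as the Pyrex code: two blocks named ''.
    dirblocks = [('', current_block), ('', [])]
    current_dirname = ''
    entry_count = 0
    for entry in entries:
        dirname = entry[0][0]
        if dirname != current_dirname:
            # new block - different dirname
            current_dirname = dirname
            current_block = []
            dirblocks.append((current_dirname, current_block))
        current_block.append(entry)
        entry_count += 1
    if entry_count != expected_entry_count:
        raise ValueError('We read the wrong number of entries.'
                         ' We expected to read %s, but read %s'
                         % (expected_entry_count, entry_count))
    return dirblocks


# Made-up entries with empty per-tree lists, for illustration only.
sample = [(('', '', 'root-id'), []),
          (('', 'a', 'a-id'), []),
          (('a', 'b', 'b-id'), [])]
grouped = group_into_dirblocks(sample, 3)
```

Comparing the running count against the expected count catches truncated or padded input, which is the same role the entry-count check plays in the Pyrex loop.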