This helps us remain compatible with older versions of bzr. We may change
our policy in the future, though.
# TODO: jam 20060816 Benchmark this, is it better to use try/except or
# to use _map.get() and check for None.
# Or still further, it might be better to pre-generate all
# possible conversions. However, the occurrence of unicode
# characters is quite low, so an initial guess is that this
# is the most efficient method
# Also need to benchmark whether it is better to have a regex
# which matches multiple characters, or if it is better to
# only match a single character and call this function multiple
# times. The chance that we actually need multiple escapes
# is probably very low for our expected usage
# jam 20060816 Benchmarks show that try/KeyError is faster if you
# expect the entity to rarely miss. There is about a 10% difference
# in overall time. But if you miss frequently, then checking for None is
# much faster. For our use case, we *rarely* have a revision id, file id
# or path name that is unicode. So use try/KeyError.
return _map[match.group()]
def _encode_and_escape(unicode_str, _map=_unicode_to_escaped_map):
"""Encode the string into utf8, and escape invalid XML characters"""
# We frequently get entities we have not seen before, so it is better
# to check if None, rather than try/KeyError
text = _map.get(unicode_str)
# The alternative policy is to do a regular UTF8 encoding