Removing the min(keys) and max(keys) calls saves 100ms in the inner loop (get_ancestry() over all of bzr.dev in a single index drops from 347ms to 245ms). The current breakdown is roughly:

  0.4789  0.1127  bzrlib.btree_index:1129(get_ancestry)
  0.0418  0.0418  +<method 'update' of 'set' objects>
  0.0480  0.0325  +bzrlib.btree_index:966(_multi_bisect_right)
  0.0274  0.0274  +<method 'difference' of 'set' objects>
  0.0081  0.0081  +<method 'add' of 'set' objects>
  0.0075  0.0075  +<sorted>
  0.0048  0.0004  +bzrlib.btree_index:899(_get_internal_nodes)
  0.2275  0.0004  +bzrlib.btree_index:917(_get_leaf_nodes)
  0.0002  0.0002  +<method 'extend' of 'list' objects>
  0.0009  0.0001  +bzrlib.btree_index:1375(key_count)
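The win comes from dropping per-iteration O(n) scans over the pending key set. This is a toy sketch of the pattern, not the actual bzrlib code (walk_slow, walk_fast, and parent_map are hypothetical names):

```python
def walk_slow(pending, parent_map):
    """Walk an ancestry graph, recomputing min/max of the key set each pass."""
    seen = set()
    while pending:
        # O(len(pending)) each, on every iteration -- this is the cost
        # that removing the min(keys)/max(keys) calls eliminates.
        lo, hi = min(pending), max(pending)
        seen.update(pending)
        next_pending = set()
        for key in pending:
            next_pending.update(parent_map.get(key, ()))
        pending = next_pending - seen
    return seen


def walk_fast(pending, parent_map):
    """Same walk with the per-iteration min()/max() scans removed."""
    seen = set()
    while pending:
        seen.update(pending)
        next_pending = set()
        for key in pending:
            next_pending.update(parent_map.get(key, ()))
        pending = next_pending - seen
    return seen
```

Both return the same ancestry set; the second just avoids rescanning the pending set twice per loop.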
So we have a bit of general overhead (112ms), 50ms spent in _multi_bisect_right (which we could move to a C extension), 50ms in set.update, 28ms in set.difference, and 227ms reading and parsing the 222 nodes from disk. It seems a little unfortunate that parsing is the primary overhead, but previous investigation did not reveal much fat that could be trimmed.
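For reference, the idea behind _multi_bisect_right is to bisect a whole sorted batch of search keys against the node boundary keys at once, grouping keys by which node they fall into so each node is fetched once. A simplified per-key sketch (the argument names are my guesses, and the real implementation is likely cleverer about subdividing the key list):

```python
import bisect

def multi_bisect_right(in_keys, fixed_keys):
    """Group sorted search keys by their bisect_right offset in fixed_keys.

    Returns a list of (offset, keys) pairs; all keys landing in the same
    node slot are grouped together.
    """
    result = []
    for key in in_keys:
        offset = bisect.bisect_right(fixed_keys, key)
        if result and result[-1][0] == offset:
            result[-1][1].append(key)
        else:
            result.append((offset, [key]))
    return result
```

Batching like this is what makes it a candidate for a C extension: the loop body is trivial, so the remaining cost is mostly interpreter overhead.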
It is 3.8MB of uncompressed data being parsed; that has to take some amount of time, and 200ms might be reasonable. Which would hint that the only ways to speed it up are: 1) a different format, or 2) don't read the whole thing, stupid :)
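A quick sanity check on whether 200ms is "reasonable": 3.8MB in ~227ms works out to roughly 17MB/s of uncompressed text through a pure-Python parser, which is in the plausible range.

```python
# Back-of-the-envelope parse throughput from the numbers above.
mb = 3.8         # uncompressed data parsed
seconds = 0.227  # time spent in _get_leaf_nodes
rate = mb / seconds
print(round(rate, 1))  # -> 16.7 (MB/s)
```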