Using Key_New() rather than Key(PyTuple_New()) gets us to 720ms.
It takes some gymnastics to get this to work, because we have to export the
symbol and then find the .lib file so we can link against it from the Pyrex
code. The setup.py syntax doesn't seem to support that cleanly :(
Going further, we'll have to update _flatten_node to support Key nodes,
since right now it has some explicit 'is tuple' checks that fail.
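
A rough pure-Python sketch of the problem, for illustration only. The Key
class here is a hypothetical stand-in for the compiled type (assumed not to
be a tuple subclass), and the flatten_key_* helpers and the NUL-joined output
are made up for the example; they are not the real _flatten_node code.

```python
class Key(object):
    """Hypothetical stand-in for the compiled Key type (not a tuple subclass)."""

    __slots__ = ('_items',)

    def __init__(self, *items):
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


def _join_bits(key):
    # Index explicitly so the helper works for any sized, indexable key.
    return '\x00'.join([key[i] for i in range(len(key))])


def flatten_key_strict(key):
    # Mirrors an explicit 'is tuple' check: a Key is rejected outright.
    if type(key) is not tuple:
        raise TypeError('expected tuple, got %s' % type(key).__name__)
    return _join_bits(key)


def flatten_key_relaxed(key):
    # Same check, widened to accept either an exact tuple or a Key.
    if type(key) is not tuple and type(key) is not Key:
        raise TypeError('expected tuple or Key, got %s' % type(key).__name__)
    return _join_bits(key)
```

With the strict check, passing a Key raises TypeError even though it holds
the same data as the equivalent tuple; the relaxed check treats both alike.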