I'm assuming you want to unhexlify the data and store the resulting byte strings as fixed-length strings, rather than as objects. (You can't store them as some kind of int128 type, because numpy has no such type.)
To avoid reading 3.2GB of text into memory, and then spending roughly the same amount again preprocessing it into the desired form, you probably want to use fromiter, so:
import binascii
import numpy as np

with open(myfile) as f:
    iv = binascii.unhexlify(f.readline().strip())
    key = binascii.unhexlify(f.readline().strip())
    count = int(f.readline())
    # Build the array lazily, one 16-byte entry per line of hex text.
    a = np.fromiter((binascii.unhexlify(line.strip()) for line in f), dtype='|S16')
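Incidentally, since the header already tells you how many records follow, passing count=count to fromiter lets numpy preallocate the whole array up front instead of resizing it on demand as it reads.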
If you've got 10GB of RAM to spare (a rough guess), it might be faster to read the whole thing in as an array of object and then convert it twice… but I doubt it.
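For what it's worth, that alternative would look something like this (just a sketch; the two conversions are the text-to-object-array step and the repack into S16):

import binascii
import numpy as np

with open(myfile) as f:
    iv = binascii.unhexlify(f.readline().strip())
    key = binascii.unhexlify(f.readline().strip())
    count = int(f.readline())
    raw = np.array(f.readlines(), dtype=object)             # conversion #1: all the text at once
a = np.array([binascii.unhexlify(s.strip()) for s in raw],   # conversion #2: repack as S16
             dtype='|S16')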
As to whether this will help… You might get a little benefit, because AES-ing 16 bytes may be fast enough that the cost of iteration is noticeable. Let's test it and see.
With 64-bit Mac Python 2.7.2, I created an array of 100000 S16s by copying your example repeatedly. The setup was roughly as follows (a sketch only: PyCrypto's AES in ECB mode with a dummy key, and an arbitrary 16-byte block, are my guesses, since the actual cipher setup isn't shown):
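import numpy as np
from Crypto.Cipher import AES    # PyCrypto; an assumption about the library in use

aes = AES.new('\x00' * 16)       # hypothetical all-zero key; ECB mode by default
a = np.array(['0123456789abcdef'] * 100000, dtype='|S16')

Then: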
In [514]: %timeit [aes.encrypt(x) for x in a]
10 loops, best of 3: 166 ms per loop
In [515]: %timeit np.vectorize(aes.encrypt)(a)
10 loops, best of 3: 126 ms per loop
So, that's almost a 25% savings. Not bad.
Of course the array takes longer to build than just keeping things in an iterator in the first place—but even taking that into account, there's still a 9% performance gain. And it may well be reasonable to trade 1.6GB for a 9% speedup in your use case.
Keep in mind that I'm only building an array of 100K objects out of a pre-existing 100K list; with 100M objects read off disk, I/O is probably going to become a serious factor, and it's quite possible that iterative processing (which allows you to interleave the CPU costs with disk waits) will do a whole lot better.
In other words, you need to test with your own real data and scenario. But you already knew that.
For a wider variety of implementations, with a simple perf-testing scaffolding, see this pastebin.
You might want to try combining different approaches. For example, you can use the grouper recipe from itertools to batch things into, say, 32K plaintexts at a time, then process each batch with numpy, to get the best of both. And then pool.imap that numpy processing, to get the best of all 3. Or, alternatively, put the one big numpy array into shared memory, and make each multiprocessing task process a slice of that array.
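Here's a rough sketch of that first combination, batching with grouper and fanning the batches out with pool.imap (the grouper recipe is copied from the itertools docs; the 32K batch size, the ECB-mode aes object from above, and the handle consumer are all my assumptions, not your code):

import binascii
import multiprocessing
from itertools import izip_longest   # zip_longest on Python 3

import numpy as np

def grouper(iterable, n, fillvalue=None):
    # grouper recipe from the itertools documentation
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

def encrypt_batch(lines):
    # Drop the None padding grouper adds to the final short batch,
    # then vectorize over one S16 array per batch, as in the benchmark above.
    batch = np.fromiter((binascii.unhexlify(l.strip()) for l in lines if l is not None),
                        dtype='|S16')
    return np.vectorize(aes.encrypt)(batch)

pool = multiprocessing.Pool()       # workers inherit aes via fork on Unix
with open(myfile) as f:
    for _ in range(3):
        f.readline()                # skip the iv/key/count header
    for encrypted in pool.imap(encrypt_batch, grouper(f, 32 * 1024)):
        handle(encrypted)           # hypothetical consumer: write out, accumulate, etc.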