python - Fastest way to add multiple PyTables arrays of the same shape

Question

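I have several large tables.carray data structures of the same shape (300000x300000). I want to add all of them together and store the result in a master matrix.

Right now, I create a new array and fill it with a simple loop: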

shape = (300000,300000)
#... open all hdf5 files of the existing matrices and create a new one
matrix = h5f.createCArray( h5f.root, 'carray', atom, shape, filters=filters )

for i in range( shape[0] ):
  for j in range( shape[1] ):

    for m in single_matrices:

      # print 'reading', i,j,shape
      value = m[i, j]

      # print 'writing'
      matrix[i, j] += value

But it is very slow (> 12 hours). Is there a better way?

score 0 · Accepted Answer

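You should really use the Expr() class to evaluate this [1]. Under the hood it uses numexpr to compute the requested operation in parallel, block by block. If you give it an output array (the out argument of Expr.set_output()), it even writes the result back to disk as it is computed, so the whole array is never held in memory.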

[1] http://pytables.github.io/usersguide/libref/expr_class.html
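
As a rough illustration of the Expr() approach, here is a minimal sketch. It assumes PyTables 3.x naming (open_file, create_carray, rather than the createCArray spelling in the question), that every input file stores its matrix at /carray, and that the summed values fit the dtype of the input atom. The function names build_sum_expression and sum_carrays, and the operand names m0, m1, ..., are hypothetical helpers made up for this example.

```python
def build_sum_expression(count):
    """Return ('m0 + m1 + ...', ['m0', 'm1', ...]) for `count` operands."""
    if count < 1:
        raise ValueError("need at least one array to sum")
    names = ["m%d" % i for i in range(count)]
    return " + ".join(names), names


def sum_carrays(in_paths, out_path, node_name="carray"):
    """Sum the same-shaped CArray /<node_name> of every input file into out_path."""
    # PyTables is imported lazily so the helper above has no dependencies.
    import tables

    expression, names = build_sum_expression(len(in_paths))
    in_files = [tables.open_file(p, mode="r") for p in in_paths]
    try:
        arrays = [f.get_node("/", node_name) for f in in_files]
        with tables.open_file(out_path, mode="w") as out_file:
            # On-disk output with the same atom, shape and filters as the inputs.
            out = out_file.create_carray(
                out_file.root, node_name,
                atom=arrays[0].atom, shape=arrays[0].shape,
                filters=arrays[0].filters,
            )
            # Bind each operand name in the expression to one on-disk array.
            expr = tables.Expr(expression, uservars=dict(zip(names, arrays)))
            expr.set_output(out)  # results are written to `out` block by block
            expr.eval()
    finally:
        for f in in_files:
            f.close()
```

Calling sum_carrays(["a.h5", "b.h5", "c.h5"], "total.h5") (hypothetical file names) would evaluate "m0 + m1 + m2" over the three inputs and write the result to /carray in total.h5, replacing the per-element Python loop with one expression evaluated in blocks.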

