python - Is there an alternative, vectorized way to write the to_array function?

Question

Suppose we have a ragged, nested sequence like the following:

import numpy as np
x = np.ones((10, 20))
y = np.zeros((10, 20))
a = [[0, x], [y, 1]]

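and we want to create a full numpy array from it, broadcasting the ragged subsequences where necessary (to match the largest dimensions of any other subsequence, in this case (10,20)). At first we might try np.array(a), which produces a warning: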

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

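By changing this to np.array(a, dtype=object), we do get an array. However, it is an array of objects rather than floats, and it keeps the ragged subsequences as they are instead of broadcasting them as desired. To solve this, I created a new function, to_array, which takes a (possibly ragged, nested) sequence and a shape and returns a full numpy array of that shape: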

    def to_array(a, shape):
        a = np.array(a, dtype=object)
        b = np.empty(shape)
        for index in np.ndindex(a.shape):
            b[index] = a[index]
        return b
    
    b = np.array(a, dtype=object)
    c = to_array(a, (2, 2, 10, 20))
    
    print(b.shape, b.dtype) # prints (2, 2) object
    print(c.shape, c.dtype) # prints (2, 2, 10, 20) float64

Note that c, not b, is the desired result. However, to_array relies on a for loop over np.ndindex, and Python for loops are slow for large arrays.

Is there an alternative, vectorized way to write the to_array function?

score 1 · Accepted Answer

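Given the target shape, a few iterations don't seem unduly expensive (A here is presumably the object array np.array(a, dtype=object)):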

In [35]: C = np.empty((A.shape+x.shape), x.dtype)                                                    
In [36]: for idx in np.ndindex(A.shape): 
    ...:     C[idx] = A[idx] 
    ...:

Alternatively, you could replace the 0 and the 1 with appropriate (10,20) arrays. Here you have already created those, as y and x:

In [37]: D = np.array([[y,x],[y,x]])                                                                 
In [38]: np.allclose(C,D)                                                                            
Out[38]: True
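The substitution above can be automated with np.broadcast_to. The following is only a minimal sketch, assuming a two-level nested list whose leaves all broadcast to a known leaf shape; the helper name to_array_broadcast is hypothetical and not part of numpy. It still iterates over the leaves in Python, but each leaf is expanded by numpy rather than copied element by element.

```python
import numpy as np

def to_array_broadcast(a, leaf_shape):
    """Hypothetical helper: expand every leaf of a 2-level nested list
    to leaf_shape, then let np.array stack the now-uniform leaves."""
    rows = [[np.broadcast_to(item, leaf_shape) for item in row] for row in a]
    return np.array(rows, dtype=float)

x = np.ones((10, 20))
y = np.zeros((10, 20))
a = [[0, x], [y, 1]]

D = to_array_broadcast(a, (10, 20))
print(D.shape, D.dtype)
```

Once every leaf has the same shape the list is no longer ragged, so np.array can build the full float array directly.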

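In general, a few iterations over a complex task are fine. Keep in mind that (many) operations on object dtype arrays are actually slower than the equivalent operations on lists. It is the whole-array, compiled operations on numeric arrays that are relatively fast. That's not your case.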

But

C[0,0,:,:] = 0

uses broadcasting: all (10,20) values of C[0,0] are filled with the scalar 0 by broadcasting.

C[0,1,:,:] = x

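is a different broadcast, one where the RHS already matches the left-hand side. It is unreasonable to expect numpy to handle both cases with a single broadcasting operation.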
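To make the two cases concrete, here is a small self-contained sketch. It assumes the same shapes as in the question and uses np.broadcast_to only to show what each right-hand side is stretched to; the variable names are illustrative.

```python
import numpy as np

x = np.ones((10, 20))
C = np.empty((2, 2, 10, 20))

C[0, 0, :, :] = 0   # scalar RHS: stretched to fill the (10, 20) slot
C[0, 1, :, :] = x   # array RHS: its shape already matches the slot

# Each assignment broadcasts its own RHS to the slot shape independently.
slot_shape = C[0, 0].shape
for rhs in (0, x):
    view = np.broadcast_to(rhs, slot_shape)  # read-only view, no data copied
    print(np.shape(rhs), "->", view.shape)
```

Each slot has its own source shape, which is why the assignments are performed one slot at a time.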
