python - Why is a numpy vectorized function apparently called an extra time?

Question

I have a numpy object array containing several lists of index numbers:

>>> idxLsts = np.array([[1], [0, 2]], dtype=object)

I define a vectorized function to append a value to each list:

>>> idx = 99  
>>> f = np.vectorize(lambda idxLst: idxLst.append(idx))

I invoke the function. I don't care about the return value, just the side effect.

>>> f(idxLsts)  
array([None, None], dtype=object)

The index 99 was appended twice to the first list. Why? I'm stumped.

>>> idxLsts
array([[1, 99, 99], [0, 2, 99]], dtype=object)

With other values of idxLsts, it doesn't seem to happen:

>>> idxLsts = np.array([[1, 2], [0, 2, 4]], dtype=object)
>>> f(idxLsts)
array([None, None], dtype=object)
>>> idxLsts
array([[1, 2, 99], [0, 2, 4, 99]], dtype=object)

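My suspicion is that it is related to this passage in the documentation: "Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a numpy array as output. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy."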

score 8 · Accepted Answer

From the vectorize docstring:

The data type of the output of `vectorized` is determined by calling
the function with the first element of the input.  This can be avoided
by specifying the `otypes` argument.
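
As a minimal sketch of what the docstring describes, passing `otypes` tells `np.vectorize` the output dtype up front, so it should not need to probe the function with the first element. The choice of `otypes=[object]` here is an assumption that fits a function returning `None`; the variable `result` is only for illustration.

```python
import numpy as np

idx = 99
idxLsts = np.array([[1], [0, 2]], dtype=object)

# otypes=[object] declares the output dtype in advance, so vectorize
# does not call the function on the first element just to infer it.
f = np.vectorize(lambda idxLst: idxLst.append(idx), otypes=[object])
f(idxLsts)

result = idxLsts.tolist()
print(result)  # [[1, 99], [0, 2, 99]]
```

Each list receives exactly one 99, including the first.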

From the code:

        theout = self.thefunc(*newargs)

This is an extra call to thefunc, made to determine the output type. That is why the first element gets two 99s appended.
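
One way to see the extra call directly is to count invocations. The sketch below uses a hypothetical helper, `record`, that logs every argument it receives; with the default settings (no `otypes`, `cache=False`) a three-element input should produce four calls, with the first element seen twice.

```python
import numpy as np

calls = []  # every argument the wrapped function receives

def record(x):
    # Hypothetical helper: log the argument and return None.
    calls.append(x)

f = np.vectorize(record)  # no otypes given, cache left at its default
f(np.array([10, 20, 30]))

n_calls = len(calls)
print(n_calls)            # 4 calls for 3 elements
print(calls[0], calls[1]) # 10 10: the first element is passed twice
```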

This behavior also occurs in your second case:

import numpy as np
idxLsts = np.array([[1, 2], [0,2,4]], dtype = object)
idx = 99
f = np.vectorize(lambda x: x.append(idx))
f(idxLsts)
print(idxLsts)

yields

[[1, 2, 99, 99] [0, 2, 4, 99]]

You could use np.frompyfunc instead of np.vectorize:

import numpy as np
idxLsts = np.array([[1, 2], [0,2,4]], dtype = object)
idx = 99
f = np.frompyfunc(lambda x: x.append(idx), 1, 1)
f(idxLsts)
print(idxLsts)

yields

[[1, 2, 99] [0, 2, 4, 99]]
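
If you would rather stay with `np.vectorize`, its `cache` parameter is another option worth trying. This is a sketch, assuming a NumPy version that supports `cache=True`: the result of the probing call is reused for the first element instead of calling the function on it again. The name `cached_result` is only for illustration.

```python
import numpy as np

idx = 99
idxLsts = np.array([[1, 2], [0, 2, 4]], dtype=object)

# cache=True keeps the result of the type-probing call and hands it back
# for the first element, so the function runs once per element.
f = np.vectorize(lambda x: x.append(idx), cache=True)
f(idxLsts)

cached_result = idxLsts.tolist()
print(cached_result)  # [[1, 2, 99], [0, 2, 4, 99]]
```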
