python - Why does numpy.sum return a float64 instead of a uint64 when adding the elements of a generator?

Question

I just came across this strange behaviour of numpy.sum:

>>> import numpy
>>> ar = numpy.array([1,2,3], dtype=numpy.uint64)
>>> gen = (el for el in ar)
>>> lst = [el for el in ar]
>>> numpy.sum(gen)
6.0
>>> numpy.sum(lst)
6
>>> numpy.sum(iter(lst))
<listiterator object at 0x87d02cc>

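According to the documentation, the result should have the same dtype as the iterable, so why is a numpy.float64 returned instead of a numpy.uint64 in the first case? And why does the last example neither return any kind of sum nor raise an error?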

score 6 · Accepted Answer

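In general, numpy functions don't always behave the way you might expect when given generators. To create a numpy array you need to know its size and type before creating it, and that isn't possible with a generator. So many numpy functions either don't work with generators at all, or, as here, fall back on Python builtins when they get one.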

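For the same reason, though, using generators in a numpy context usually isn't very useful. There's no real advantage to making a generator from a numpy object, because you have to keep the whole numpy object in memory anyway. If you need all the types to stay as you specified them, you shouldn't wrap your numpy object in a generator.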
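A minimal sketch of the two alternatives, assuming the same `uint64` array as in the question: reduce the array directly, or, if a generator is unavoidable, rebuild an array with `np.fromiter`, which asks for the dtype up front (exactly the information a bare generator cannot provide). Either way the sum should stay `uint64`.

```python
import numpy as np

ar = np.array([1, 2, 3], dtype=np.uint64)

# Reducing the array itself keeps the uint64 dtype.
direct = ar.sum()

# If you are handed a generator anyway, np.fromiter needs the dtype
# explicitly, so the element type survives the round trip.
gen = (el for el in ar)
rebuilt = np.fromiter(gen, dtype=np.uint64)
via_fromiter = rebuilt.sum()

print(direct.dtype, direct)
print(rebuilt.dtype, via_fromiter.dtype, via_fromiter)
```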

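Some more info: technically, the argument to np.sum is supposed to be an "array-like" object, not an iterable. Array-like is defined in the documentation as: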

An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.

The array interface is documented in the NumPy reference. Basically, an array must have a fixed shape and a uniform type.

Generators don't fit this protocol and so aren't really supported. Many numpy functions are nice and will accept other sorts of objects that don't technically qualify as array-like, but a strict reading of the docs implies you can't rely on this behavior. The operations may work, but you can't expect all the types to be preserved perfectly.
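To see what "array-like" means in practice, here is a small sketch comparing how `np.asarray` treats the list and the iterator from the question. The list is a sequence, so it becomes a 1-D `uint64` array. The iterator is neither a sequence nor an array-interface object, so NumPy wraps the iterator object itself in a 0-d `object` array. As far as I can tell, this is why `numpy.sum(iter(lst))` silently hands back the iterator: reducing a single-element object array just returns that element.

```python
import numpy as np

ar = np.array([1, 2, 3], dtype=np.uint64)
lst = [el for el in ar]

# A list is a (nested) sequence, hence array-like: 1-D uint64 array.
from_list = np.asarray(lst)

# An iterator is not array-like: NumPy stores the iterator object itself
# as the single element of a 0-d array with dtype=object.
from_iter = np.asarray(iter(lst))

print(from_list.shape, from_list.dtype)
print(from_iter.shape, from_iter.dtype)
```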

score 5 · Accepted Answer

If the argument is a generator, numpy.sum falls back to Python's builtin sum.

You can see this in the source code of numpy.sum (numpy/core/fromnumeric.py):

    if isinstance(a, _gentype):
        res = _sum_(a)
        if out is not None:
            out[...] = res
            return out
        return res

_gentype is just an alias for types.GeneratorType, and _sum_ is the builtin sum.
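To make the dispatch concrete, here is a hypothetical, heavily simplified stand-in (`sum_sketch` is an illustrative name, not a NumPy function). It only mirrors the generator branch quoted above and sends everything else through `np.add.reduce`; the real `numpy.sum` does considerably more. The value is 6 in both calls, but only the array path is guaranteed to keep `uint64`.

```python
import builtins
import types

import numpy as np


def sum_sketch(a, out=None):
    """Hypothetical simplification of numpy.sum's generator special case."""
    if isinstance(a, types.GeneratorType):
        # Generator: defer to the builtin sum (start defaults to 0).
        res = builtins.sum(a)
        if out is not None:
            out[...] = res
            return out
        return res
    # Anything array-like: convert and reduce over all axes.
    return np.add.reduce(np.asarray(a), axis=None)


ar = np.array([1, 2, 3], dtype=np.uint64)
from_gen = sum_sketch(el for el in ar)   # dtype depends on NumPy version
from_list = sum_sketch(list(ar))         # stays uint64
print(type(from_gen), type(from_list))
```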

If you try applying the builtin sum to gen and lst, you'll see that the result is the same in both cases: 6.0.

The second argument of sum is start, which defaults to 0; that 0 is what turns your result into a float64.
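Following that reasoning, a sketch of two ways to keep the builtin reduction in `uint64`: pass a `uint64` start value explicitly, or fold with `functools.reduce`, which uses no start value when none is given. In both cases every addition is `uint64 + uint64`, so no Python int ever enters the arithmetic.

```python
import functools
import operator

import numpy as np

ar = np.array([1, 2, 3], dtype=np.uint64)

# Builtin sum with an explicit uint64 start instead of the default int 0.
with_start = sum((el for el in ar), np.uint64(0))

# functools.reduce without an initializer starts from the first element.
folded = functools.reduce(operator.add, (el for el in ar))

print(type(with_start), type(folded))
```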

In [1]: import numpy as np

In [2]: type(np.uint64(1) + np.uint64(2))
Out[2]: numpy.uint64

In [3]: type(np.uint64(1) + 0)
Out[3]: numpy.float64
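The session above comes down to NumPy's dtype promotion rule. As I understand it, older NumPy treated the Python int `0` like an `int64`, and no integer type can hold every `uint64` and every `int64` value, so the pair is promoted to `float64`. The sketch below checks that rule at the dtype level. Note that NumPy 2.0 adopted NEP 50, under which Python ints are "weak" scalars, so `np.uint64(1) + 0` stays `uint64` in recent releases.

```python
import numpy as np

# uint64 combined with int64 has no common integer type -> float64.
mixed = np.promote_types(np.uint64, np.int64)

# Two uint64 operands need no promotion (compare In [2]).
same = np.result_type(np.uint64, np.uint64)

print(mixed, same)
```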

Edit: By the way, I found a ticket about this issue; it was marked wontfix: http://projects.scipy.org/numpy/ticket/669
