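I am looking for the most efficient way to determine whether a large array contains at least one nonzero value. At first glance np.any seems like the obvious tool for the job, but it seems unexpectedly slow over large arrays.
Consider this extreme case: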
first = np.zeros(1E3,dtype=np.bool)
last = np.zeros(1E3,dtype=np.bool)
first[0] = True
last[-1] = True
# test 1
%timeit np.any(first)
>>> 100000 loops, best of 3: 6.36 us per loop
# test 2
%timeit np.any(last)
>>> 100000 loops, best of 3: 6.95 us per loop
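At least np.any seems to be doing something vaguely sensible here: if the nonzero value is the first one in the array, there should be no need to consider any others before returning True, so I would expect test 1 to be slightly quicker than test 2.
However, what happens when we make the arrays much larger?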
first = np.zeros(1E9,dtype=np.bool)
last = np.zeros(1E9,dtype=np.bool)
first[0] = True
last[-1] = True
# test 3
%timeit np.any(first)
>>> 10 loops, best of 3: 21.6 ms per loop
# test 4
%timeit np.any(last)
>>> 1 loops, best of 3: 739 ms per loop
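As expected, test 4 is quite a lot slower than test 3. However, in test 3 np.any should still only have to check the value of a single element in first in order to know that it contains at least one nonzero value. Why, then, is test 3 so much slower than test 1?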
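To make that expectation concrete, here is a minimal sketch of the short-circuiting behaviour described above. The helper name chunked_any and the block size of 4096 are assumptions made purely for illustration; this is not a NumPy function and not a claim about how np.any is implemented internally.

```python
import numpy as np

def chunked_any(arr, chunk_size=4096):
    """Hypothetical helper: stop at the first block that holds a nonzero value."""
    flat = arr.ravel()  # a view for contiguous arrays, a copy otherwise
    for start in range(0, flat.size, chunk_size):
        # np.any only sees chunk_size elements per call, so the scan can
        # stop early when a nonzero value sits near the start of the array.
        if np.any(flat[start:start + chunk_size]):
            return True
    return False

first = np.zeros(10**6, dtype=bool)
first[0] = True
last = np.zeros(10**6, dtype=bool)
last[-1] = True

print(chunked_any(first))  # decided after the first block
print(chunked_any(last))   # has to walk every block
```

With a scheme like this, the cost for first would be independent of the array length, which is the behaviour I expected from test 3.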
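Edit 1:
I'm using a development version of Numpy (1.8.0.dev-e11cd9b), but I get exactly the same timing results using Numpy 1.7.1. I'm running 64 bit Linux, Python 2.7.4. My system is basically idling (I'm running an IPython session, a browser and a text editor), and I'm definitely not hitting swap. I also replicated the result on another machine running Numpy 1.7.1.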
Edit 2:
Using Numpy 1.6.2 I get times of ~1.85us for both tests 1 and 3, so as jorgeca says there seems to have been some performance regression between Numpy 1.6.2 and 1.7.0 in this regard.
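Edit 3:
Following J.F. Sebastian and jorgeca's lead, I've done some more benchmarking using np.all on an array of zeros, which ought to be equivalent to calling np.any on an array whose first element is one.
Test script: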
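The claimed equivalence can be illustrated with a short sketch. The two counting functions below are hypothetical pure-Python reference loops, written only to show how many elements a short-circuiting reduction would need to inspect in each case; they say nothing about NumPy's actual implementation.

```python
import numpy as np

def inspected_by_all(x):
    """Elements a short-circuiting all() would read before it can answer."""
    for count, value in enumerate(x, 1):
        if not value:        # first falsy element settles the answer
            return count
    return len(x)

def inspected_by_any(x):
    """Elements a short-circuiting any() would read before it can answer."""
    for count, value in enumerate(x, 1):
        if value:            # first truthy element settles the answer
            return count
    return len(x)

zeros = np.zeros(1000, dtype=bool)
one_first = zeros.copy()
one_first[0] = True

# Both cases are decided by the very first element.
print(inspected_by_all(zeros))
print(inspected_by_any(one_first))
```

In both cases the first element alone determines the result, which is why the two benchmarks should be interchangeable.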
import timeit
import numpy as np
print 'Numpy v%s' %np.version.full_version
stmt = "np.all(x)"
for ii in xrange(10):
    setup = "import numpy as np; x = np.zeros(%d,dtype=np.bool)" %(10**ii)
    timer = timeit.Timer(stmt,setup)
    n,r = 1,3
    t = np.min(timer.repeat(r,n))
    while t < 0.2:
        n *= 10
        t = np.min(timer.repeat(r,n))
    t /= n
    if t < 1E-3:
        timestr = "%1.3f us" %(t*1E6)
    elif t < 1:
        timestr = "%1.3f ms" %(t*1E3)
    else:
        timestr = "%1.3f s" %t
    print "Array size: 1E%i, %i loops, best of %i: %s/loop" %(ii,n,r,timestr)
Results:
Numpy v1.6.2
Array size: 1E0, 1000000 loops, best of 3: 1.738 us/loop
Array size: 1E1, 1000000 loops, best of 3: 1.845 us/loop
Array size: 1E2, 1000000 loops, best of 3: 1.862 us/loop
Array size: 1E3, 1000000 loops, best of 3: 1.858 us/loop
Array size: 1E4, 1000000 loops, best of 3: 1.864 us/loop
Array size: 1E5, 1000000 loops, best of 3: 1.882 us/loop
Array size: 1E6, 1000000 loops, best of 3: 1.866 us/loop
Array size: 1E7, 1000000 loops, best of 3: 1.853 us/loop
Array size: 1E8, 1000000 loops, best of 3: 1.860 us/loop
Array size: 1E9, 1000000 loops, best of 3: 1.854 us/loop
Numpy v1.7.0
Array size: 1E0, 100000 loops, best of 3: 5.881 us/loop
Array size: 1E1, 100000 loops, best of 3: 5.831 us/loop
Array size: 1E2, 100000 loops, best of 3: 5.924 us/loop
Array size: 1E3, 100000 loops, best of 3: 5.864 us/loop
Array size: 1E4, 100000 loops, best of 3: 5.997 us/loop
Array size: 1E5, 100000 loops, best of 3: 6.979 us/loop
Array size: 1E6, 100000 loops, best of 3: 17.196 us/loop
Array size: 1E7, 10000 loops, best of 3: 116.162 us/loop
Array size: 1E8, 1000 loops, best of 3: 1.112 ms/loop
Array size: 1E9, 100 loops, best of 3: 11.061 ms/loop
Numpy v1.7.1
Array size: 1E0, 100000 loops, best of 3: 6.216 us/loop
Array size: 1E1, 100000 loops, best of 3: 6.257 us/loop
Array size: 1E2, 100000 loops, best of 3: 6.318 us/loop
Array size: 1E3, 100000 loops, best of 3: 6.247 us/loop
Array size: 1E4, 100000 loops, best of 3: 6.492 us/loop
Array size: 1E5, 100000 loops, best of 3: 7.406 us/loop
Array size: 1E6, 100000 loops, best of 3: 17.426 us/loop
Array size: 1E7, 10000 loops, best of 3: 115.946 us/loop
Array size: 1E8, 1000 loops, best of 3: 1.102 ms/loop
Array size: 1E9, 100 loops, best of 3: 10.987 ms/loop
Numpy v1.8.0.dev-e11cd9b
Array size: 1E0, 100000 loops, best of 3: 6.357 us/loop
Array size: 1E1, 100000 loops, best of 3: 6.399 us/loop
Array size: 1E2, 100000 loops, best of 3: 6.425 us/loop
Array size: 1E3, 100000 loops, best of 3: 6.397 us/loop
Array size: 1E4, 100000 loops, best of 3: 6.596 us/loop
Array size: 1E5, 100000 loops, best of 3: 7.569 us/loop
Array size: 1E6, 100000 loops, best of 3: 17.445 us/loop
Array size: 1E7, 10000 loops, best of 3: 115.109 us/loop
Array size: 1E8, 1000 loops, best of 3: 1.094 ms/loop
Array size: 1E9, 100 loops, best of 3: 10.840 ms/loop
Edit 4:
Following seberg's comment, I tried the same test with an np.float32 array instead of np.bool. In this case, Numpy 1.6.2 also shows a slowdown as array sizes increase:
Numpy v1.6.2
Array size: 1E0, 100000 loops, best of 3: 3.503 us/loop
Array size: 1E1, 100000 loops, best of 3: 3.597 us/loop
Array size: 1E2, 100000 loops, best of 3: 3.742 us/loop
Array size: 1E3, 100000 loops, best of 3: 4.745 us/loop
Array size: 1E4, 100000 loops, best of 3: 14.533 us/loop
Array size: 1E5, 10000 loops, best of 3: 112.463 us/loop
Array size: 1E6, 1000 loops, best of 3: 1.101 ms/loop
Array size: 1E7, 100 loops, best of 3: 11.724 ms/loop
Array size: 1E8, 10 loops, best of 3: 116.924 ms/loop
Array size: 1E9, 1 loops, best of 3: 1.168 s/loop
Numpy v1.7.1
Array size: 1E0, 100000 loops, best of 3: 6.548 us/loop
Array size: 1E1, 100000 loops, best of 3: 6.546 us/loop
Array size: 1E2, 100000 loops, best of 3: 6.804 us/loop
Array size: 1E3, 100000 loops, best of 3: 7.784 us/loop
Array size: 1E4, 100000 loops, best of 3: 17.946 us/loop
Array size: 1E5, 10000 loops, best of 3: 117.235 us/loop
Array size: 1E6, 1000 loops, best of 3: 1.096 ms/loop
Array size: 1E7, 100 loops, best of 3: 12.328 ms/loop
Array size: 1E8, 10 loops, best of 3: 118.431 ms/loop
Array size: 1E9, 1 loops, best of 3: 1.172 s/loop
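Why do you think this is happening? As with the boolean case, np.all should still only have to check the first element before returning, so times should still be constant with respect to array size.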
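For anyone who wants to compare the two dtypes side by side, a parametrised variant of the test script might look like the sketch below. The function name time_all_on_zeros, its default arguments and the fixed array size are assumptions for illustration. It uses np.bool_ rather than np.bool so that it also runs on recent Numpy releases, and it uses a fixed loop count instead of the adaptive one in the script above.

```python
import timeit

def time_all_on_zeros(size, dtype_name, repeats=3, number=100):
    """Best per-call time in seconds for np.all on a zero array (sketch only)."""
    setup = "import numpy as np; x = np.zeros(%d, dtype=np.%s)" % (size, dtype_name)
    timer = timeit.Timer("np.all(x)", setup)
    # Take the fastest of `repeats` runs, each making `number` calls.
    return min(timer.repeat(repeats, number)) / number

for dtype_name in ("bool_", "float32"):
    t = time_all_on_zeros(10**4, dtype_name)
    print("%s: %.3f us/loop" % (dtype_name, t * 1e6))
```

If np.all short-circuited for both dtypes, the reported times should stay roughly flat as size grows.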