python - Why is numpy slower than pure Python? How can I make the code perform better?

Question

I rewrote my neural network from pure Python to numpy, but now it runs even slower. So I tried these two functions:

def d():
    a = [1,2,3,4,5]
    b = [10,20,30,40,50]
    c = [i*j for i,j in zip(a,b)]
    return c

def e():
    a = np.array([1,2,3,4,5])
    b = np.array([10,20,30,40,50])
    c = a*b
    return c

Time for d = 1.77135205057

Time for e = 17.2464673758

Numpy is 10 times slower. Why does this happen, and how do I use numpy correctly?

score 14 · Accepted Answer

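I would assume the discrepancy is because you're constructing both lists and arrays in e, whereas in d you're only constructing lists. Consider: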

import numpy as np

def d():
    a = [1,2,3,4,5]
    b = [10,20,30,40,50]
    c = [i*j for i,j in zip(a,b)]
    return c

def e():
    a = np.array([1,2,3,4,5])
    b = np.array([10,20,30,40,50])
    c = a*b
    return c

#Warning:  Functions with mutable default arguments are below.
# This code is only for testing and would be bad practice in production!
def f(a=[1,2,3,4,5],b=[10,20,30,40,50]):
    c = [i*j for i,j in zip(a,b)]
    return c

def g(a=np.array([1,2,3,4,5]),b=np.array([10,20,30,40,50])):
    c = a*b
    return c


import timeit
print(timeit.timeit('d()','from __main__ import d'))
print(timeit.timeit('e()','from __main__ import e'))
print(timeit.timeit('f()','from __main__ import f'))
print(timeit.timeit('g()','from __main__ import g'))

Here functions f and g avoid re-creating the lists/arrays on every call, and we get very similar performance:

1.53083586693
15.8963699341
1.33564996719
1.69556999207
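To see where the time in e() actually goes, one can time array construction and the multiplication separately. This is a minimal sketch assuming Python 3 and a recent NumPy; the repetition count N is an arbitrary choice, and absolute numbers will differ between machines.

```python
import timeit

import numpy as np

N = 100_000  # repetitions per measurement (arbitrary for this sketch)

# 1) Cost of only building the inputs, with no multiplication.
t_build_list = timeit.timeit("[1, 2, 3, 4, 5]", number=N)
t_build_array = timeit.timeit(
    "np.array([1, 2, 3, 4, 5])", setup="import numpy as np", number=N
)

# 2) Cost of only the multiplication, with the inputs built once beforehand.
a_list, b_list = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
a_arr, b_arr = np.array(a_list), np.array(b_list)
t_mul_list = timeit.timeit(lambda: [i * j for i, j in zip(a_list, b_list)], number=N)
t_mul_array = timeit.timeit(lambda: a_arr * b_arr, number=N)

print(f"build:    list {t_build_list:.4f}s, array {t_build_array:.4f}s")
print(f"multiply: list-comp {t_mul_list:.4f}s, array {t_mul_array:.4f}s")
```

If array construction dominates on your machine, that matches the explanation above: for five-element inputs, the fixed per-call overhead of np.array outweighs anything gained from vectorized arithmetic.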

Note that list-comp + zip still wins. However, if we make the arrays big enough, numpy wins hands down:

t1 = [1,2,3,4,5] * 100
t2 = [10,20,30,40,50] * 100
t3 = np.array(t1)
t4 = np.array(t2)
print(timeit.timeit('f(t1,t2)','from __main__ import f,t1,t2',number=10000))
print(timeit.timeit('g(t3,t4)','from __main__ import g,t3,t4',number=10000))

My results:

0.602419137955
0.0263929367065
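Since the size at which numpy starts to win depends on the machine and library versions, it can be measured directly instead of relying on a single size. A sketch assuming Python 3; the sizes and the 1000 repetitions are arbitrary, and list_mul / array_mul are hypothetical names that mirror f and g above.

```python
import timeit

import numpy as np

def list_mul(a, b):
    # Pure-Python element-wise product, like f() above.
    return [i * j for i, j in zip(a, b)]

def array_mul(a, b):
    # Vectorized element-wise product, like g() above.
    return a * b

timings = {}
for n in (5, 50, 500, 5000):
    a = list(range(1, n + 1))
    b = [10 * x for x in a]
    a_arr, b_arr = np.array(a), np.array(b)
    # Each lambda is timed immediately, so it sees this iteration's inputs.
    t_list = timeit.timeit(lambda: list_mul(a, b), number=1000)
    t_arr = timeit.timeit(lambda: array_mul(a_arr, b_arr), number=1000)
    timings[n] = (t_list, t_arr)
    print(f"n={n:>5}: list {t_list:.5f}s, numpy {t_arr:.5f}s, ratio {t_list / t_arr:.1f}x")
```

A ratio above 1 means the numpy version was faster at that size; the arrays are built outside the timed calls, as in f and g.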

score 3 · Accepted Answer

import time, numpy
def d():
    a = range(100000)
    b = range(0,1000000,10)
    c = [i*j for i,j in zip(a,b)]
    return c

def e():
    a = numpy.array(range(100000))
    b = numpy.array(range(0,1000000,10))
    c = a*b
    return c



#python ['0.04s', '0.04s', '0.04s']
#numpy ['0.02s', '0.02s', '0.02s']

Try with bigger arrays... even with the overhead of creating the arrays, numpy is much faster.
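The answer does not show its timing code. One way to reproduce the comparison in Python 3 is sketched below. The e_arange variant is an addition, not part of the original answer: it uses np.arange to build the arrays directly instead of converting a Python range element by element. dtype=np.int64 is set explicitly as an assumption, because the largest products (about 1e11) would overflow a 32-bit default integer type.

```python
import timeit

import numpy as np

def d():
    a = range(100000)
    b = range(0, 1000000, 10)
    return [i * j for i, j in zip(a, b)]

def e():
    # Converts Python range objects into arrays element by element.
    a = np.array(range(100000), dtype=np.int64)
    b = np.array(range(0, 1000000, 10), dtype=np.int64)
    return a * b

def e_arange():
    # np.arange fills the arrays directly, skipping the per-element conversion.
    a = np.arange(100000, dtype=np.int64)
    b = np.arange(0, 1000000, 10, dtype=np.int64)
    return a * b

for fn in (d, e, e_arange):
    runs = timeit.repeat(fn, number=10, repeat=3)
    # Per-call time, averaged over the 10 calls in each run.
    print(fn.__name__, [f"{t / 10:.4f}s" for t in runs])
```

All three functions compute the same products; only the way the inputs are created differs.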

score 2 · Accepted Answer

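NumPy arrays are slow to append to or build incrementally: np.append does not grow the array in place, it allocates a new array and copies every existing element, whereas list.append is amortized O(1).

Here are some tests: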

from timeit import Timer
setup1 = '''import numpy as np
a = np.array([])'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)

setup2 = 'l = list()'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)

print('appending to empty list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))

setup1 = '''import numpy as np
a = np.array(range(999999))'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)

setup2 = 'l = [x for x in range(999999)]'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)

print('appending to large list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))

Results:

appending to empty list:
[0.008171333983972538, 0.0076482562944814175, 0.007862921943675175]
[0.00015624398517267296, 0.0001191077336243837, 0.000118654852507942]
appending to large list:
[2.8521017080411304, 2.8518707386717446, 2.8022625940577477]
[0.0001643958452675065, 0.00017888804099541744, 0.00016711313196715594]
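Because each np.append call copies the whole array, the usual remedies are to collect values in a Python list and convert once, to preallocate with np.empty and fill in place, or, where possible, to compute the whole array with vectorized operations. A sketch of all three, with the np.append pattern for comparison; the function names and the x * x values are hypothetical placeholders.

```python
import numpy as np

def build_via_list(n):
    # Append to a Python list (amortized O(1) per append), convert once at the end.
    items = []
    for x in range(n):
        items.append(x * x)
    return np.array(items, dtype=np.int64)

def build_preallocated(n):
    # Allocate the final size up front, then fill in place: no repeated copying.
    out = np.empty(n, dtype=np.int64)
    for x in range(n):
        out[x] = x * x
    return out

def build_vectorized(n):
    # Best option when the values can be expressed as whole-array operations.
    x = np.arange(n, dtype=np.int64)
    return x * x

def build_with_np_append(n):
    # Anti-pattern for comparison: every np.append returns a new, copied array,
    # so the total work grows quadratically with n.
    out = np.array([], dtype=np.int64)
    for x in range(n):
        out = np.append(out, x * x)
    return out
```

To compare them, time each with timeit at a moderate n (for example 10_000), keeping n small for build_with_np_append since its cost grows quadratically.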

score -1 · Accepted Answer

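I don't consider numpy slow, because you also have to account for the time it takes to write and debug code. The longer the program, the harder it is to find problems or add new features (programmer time). So a higher-level language lets you, with the same time and skill, build complex and potentially more efficient programs.

Anyway, some interesting optimization tools are:

- Psyco is a JIT (just-in-time) compiler that optimizes code at run time.

- Numexpr: parallelization is a good way to speed up program execution, as long as the work is sufficiently separable.

- weave is a module of SciPy for bridging Python and C. One of its features is blitz, which takes a line of Python, transparently translates it to C, and executes the optimized version on every call. The first translation takes about a second, but the resulting speed is usually higher than with all of the above. It is not Numexpr or Psyco bytecode, nor NumPy's C interface, but your own function written directly in C and fully compiled and optimized.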
