python - Optimizing a Python function that uses numpy arrays

Question

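I have been trying to optimize a Python script I wrote over the past two days. Using several profiling tools (cProfile, line_profiler, etc.), I narrowed the problem down to the following function.

df is a numpy array with 3 columns and more than 1,000,000 rows (the data type is float). Using line_profiler, I found that the function spends most of its time whenever it needs to access the numpy array.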

full_length += head + df[rnd_truck, 2]

and

full_weight += df[rnd_truck,1]

take most of the time, followed by the

full_length = df[rnd_truck,2]

full_weight = df[rnd_truck,1]

lines.

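As far as I can tell, the bottleneck is the access time the function incurs each time it fetches a single number from the numpy array.

When I run the function as MonteCarlo(df, 15., 1000.), calling it 1,000,000 times takes 37 seconds on a 64-bit Windows machine with a 3.40 GHz i7 and 8 GB of RAM. In my application I need to run it 1,000,000,000 times to ensure convergence, which pushes the execution time to more than an hour. I tried using operator.add for the summation lines, but it did not help at all. It looks like I have to come up with a faster way to access this numpy array.

Any ideas are welcome!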

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]

    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]

    # Return average weight per feet on the bridge
    return(full_weight/span)

Below is part of the df numpy array I am using:

In [31]: df
Out[31]: 
array([[  12. ,  220.4,  108.4],
       [  11. ,  220.4,  106.2],
       [  11. ,  220.3,  113.6],
       ..., 
       [   4. ,   13.9,   36.8],
       [   3. ,   13.7,   33.9],
       [   3. ,   13.7,   10.7]])
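One way to check whether array access really dominates is to time the individual operations in isolation. The sketch below is only an illustration: it uses a random stand-in array rather than the real df, and the names lengths_list and n_calls are made up for the example. Each measurement includes the same small lambda call overhead, so only the relative sizes are meaningful.

```python
import timeit
import numpy as np

# Stand-in data (assumption): random values with the same 3-column layout as df
df = np.random.rand(100_000, 3)
lengths_list = df[:, 2].tolist()  # plain Python list copy of the length column

n_calls = 100_000
# Scalar read from the numpy array, as done inside the loop
t_array = timeit.timeit(lambda: df[123, 2], number=n_calls)
# Same read from a plain list, for comparison
t_list = timeit.timeit(lambda: lengths_list[123], number=n_calls)
# Drawing one random index, which the loop also does on every iteration
t_rand = timeit.timeit(lambda: np.random.randint(0, len(df)), number=n_calls)

for label, t in [("df[i, 2]", t_array),
                 ("list[i]", t_list),
                 ("np.random.randint", t_rand)]:
    print(f"{label:>18}: {t / n_calls * 1e6:.3f} us per call")
```

If the random draw turns out to cost as much as or more than the array read, the per-iteration interpreter and function-call overhead is a more likely culprit than memory access itself.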

score 3 · Accepted Answer

As others have noted, this is not vectorized at all, so the slowness really comes from the Python interpreter. Cython can help you a lot here with minimal changes:

>>> %timeit MonteCarlo(df, 5, 1000)
10000 loops, best of 3: 48 us per loop

>>> %timeit MonteCarlo_cy(df, 5, 1000)
100000 loops, best of 3: 3.67 us per loop

where MonteCarlo_cy is just (in an IPython notebook, after %load_ext cythonmagic; newer releases load the extension with %load_ext Cython instead):

%%cython
import numpy as np
cimport numpy as np

def MonteCarlo_cy(double[:, ::1] df, double head, double span):
    # Pick initial truck
    cdef long n = df.shape[0]
    cdef long rnd_truck = np.random.randint(0, n)
    cdef double full_weight = df[rnd_truck, 1]
    cdef double full_length = df[rnd_truck, 2]

    # Loop using other random truck until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, n)
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per feet on the bridge
    return full_weight / span
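For readers who cannot compile Cython, a partial alternative is to cut interpreter overhead in plain NumPy by drawing truck indices in batches and using a cumulative sum to find the first truck that no longer fits. This is a different technique from the Cython version above, not a restatement of it. The function name monte_carlo_batched and the batch parameter are hypothetical, and whether it beats the original loop depends on how many trucks fit on the span, so it should be timed before being relied on.

```python
import numpy as np


def monte_carlo_batched(df, head, span, rng, batch=64):
    """One simulation, drawing `batch` random trucks at a time (sketch)."""
    lengths = df[:, 2]
    weights = df[:, 1]
    n = len(df)

    # Pick initial truck
    first = rng.integers(0, n)
    full_length = lengths[first]
    full_weight = weights[first]

    while True:
        idx = rng.integers(0, n, size=batch)
        # Bridge length after each successive candidate truck is added
        cum_len = full_length + np.cumsum(head + lengths[idx])
        over = np.nonzero(cum_len > span)[0]
        if over.size:
            k = over[0]  # position of the first truck that does not fit
            # Only the trucks before position k contribute weight
            full_weight += weights[idx[:k]].sum()
            return full_weight / span
        # The whole batch fits: keep it all and draw another batch
        full_length = cum_len[-1]
        full_weight += weights[idx].sum()
```

A call would look like monte_carlo_batched(df, 15.0, 1000.0, np.random.default_rng()). As in the original loop, the truck that pushes the length past span is not counted in the weight.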

score 2 · Accepted Answer

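It is worth pointing out that Monte Carlo is embarrassingly parallel. Whichever solution you choose, you should do something to parallelize it. Using @Dougal's answer: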

from multiprocessing import Pool

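def RunVMC(n):
    # n is the task index supplied by pool.map and is not used.
    # df, head and span must already exist as module-level globals
    # (e.g. head=15, span=1000, as in the timing below).
    return MonteCarlo_cy(df,head,span)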


pool=Pool(processes=4)

%timeit [MonteCarlo_cy(df,15,1000) for x in range(1000000)]
1 loops, best of 3: 3.89 s per loop

#Pool @ 4
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 0.973 s per loop

#Pool @ 8
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 568 ms per loop
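Two details of the pool approach may deserve care. Sending one tiny task per simulation adds inter-process overhead, and on platforms that fork, workers can inherit the same global np.random state and so may produce overlapping random sequences. The sketch below, under those assumptions, gives each worker one large chunk and its own independent stream via numpy.random.SeedSequence. The helper names simulate_once, run_chunk and run_parallel are hypothetical, and the pure-Python loop stands in for MonteCarlo_cy so the example is self-contained.

```python
import numpy as np
from multiprocessing import Pool


def simulate_once(lengths, weights, head, span, rng):
    """Same loop as MonteCarlo, but drawing from an explicit generator."""
    n = len(lengths)
    i = rng.integers(0, n)
    full_length = lengths[i]
    full_weight = weights[i]
    while True:
        i = rng.integers(0, n)
        full_length += head + lengths[i]
        if full_length > span:
            return full_weight / span
        full_weight += weights[i]


def run_chunk(task):
    """Run n_sims simulations in one worker with its own random stream."""
    seed_seq, n_sims, df, head, span = task
    rng = np.random.default_rng(seed_seq)
    lengths, weights = df[:, 2], df[:, 1]
    return np.array([simulate_once(lengths, weights, head, span, rng)
                     for _ in range(n_sims)])


def run_parallel(df, head, span, n_sims, n_workers=4, seed=12345):
    """Split n_sims across n_workers processes, one large task per worker."""
    # Independent child seeds, one per worker
    children = np.random.SeedSequence(seed).spawn(n_workers)
    # Spread any remainder over the first few workers
    counts = [n_sims // n_workers + (i < n_sims % n_workers)
              for i in range(n_workers)]
    tasks = [(child, count, df, head, span)
             for child, count in zip(children, counts)]
    with Pool(processes=n_workers) as pool:
        parts = pool.map(run_chunk, tasks)
    return np.concatenate(parts)
```

On Windows, run_parallel should be called from under an if __name__ == "__main__": guard, since worker processes re-import the main module there.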

score 2 · Accepted Answer

Compiling the function with Cython can improve the runtime dramatically.

In a separate file named "funcs.pyx" I have the following code:

cimport cython
import numpy as np
cimport numpy as np


def MonteCarlo(np.ndarray[np.float_t, ndim=2] df, double head, double span):
    # Pick initial truck
    cdef int rnd_truck = np.random.randint(0,len(df))
    # double rather than C float, so the sums keep the array's float64 precision
    cdef double full_length = df[rnd_truck,2]
    cdef double full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

Apart from the type declarations in front of the variables, everything is the same.

Here is the file I used to test it:

import numpy as np
import pyximport
pyximport.install(reload_support=True, setup_args={'include_dirs':[np.get_include()]})
import funcs

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

df = np.random.rand(1000000,3)
reload(funcs)
%timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
%timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]

I only ran it 10,000 times, but even so the improvement is substantial.

16:42:30: In [31]: %timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
10 loops, best of 3: 131 ms per loop

16:42:37: In [32]: %timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]
1 loops, best of 3: 1.75 s per loop

score 0 · Accepted Answer

You could try switching to another Python implementation. Jython is a bit faster than CPython, and PyPy is much faster in some cases. Give it a try.
