python - Numba and Cython aren't improving performance significantly compared to CPython, maybe I am using them incorrectly?

Question

BIG EDIT:

=================

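For clarity, I am removing the old results and replacing them with updated ones. The question remains the same: am I using Cython and Numba correctly, and what improvements could be made to the code? (I also have a newer, simpler temporary IPython notebook with all the code and results here)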

1)

I think I figured out why there was initially no difference between Cython, Numba, and CPython: I was feeding them NumPy arrays as input:

x = np.asarray([x_i*np.random.randint(8,12)/10 for x_i in range(n)])

instead of lists:

x = [x_i*random.randint(8,12)/10 for x_i in range(n)]
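A minimal sketch of why the input type matters for the pure-Python loop: indexing an ndarray from Python returns a NumPy scalar object on every access, while indexing a list returns a plain Python float. The helper `sum_of_squares` and the size `n = 1000` are assumptions made for illustration, not part of the original benchmark. Note also that the two inputs are not drawn identically: `np.random.randint(8, 12)` excludes the upper bound, while `random.randint(8, 12)` includes it.

```python
import random
import timeit

import numpy as np

n = 1000  # assumed size, chosen only for this illustration

# Inputs built the same way as in the question.
x_arr = np.asarray([x_i * np.random.randint(8, 12) / 10 for x_i in range(n)])
x_list = [x_i * random.randint(8, 12) / 10 for x_i in range(n)]


def sum_of_squares(seq):
    """Pure-Python loop that indexes its input element by element."""
    total = 0.0
    for i in range(len(seq)):
        total += seq[i] ** 2
    return total


# Each ndarray access yields a NumPy scalar; each list access a Python float.
print(type(x_arr[1]), type(x_list[1]))

# Time the same interpreted loop on both inputs (numbers vary by machine).
t_arr = timeit.timeit(lambda: sum_of_squares(x_arr), number=200)
t_list = timeit.timeit(lambda: sum_of_squares(x_list), number=200)
print(f"ndarray input: {t_arr:.4f} s, list input: {t_list:.4f} s")
```

The timings are printed rather than stated here because they depend on the machine and on the NumPy and Python versions.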

Benchmark using NumPy arrays as data input (plot not preserved)

Benchmark using Python lists as input (plot not preserved)

2)

I replaced the zip() function with explicit loops; however, it didn't make much of a difference. The code would be:

CPython

def py_lstsqr(x, y):
    """ Computes the least-squares solution to a linear matrix equation. """
    len_x = len(x)
    x_avg = sum(x)/len_x
    y_avg = sum(y)/len(y)
    var_x = 0
    cov_xy = 0
    for i in range(len_x):
        temp = (x[i] - x_avg)
        var_x += temp**2
        cov_xy += temp*(y[i] - y_avg)
    slope = cov_xy / var_x
    y_interc = y_avg - slope*x_avg
    return (slope, y_interc)

Cython

%load_ext cythonmagic

%%cython
def cy_lstsqr(x, y):
    """ Computes the least-squares solution to a linear matrix equation. """
    cdef double x_avg, y_avg, var_x, cov_xy,\
         slope, y_interc, x_i, y_i
    cdef int len_x
    len_x = len(x)
    x_avg = sum(x)/len_x
    y_avg = sum(y)/len(y)
    var_x = 0
    cov_xy = 0
    for i in range(len_x):
        temp = (x[i] - x_avg)
        var_x += temp**2
        cov_xy += temp*(y[i] - y_avg)
    slope = cov_xy / var_x
    y_interc = y_avg - slope*x_avg
    return (slope, y_interc)

Numba

from numba import jit

@jit
def numba_lstsqr(x, y):
    """ Computes the least-squares solution to a linear matrix equation. """
    len_x = len(x)
    x_avg = sum(x)/len_x
    y_avg = sum(y)/len(y)
    var_x = 0
    cov_xy = 0
    for i in range(len_x):
        temp = (x[i] - x_avg)
        var_x += temp**2
        cov_xy += temp*(y[i] - y_avg)
    slope = cov_xy / var_x
    y_interc = y_avg - slope*x_avg
    return (slope, y_interc)

score 2 · Accepted Answer

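Here is what I think is happening with Numba:

Numba works on NumPy arrays. Nothing else. Everything else has nothing to do with Numba.

zip returns an iterator of arbitrary items, which Numba cannot see into. As a result, Numba cannot compile much of the function.

Looping over the indices with for i in range(...) is likely to produce a much better result and allow much stronger type inference.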
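The points above can be sketched as follows: convert the inputs to float64 NumPy arrays once, outside the compiled function, and use only index-based loops inside it. The function name `lstsqr_indexed` is hypothetical, and the `try`/`except` fallback to a no-op decorator is an assumption added so that the sketch still runs (uncompiled) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, fall back to a no-op
    # decorator so the function runs as ordinary Python.
    def njit(func):
        return func


@njit
def lstsqr_indexed(x, y):
    """Least-squares slope and intercept using explicit index loops."""
    n = x.shape[0]
    # First pass: means, accumulated by hand instead of calling sum().
    x_sum = 0.0
    y_sum = 0.0
    for i in range(n):
        x_sum += x[i]
        y_sum += y[i]
    x_avg = x_sum / n
    y_avg = y_sum / n
    # Second pass: variance of x and covariance of x and y.
    var_x = 0.0
    cov_xy = 0.0
    for i in range(n):
        dx = x[i] - x_avg
        var_x += dx * dx
        cov_xy += dx * (y[i] - y_avg)
    slope = cov_xy / var_x
    return slope, y_avg - slope * x_avg


# Convert list inputs to float64 arrays before calling the function.
x = np.asarray([0.0, 1.0, 2.0, 3.0], dtype=np.float64)
y = np.asarray([1.0, 3.0, 5.0, 7.0], dtype=np.float64)
slope, intercept = lstsqr_indexed(x, y)
print(slope, intercept)
```

When timing a function like this, the first call includes compilation, so it is worth calling it once before starting the benchmark.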

score 1 · Accepted Answer

Using the builtin sum() could be causing problems.

Here is linear regression code that will run faster in Numba:

import numba

@numba.jit
def ols(x, y):
    """Simple OLS for two data sets."""
    M = x.size

    x_sum = 0.
    y_sum = 0.
    x_sq_sum = 0.
    x_y_sum = 0.

    for i in range(M):
        x_sum += x[i]
        y_sum += y[i]
        x_sq_sum += x[i] ** 2
        x_y_sum += x[i] * y[i]

    slope = (M * x_y_sum - x_sum * y_sum) / (M * x_sq_sum - x_sum**2)
    intercept = (y_sum - slope * x_sum) / M

    return slope, intercept
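Before timing anything, it may be worth checking that the closed-form expressions give the expected fit. A minimal sketch, assuming synthetic data with a known slope and intercept: the same sums are written with NumPy reductions and compared against `np.polyfit`. The name `ols_numpy` and the test data are hypothetical and serve only as a reference to validate the loop version against.

```python
import numpy as np


def ols_numpy(x, y):
    """Same closed-form slope/intercept as the loop version, via reductions."""
    m = x.size
    x_sum = x.sum()
    y_sum = y.sum()
    x_sq_sum = np.dot(x, x)  # sum of x[i] ** 2
    x_y_sum = np.dot(x, y)   # sum of x[i] * y[i]
    slope = (m * x_y_sum - x_sum * y_sum) / (m * x_sq_sum - x_sum ** 2)
    intercept = (y_sum - slope * x_sum) / m
    return slope, intercept


# Synthetic data (an assumption for this check): y = 3x + 2 plus small noise.
rng = np.random.default_rng(0)
x = np.arange(1000, dtype=np.float64)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

slope, intercept = ols_numpy(x, y)
# np.polyfit returns coefficients from highest degree to lowest.
ref_slope, ref_intercept = np.polyfit(x, y, 1)
print(np.isclose(slope, ref_slope), np.isclose(intercept, ref_intercept))
```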
