Consider this MWE:

import numpy as np
a = np.random.uniform(0,1,size=[14,25,25])
b = np.random.uniform(0,1,size=[14,25,25])
c = np.random.uniform(0,1,size=[14,25])

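def my_func(a,b,c):
    # InnerSum[l] = a[l] @ b[l]  (batched matrix-matrix product)
    InnerSum = np.einsum('lpk, lkm -> lpm', a, b)
    # OuterSum[l] = c[l] @ InnerSum[l]  (batched vector-matrix product)
    OuterSum = np.einsum('lp, lpm -> lm', c, InnerSum)
    Result = 2 * OuterSum
    return Result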

my_func() was my first attempt at this computation, but I would like to make it faster. I then tried the following modified function:

def my_func_2(a,b,c):    
    OuterSum = np.einsum('lpk, lkm, lp -> lm', a, b, c)
    Result = 2 * OuterSum
    return Result
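Both functions implement the same formula, Result[l, m] = 2 * sum over p and k of c[l, p] * a[l, p, k] * b[l, k, m]. A minimal sketch that spells this out with explicit Python loops and checks both einsum versions against it. The helper name `reference_result` and the small array shapes are illustrative assumptions, chosen so the pure-Python loops finish quickly:

```python
import numpy as np

def my_func(a, b, c):
    InnerSum = np.einsum('lpk,lkm->lpm', a, b)
    OuterSum = np.einsum('lp,lpm->lm', c, InnerSum)
    return 2 * OuterSum

def my_func_2(a, b, c):
    return 2 * np.einsum('lpk,lkm,lp->lm', a, b, c)

def reference_result(a, b, c):
    """Hypothetical helper: Result[l, m] = 2 * sum_{p,k} c[l,p] * a[l,p,k] * b[l,k,m]."""
    L, P, K = a.shape
    M = b.shape[2]
    out = np.zeros((L, M))
    for l in range(L):
        for p in range(P):
            for k in range(K):
                for m in range(M):
                    out[l, m] += c[l, p] * a[l, p, k] * b[l, k, m]
    return 2 * out

rng = np.random.default_rng(0)
# Small shapes (assumed for illustration) keep the Python loops fast.
a = rng.uniform(0, 1, size=(3, 4, 5))   # (l, p, k)
b = rng.uniform(0, 1, size=(3, 5, 6))   # (l, k, m)
c = rng.uniform(0, 1, size=(3, 4))      # (l, p)

ref = reference_result(a, b, c)
print(np.allclose(my_func(a, b, c), ref), np.allclose(my_func_2(a, b, c), ref))
```

Agreement up to floating-point rounding is why np.allclose is used rather than ==.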

However, when I run %timeit on both functions, I get:

%timeit my_func(a,b,c)
293 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit my_func_2(a,b,c)
347 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Why is the second approach slower than the first? And how can I optimize my_func() to make it faster?


1 Answer


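Given that the loop count (the length of a and b along the first axis) isn't a huge number compared with the lengths along the other axes, we can run a simple loop and leverage BLAS-backed matrix multiplication at each iteration. Those lengths also mean there is enough sum-reduction work per iteration to justify a for-loop in this case.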
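A rough multiply-add count makes the per-iteration argument concrete. Within one iteration, c[l] is a vector, so the product can be evaluated left to right as (c[l] @ a[l]) @ b[l], i.e. two vector-matrix products, instead of first forming the full matrix product a[l] @ b[l] as my_func does. A back-of-the-envelope sketch for the 25 x 25 slices in the question (operation counts only; real timings also depend on the BLAS library and per-call overhead):

```python
import numpy as np

P = K = M = 25  # slice sizes from the question: a[l] is P x K, b[l] is K x M, c[l] has length P

# c[l] @ (a[l] @ b[l]): a matrix-matrix product, then a vector-matrix product
matrix_first = P * K * M + P * M
# (c[l] @ a[l]) @ b[l]: two vector-matrix products
vector_first = P * K + K * M

print(matrix_first, vector_first)  # 16250 1250

# Both association orders give the same result, up to floating-point rounding.
rng = np.random.default_rng(0)
a_l = rng.uniform(0, 1, size=(P, K))
b_l = rng.uniform(0, 1, size=(K, M))
c_l = rng.uniform(0, 1, size=P)
assert np.allclose(c_l @ (a_l @ b_l), (c_l @ a_l) @ b_l)
```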

The implementation would be:

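N, M = b.shape[::2]          # N = size of the first axis, M = last axis of b
out = np.empty((N, M))
for i in range(N):
    # (c[i] @ a[i]) @ b[i]: two BLAS-backed vector-matrix products per slice
    out[i] = c[i].dot(a[i]).dot(b[i])
out *= 2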
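The same left-to-right association can also be written without a Python loop, because np.matmul (the @ operator) broadcasts over the leading axis. This is a swapped-in variant, not part of the answer above, and its speed relative to the loop has not been measured here; `my_func_matmul` is a hypothetical name:

```python
import numpy as np

def my_func_matmul(a, b, c):
    # Hypothetical loop-free variant: c[:, None, :] has shape (L, 1, P),
    # and matmul broadcasts the product over the leading L axis.
    ca = c[:, None, :] @ a        # (L, 1, K): one row vector per slice
    out = (ca @ b)[:, 0, :]       # (L, 1, M) -> (L, M)
    return 2 * out

def my_func_3(a, b, c):
    N, M = b.shape[::2]
    out = np.empty((N, M))
    for i in range(N):
        out[i] = c[i].dot(a[i]).dot(b[i])
    out *= 2
    return out

np.random.seed(0)
a = np.random.uniform(0, 1, size=[14, 25, 25])
b = np.random.uniform(0, 1, size=[14, 25, 25])
c = np.random.uniform(0, 1, size=[14, 25])
print(np.allclose(my_func_matmul(a, b, c), my_func_3(a, b, c)))
```

Because c[:, None, :] is a stack of 1 x P row vectors, both products stay vector-matrix shaped, just like c[i].dot(a[i]).dot(b[i]).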

Benchmarking

Using einsum's optimize argument, which seems to boost the performance of my_func_2 noticeably, and adding the proposed approach as another function:

def my_func(a, b, c, optimize=False):
    InnerSum = np.einsum('lpk, lkm -> lpm', a, b, optimize=optimize)
    OuterSum = np.einsum('lp, lpm -> lm', c, InnerSum, optimize=optimize)
    Result = 2 * OuterSum
    return Result

def my_func_2(a, b, c, optimize=False):
    OuterSum = np.einsum('lpk, lkm, lp -> lm', a, b, c, optimize=optimize)
    Result = 2 * OuterSum
    return Result

def my_func_3(a,b,c):
    N,M = b.shape[::2]
    out = np.empty((N,M))
    for i in range(N):
        out[i] = c[i].dot(a[i]).dot(b[i])
    out *= 2
    return out
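Before comparing timings, a quick sanity check that every variant returns the same result is worthwhile. A minimal sketch; the definitions are repeated (with the spaces dropped from the subscripts) so that the snippet runs on its own:

```python
import numpy as np

def my_func(a, b, c, optimize=False):
    InnerSum = np.einsum('lpk,lkm->lpm', a, b, optimize=optimize)
    OuterSum = np.einsum('lp,lpm->lm', c, InnerSum, optimize=optimize)
    return 2 * OuterSum

def my_func_2(a, b, c, optimize=False):
    return 2 * np.einsum('lpk,lkm,lp->lm', a, b, c, optimize=optimize)

def my_func_3(a, b, c):
    N, M = b.shape[::2]
    out = np.empty((N, M))
    for i in range(N):
        out[i] = c[i].dot(a[i]).dot(b[i])
    out *= 2
    return out

np.random.seed(0)
a = np.random.uniform(0, 1, size=[14, 25, 25])
b = np.random.uniform(0, 1, size=[14, 25, 25])
c = np.random.uniform(0, 1, size=[14, 25])

expected = my_func_3(a, b, c)
for opt in (False, True):
    # Results can differ in the last bits depending on summation order.
    assert np.allclose(my_func(a, b, c, optimize=opt), expected)
    assert np.allclose(my_func_2(a, b, c, optimize=opt), expected)
print("all variants agree")
```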

Timings:

In [51]: # Setup used in the question
    ...: np.random.seed(0)
    ...: a = np.random.uniform(0,1,size=[14,25,25])
    ...: b = np.random.uniform(0,1,size=[14,25,25])
    ...: c = np.random.uniform(0,1,size=[14,25])

# With einsum optimize set as False
In [52]: %timeit my_func(a,b,c, optimize=False)
    ...: %timeit my_func_2(a,b,c, optimize=False)
    ...: %timeit my_func_3(a,b,c)
1000 loops, best of 3: 255 µs per loop
1000 loops, best of 3: 302 µs per loop
10000 loops, best of 3: 28.7 µs per loop

# With einsum optimize set as True
In [53]: %timeit my_func(a,b,c, optimize=True)
    ...: %timeit my_func_2(a,b,c, optimize=True)
    ...: %timeit my_func_3(a,b,c)
1000 loops, best of 3: 334 µs per loop
10000 loops, best of 3: 77.6 µs per loop
10000 loops, best of 3: 28.6 µs per loop
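In these timings optimize=True speeds up my_func_2 considerably while making my_func slightly slower. One way to see what the flag changes is np.einsum_path, which returns the pairwise contraction order einsum would use for the three-operand expression together with a printable summary of that path. A sketch; the report's exact text varies between NumPy versions, so no output is reproduced here:

```python
import numpy as np

np.random.seed(0)
a = np.random.uniform(0, 1, size=[14, 25, 25])
b = np.random.uniform(0, 1, size=[14, 25, 25])
c = np.random.uniform(0, 1, size=[14, 25])

# path is a list of the form ['einsum_path', (i, j), ...]; report is a string.
path, report = np.einsum_path('lpk,lkm,lp->lm', a, b, c, optimize='greedy')
print(path)
print(report)

# A precomputed path can be passed back to einsum through `optimize`.
result = 2 * np.einsum('lpk,lkm,lp->lm', a, b, c, optimize=path)
```

Reusing a precomputed path avoids searching for the contraction order on every call, which may matter when the same expression is evaluated many times.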
answered 2018-03-14T10:03:56.950