
Is there a standard library/numpy equivalent of the following function:

def augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start += n
    return start

?

Although sum(ITERABLE) is very elegant, it uses the + operator instead of +=, which may hurt performance when the items are np.ndarray objects.
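To make the difference concrete, here is a minimal sketch (the variable names are made up for illustration) showing that += on an ndarray writes into the existing buffer, while + allocates a new array:

```python
import numpy as np

a = np.ones(3)
b = np.ones(3)

# Augmented assignment: ndarray.__iadd__ updates the existing array in place.
acc = a.copy()
buffer_before = acc
acc += b
inplace_reused = acc is buffer_before      # same object, no new allocation

# Plain addition: ndarray.__add__ builds a new array and rebinds the name.
acc2 = a.copy()
buffer_before2 = acc2
acc2 = acc2 + b
plus_reused = acc2 is buffer_before2       # a different object

print(inplace_reused, plus_reused)
```

Both variants produce the same values; only the number of temporary arrays differs.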

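I have measured that my function can be as fast as sum() (while its equivalent using + is much slower). Since it is a pure Python function, I suspect its performance is still handicapped, so I am looking for an alternative: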

In [49]: ARRAYS = [np.random.random((1000000)) for _ in range(100)]

In [50]: def not_augmented_assignment_sum(iterable, start=0): 
    ...:     for n in iterable: 
    ...:         start = start + n 
    ...:     return start 
    ...:                                                                                                                                                                                                                                                                       

In [51]: %timeit not_augmented_assignment_sum(ARRAYS)                                                                                                                                                                                                                          
63.6 ms ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [52]: %timeit sum(ARRAYS)                                                                                                                                                                                                                                                   
31.2 ms ± 2.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [53]: %timeit augmented_assignment_sum(ARRAYS)                                                                                                                                                                                                                              
31.2 ms ± 4.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [54]: %timeit not_augmented_assignment_sum(ARRAYS)                                                                                                                                                                                                                          
62.5 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [55]: %timeit sum(ARRAYS)                                                                                                                                                                                                                                                   
37 ms ± 9.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [56]: %timeit augmented_assignment_sum(ARRAYS)                                                                                                                                                                                                                              
27.7 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

I have also tried functools.reduce combined with operator.iadd, but its performance is similar:

In [79]: %timeit reduce(iadd, ARRAYS, 0)                                                                                                                                                                                                                                       
33.4 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [80]: %timeit reduce(iadd, ARRAYS, 0)                                                                                                                                                                                                                                       
29.4 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
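As a side note on the reduce(iadd, ...) variant, the following sketch (with small made-up arrays) illustrates how the choice of start value decides whether the inputs are modified. With an integer start the first step falls back to 0 + array and creates a fresh accumulator; without a start value the first array itself becomes the accumulator:

```python
from functools import reduce
from operator import iadd

import numpy as np

arrays = [np.full(3, float(i)) for i in (1, 2, 3)]

# Integer start: int has no in-place add, so the first step computes
# 0 + arrays[0] into a new array; later steps add into that new array.
total = reduce(iadd, arrays, 0)
first_unchanged = np.array_equal(arrays[0], np.full(3, 1.0))

# No start value: reduce uses arrays[0] as the accumulator and
# iadd mutates it in place.
total2 = reduce(iadd, arrays)
first_mutated = total2 is arrays[0]

print(total, first_unchanged, first_mutated)
```

So passing 0 as the start value costs one extra allocation but leaves the input arrays intact.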

I am also interested in memory efficiency, so I prefer augmented assignments, since they do not require creating intermediate objects.


1 Answer


The answer to the title question (I hope @Martijn Pieters will forgive my choice of metaphor), straight from the horse's mouth: no, there is no such builtin.

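If we allow a few lines of code to implement such an equivalent, we get a rather complicated picture, in which the fastest method depends on operand size: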

[Figure: timings of the different methods relative to sum, plotted against operand size]

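The graph shows the timings of the different methods relative to sum as a function of operand size; the number of terms is always 100. augmented_assignment_sum starts to pay off at relatively large operand sizes. Using scipy.linalg.blas.*axpy looks quite competitive over most of the range tested; its main drawback is that it is less convenient to use than sum.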
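For readers unfamiliar with the BLAS axpy routines mentioned above, here is a small sketch of a single call. It assumes SciPy is installed and uses made-up sample values; daxpy is the float64 variant and computes a*x + y:

```python
import numpy as np
from scipy.linalg import blas

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

# daxpy computes a*x + y for float64 data; `a` defaults to 1.0.
# For a contiguous float64 `y` the update is expected to reuse y's buffer,
# but rebinding the return value is safe either way.
y = blas.daxpy(x, y, a=2.0)

print(y)
```

With a left at its default of 1.0, repeated calls of this kind accumulate a running sum, which is the idea behind the blas_sum function in the benchmark code.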

Code:

from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from scipy.linalg import blas

B = BenchmarkBuilder()

@B.add_function()
def augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start += n
    return start

@B.add_function()
def not_augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start = start + n
    return start

@B.add_function()
def plain_sum(iterable, start=0):
    return sum(iterable,start)

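@B.add_function()
def blas_sum(iterable, start=None):
    iterable = iter(iterable)
    if start is None:
        # Copy the first term so the caller's array is not modified.
        try:
            start = next(iterable).copy()
        except StopIteration:
            return 0
    # Pick the axpy routine (a*x + y, a defaults to 1) matching the dtype.
    try:
        f = {np.dtype('float32'):blas.saxpy,
             np.dtype('float64'):blas.daxpy,
             np.dtype('complex64'):blas.caxpy,
             np.dtype('complex128'):blas.zaxpy}[start.dtype]
    except KeyError:
        # Unsupported dtype (e.g. integers): fall back to double precision.
        f = blas.daxpy
        start = start.astype(float)
    for n in iterable:
        # Rebind the result in case the wrapper had to work on a copy of start.
        start = f(n,start)
    return start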

@B.add_arguments('size of terms')
def argument_provider():
    for exp in range(1,21):
        sz = int(2**exp)
        yield sz,[np.random.randn(sz) for _ in range(100)]

r = B.run()
r.plot(relative_to=plain_sum)

import pylab
pylab.savefig('inplacesum.png')
answered 2019-11-19T11:08:02.993