
I have read a bunch of discussions on the topic of np.einsum:

To learn more about why np.einsum is faster than the usual np.sum, np.product, etc. (even for the latest numpy version in anaconda), I am using np.einsum_path to see what the optimization process actually optimizes.

While doing so, I found an interesting phenomenon. Consider this minimal example:

import numpy as np
for i in 'int8 int16 int32 int64 uint8 uint16 uint32 uint64 float32 float64'.split():
    print(np.einsum_path('i->', np.empty(2**30, i))[1])

The output is identical for every dtype:

  Complete contraction:  i->
         Naive scaling:  1
     Optimized scaling:  1
      Naive FLOP count:  1.074e+09
  Optimized FLOP count:  2.147e+09
   Theoretical speedup:  0.500
  Largest intermediate:  1.000e+00 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   1                         i->                                       ->

Here the optimized FLOP count increases (which should mean more computation?), and the theoretical speedup is less than 1 (which should mean it is slower). However, if we actually time the computation:

for i in 'int8 int16 int32 int64 uint8 uint16 uint32 uint64 float32 float64'.split():
    a    = np.empty(2**27, i)
    raw  = %timeit -qon9 a.sum()
    noOp = %timeit -qon9 np.einsum('i->', a, optimize=False)
    op   = %timeit -qon9 np.einsum('i->', a, optimize='greedy')
    print(i, raw.average/op.average, noOp.average/op.average, sep='\t')

If we look at the last column, the unoptimized ("naive") einsum time divided by the optimized time, the values are all close to 1, meaning the optimization does not make it slower:

int8    4.485133392283354   1.0205873691331475
int16   3.7817373109729213  0.9528030137222752
int32   1.3760725925789292  1.0741615462167338
int64   1.0793509548186524  1.0076602576129605
uint8   4.509893894635594   0.997277624256872
uint16  3.964949791428885   0.9914991211913878
uint32  1.3054813163356085  1.009475242303559
uint64  1.0747670688044795  1.0082522386805526
float32 2.4105510701565636  0.9998241152368149
float64 2.1957241421227556  0.9836838487664662

So how come np.einsum_path says it needs more FLOPs and will be slower? I believe the time should follow directly from the number of FLOPs, so these two benchmarks basically refer to the same thing.

By the way, I am attaching an example showing how np.einsum_path "usually" behaves, to demonstrate that the result above is an anomaly:

a = np.empty((64, 64))
print(np.einsum_path('ij,jk,kl->il', a, a, a)[1])
noOp = %timeit -qon99 np.einsum('ij,jk,kl->il', a, a, a, optimize=False)
op   = %timeit -qon99 np.einsum('ij,jk,kl->il', a, a, a, optimize='greedy')
print('Actual speedup:', noOp.average / op.average)

Output:

  Complete contraction:  ij,jk,kl->il
         Naive scaling:  4
     Optimized scaling:  3
      Naive FLOP count:  5.033e+07
  Optimized FLOP count:  1.049e+06
   Theoretical speedup:  48.000
  Largest intermediate:  4.096e+03 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   3                   jk,ij->ik                                kl,ik->il
   3                   ik,kl->il                                   il->il
Actual speedup: 90.33518444642904

1 Answer


I just dug into the source code of np.einsum_path. Based on this comment in the source (namely here):

# Compute naive cost
# This isn't quite right, need to look into exactly how einsum does this

and the way the optimized cost is computed (namely here; not posted, as it is too long), the two costs do not seem to be computed consistently, and the first one is explicitly documented as "not quite right".
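To illustrate the mismatch, here is a back-of-the-envelope sketch of the two cost formulas as I understand them from reading numpy/core/einsumfunc.py (the flop_count helper below is my own reconstruction, not numpy's API): the naive estimate does not treat the reduction over i as a summation, while the per-step estimate does, and therefore doubles the cost.

N = 2**30  # number of elements in the 'i->' example

def flop_count(size, summed, num_terms):
    # Cost model for one contraction: the size of the index space,
    # times (num_terms - 1) multiplications, doubled when indices
    # are summed away. (My own helper, mirroring the source logic.)
    factor = max(1, num_terms - 1)
    if summed:
        factor *= 2
    return size * factor

# Naive estimate: 'i' appears in only one operand, so it is not
# flagged as an inner product and no factor of 2 is applied.
naive = flop_count(N, summed=False, num_terms=1)      # 1.074e+09

# Per-step estimate for the single contraction 'i->': the index 'i'
# is removed, which counts as a summation, doubling the cost.
optimized = flop_count(N, summed=True, num_terms=1)   # 2.147e+09

print(naive, optimized, naive / optimized)            # speedup 0.5

Under this reading, the reported "Theoretical speedup" of naive / optimized comes out to exactly 0.5, matching the path report above, even though no extra work is actually performed.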

I then printed the "naive" (i.e. unoptimized) einsum_path:

import numpy as np
print(np.einsum_path('i->', np.empty(2**30, 'b'), optimize=False)[1])

and, surprisingly, it still reports an "optimized" FLOP count different from the naive one:

  Complete contraction:  i->
         Naive scaling:  1
     Optimized scaling:  1
      Naive FLOP count:  1.074e+09
  Optimized FLOP count:  2.147e+09
   Theoretical speedup:  0.500
  Largest intermediate:  1.000e+00 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   1                         i->                                       ->

So the reason for the reported slowdown is simply an incorrect FLOP count. np.einsum is not actually performing any path optimization here (although, for whatever reason, it is still fundamentally faster than the native np.sum).
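As a quick sanity check (my own, not from the original post): for a single-operand reduction there is no contraction order to choose, so both optimize settings should return the same trivial single-step path:

import numpy as np

a = np.empty(2**27, 'b')
# Both settings should yield the same trivial path, confirming that
# no reordering happens for a single-operand reduction.
print(np.einsum_path('i->', a, optimize=False)[0])     # ['einsum_path', (0,)]
print(np.einsum_path('i->', a, optimize='greedy')[0])  # ['einsum_path', (0,)]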

Answered 2018-11-07T05:19:10.417