performance - 为什么 List 不像 Vector 那样融合？（哈斯克尔）

Question

考虑以下基准：

module Main where

import qualified Data.List as L
import qualified Data.Vector.Unboxed as U
import Criterion.Main



goodSum :: Int -> Double
{-# NOINLINE goodSum #-}
goodSum n =
  let ints = U.enumFromN 0 (n * n * 10) :: U.Vector Int
  in U.foldl' (+) 0 $ U.map fromIntegral ints

badSum :: Int -> Double
{-# NOINLINE badSum #-}
badSum n = L.foldl' (+) 0.5 [fromIntegral i | i <- [0 .. 10*n*n]]

badSum2 :: Int -> Double
{-# NOINLINE badSum2 #-}
badSum2 n = L.foldr (+) 0.5 [fromIntegral i | i <- [0 .. 10*n*n]]

worstSum :: Int -> Double
{-# NOINLINE worstSum #-}
worstSum n = L.foldl1' (+) $ do
  i <- [0 .. n*n]
  return $ L.foldl1' (+) $ do
    k <- [0 .. 10]
    return $ fromIntegral $ k + i

main = do
  defaultMain
    [ bench "good" $ nf goodSum 500
    , bench "bad" $ nf badSum 500
    , bench "bad2" $ nf badSum2 500
    , bench "worst" $ nf worstSum 500
    ]

结果：

benchmarking good
time                 1.826 ms   (1.819 ms .. 1.835 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.810 ms   (1.803 ms .. 1.817 ms)
std dev              23.18 μs   (19.91 μs .. 27.96 μs)

benchmarking bad
time                 38.38 ms   (38.07 ms .. 38.74 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 38.23 ms   (38.07 ms .. 38.38 ms)
std dev              298.5 μs   (220.6 μs .. 417.8 μs)

benchmarking bad2
time                 77.87 ms   (73.74 ms .. 82.67 ms)
                     0.992 R²   (0.976 R² .. 0.998 R²)
mean                 78.14 ms   (75.33 ms .. 82.13 ms)
std dev              5.184 ms   (3.056 ms .. 7.966 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking worst
time                 80.80 ms   (79.53 ms .. 82.10 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 80.73 ms   (80.29 ms .. 81.19 ms)
std dev              756.9 μs   (591.9 μs .. 938.2 μs)

列表推导是好的生产者，foldr也是好的消费者，那么为什么列表没有融合呢？

score 4 · Accepted Answer

与您的问题相反foldr，实际上，列表理解确实融合了。但是，您必须回忆一下它的定义，foldr而不是它不是尾递归函数。前奏定义foldr为，它不会像Vector基础示例那样编译为紧密循环。

foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys

生成的重要核心badSum2看起来像这样

$wgo_s8AH [Occ=LoopBreaker] :: Int# -> Double#
$wgo_s8AH =
  \ (w_s8AD :: Int#) ->
    case tagToEnum#
           @ Bool (==# w_s8AD y1_a69V)
    of _ [Occ=Dead] {
      False ->
        case $wgo_s8AH (+# w_s8AD 1) of ww_s8AG { __DEFAULT ->
        +## (int2Double# w_s8AD) ww_s8AG
        };
      True -> +## (int2Double# w_s8AD) 0.5

这大致相当于这个函数（模未装箱算术）

badSum3 :: Int -> Double
badSum3 n = go 0
  where
    stop = 10 * n * n

    go i | i == stop = fromIntegral i + 0.5
         | otherwise = fromIntegral i + go (i + 1)

通过 Criterion 运行此函数，此函数提供与 badSum2. 虽然生成的函数不会产生和消耗中间 cons 单元，但它仍在执行函数调用并执行所有相关的堆栈操作。

基于版本的性能不佳foldl'是由于它foldl'不是一个好的消费者，因此它不能与列表理解融合。左折叠将产生一个尾递归循环，该循环遍历列表推导生成的列表，从而产生所有分配和相关内存操作的开销。

我不确定您是否可以获得与Vector使用标准列表操作的操作相同的性能，但是流融合包提供了使用不同融合方法的列表组合器，它可以在这个问题上实现类似的性能。

import qualified Data.List.Stream as S

-- The stream-fusion package does not seem to have `enumFrom` functions
enumerate :: Int -> [Int]
enumerate n = S.unfoldr f 0
  where
    f i | i > n     = Nothing
        | otherwise = Just (i, i + 1)

goodSum2 :: Int -> Double
goodSum2 n = S.foldl' (+) 0.5 $ S.map fromIntegral $ enumerate (n * n * 10)

performance - 为什么 List 不像 Vector 那样融合？（哈斯克尔）

1 回答 1

Related

Reference