haskell - 如何优化这个 Haskell 程序？

Question

我使用以下代码通过使用状态单子缓存输入结果对来记忆 Collat z 函数的总停止时间。

此外，snd状态的一部分用于跟踪使输出最大化的输入值，目标是找到使总停止时间最大化的百万以下的输入值。（这个问题可以在项目 euler上找到。

import Control.Applicative
import Control.Arrow
import Control.Monad.State
import qualified Data.Map.Strict as M

collatz :: Integer -> Integer
collatz n = if odd n
              then 3 * n + 1
              else n `div` 2

memoCollatz :: Integer
            -> State (M.Map Integer Int, (Integer,Int)) Int
memoCollatz 1 = return 1
memoCollatz n = do
    result <- gets (M.lookup n . fst)
    case result of
        Nothing -> do
            l <- succ <$> memoCollatz (collatz n)
            let update p@(_,curMaxV) =
                    if l > curMaxV
                       then (n,l)
                       else p
            modify (M.insert n l *** update)
            return l
        Just v -> return v

main :: IO ()
main = print $ snd (execState (mapM_ memoCollatz [1..limit]) (M.empty,(1,1)))
  where
    limit = 1000000

该程序运行良好，但速度很慢。所以我想花一些时间弄清楚如何让它更快地工作。

我查看了RWH的分析章节，但不知道问题出在哪里：

我使用编译它ghc -O2 -rtsopts -prof -auto-all -caf-all -fforce-recomp，然后运行它，+RTS -s -p结果如下：

   6,633,397,720 bytes allocated in the heap
   9,357,527,000 bytes copied during GC
   2,616,881,120 bytes maximum residency (15 sample(s))
      60,183,944 bytes maximum slop
            5274 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10570 colls,     0 par    3.36s    3.36s     0.0003s    0.0013s
  Gen  1        15 colls,     0 par    7.03s    7.03s     0.4683s    3.4337s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    4.02s  (  4.01s elapsed)
  GC      time   10.39s  ( 10.39s elapsed)
  RP      time    0.00s  (  0.00s elapsed)
  PROF    time    0.00s  (  0.00s elapsed)
  EXIT    time    0.16s  (  0.16s elapsed)
  Total   time   14.57s  ( 14.56s elapsed)

  %GC     time      71.3%  (71.3% elapsed)

  Alloc rate    1,651,363,842 bytes per MUT second

  Productivity  28.7% of total user, 28.7% of total elapsed

和.prof文件：

    total time  =        4.08 secs   (4080 ticks @ 1000 us, 1 processor)
    total alloc = 3,567,324,056 bytes  (excludes profiling overheads)

COST CENTRE        MODULE    %time %alloc

memoCollatz        Main       84.9   91.9
memoCollatz.update Main       10.5    0.0
main               Main        2.4    5.8
collatz            Main        2.2    2.3


                                                                 individual     inherited
COST CENTRE            MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN                   MAIN                     52           0    0.0    0.0   100.0  100.0
 main                  Main                    105           0    0.0    0.0     0.0    0.0
 CAF:main1             Main                    102           0    0.0    0.0     0.0    0.0
  main                 Main                    104           1    0.0    0.0     0.0    0.0
 CAF:main2             Main                    101           0    0.0    0.0     0.0    0.0
  main                 Main                    106           0    0.0    0.0     0.0    0.0
 CAF:main4             Main                    100           0    0.0    0.0     0.0    0.0
  main                 Main                    107           0    0.0    0.0     0.0    0.0
 CAF:main5             Main                     99           0    0.0    0.0    94.4   86.7
  main                 Main                    108           0    1.4    0.9    94.4   86.7
   memoCollatz         Main                    113           0   82.4   85.8    92.9   85.8
    memoCollatz.update Main                    115     2168610   10.5    0.0    10.5    0.0
 CAF:main10            Main                     98           0    0.0    0.0     5.1   11.0
  main                 Main                    109           0    0.4    2.7     5.1   11.0
   memoCollatz         Main                    112     3168610    2.5    6.0     4.7    8.3
    collatz            Main                    114     2168610    2.2    2.3     2.2    2.3
 CAF:main11            Main                     97           0    0.0    0.0     0.5    2.2
  main                 Main                    110           0    0.5    2.2     0.5    2.2
   main.limit          Main                    111           1    0.0    0.0     0.0    0.0
 CAF                   GHC.Conc.Signal          94           0    0.0    0.0     0.0    0.0
 CAF                   GHC.IO.Encoding          89           0    0.0    0.0     0.0    0.0
 CAF                   GHC.IO.Encoding.Iconv    88           0    0.0    0.0     0.0    0.0
 CAF                   GHC.IO.Handle.FD         82           0    0.0    0.0     0.0    0.0

我可以看到垃圾收集器花费了太多时间，而程序大部分时间都在运行memoCollatz。

这是堆分析的两个屏幕截图：

图像1

图像2

我预计内存使用量会增加然后迅速减少，因为程序正在使用 Map 进行记忆，但不确定是什么导致图表快速下降（也许这是可视化结果时的错误？）。

我想知道如何分析这些表格/图表以及它们如何表明真正的问题。

score 1 · Accepted Answer

Haskell Wiki 包含几个针对此问题的不同解决方案：（链接）

那里最快的解决方案使用数组来记忆结果。在我的机器上，它运行大约 1 秒，最长。驻留大小约为 35 MB。

下面是一个运行大约 0.3 秒的版本，使用了 Array 版本的 1/4 内存，但它在 IO monad 中运行。

所有不同版本之间都需要权衡取舍，您必须决定您认为可以接受的版本。

{-# LANGUAGE BangPatterns #-}

import Data.Array.IO
import Data.Array.Unboxed
import Control.Monad

collatz x
  | even x    = div x 2
  | otherwise = 3*x+1

solve n = do
  arr <- newArray (1,n) 0 :: IO (IOUArray Int Int)
  writeArray arr 1 1
  let eval :: Int -> IO Int
      eval x = do
        if x > n
          then fmap (1+) $ eval (collatz x)
          else do d <- readArray arr x
                  if d == 0
                    then do d <- fmap (1+) $ eval (collatz x)
                            writeArray arr x d
                            return d
                    else return d
      go :: (Int,Int) -> Int -> IO (Int,Int)
      go !m x = do d <- eval x
                   return $ max m (d,x)
  foldM go (0,0) [2..n]

main = solve 1000000 >>= print

haskell - 如何优化这个 Haskell 程序？

1 回答 1

Related

Reference