haskell - Controlling memory allocation/GC in a simulation?

Question

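I'm having some trouble figuring out how to reduce memory usage and GC time in a simulation running in the State monad. At present I have to run the compiled code with +RTS -K100M to avoid a stack space overflow, and the GC stats are pretty hideous (see below).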

Here are the relevant snippets of the code. Complete, working (GHC 7.4.1) code can be found at http://hpaste.org/68527.

-- Lone algebraic data type holding the simulation configuration.
data SimConfig = SimConfig {
        numDimensions :: !Int            -- strict
    ,   numWalkers    :: !Int            -- strict
    ,   simArray      :: IntMap [Double] -- strict spine
    ,   logP          :: Seq Double      -- strict spine
    ,   logL          :: Seq Double      -- strict spine
    ,   pairStream    :: [(Int, Int)]    -- lazy (infinite) list of random vals
    ,   doubleStream  :: [Double]        -- lazy (infinite) list of random vals
    } deriving Show

-- The transition kernel for the simulation.
simKernel :: State SimConfig ()
simKernel = do
    config <- get
    let arr   = simArray      config
    let n     = numWalkers    config
    let d     = numDimensions config
    let rstm0 = pairStream    config
    let rstm1 = doubleStream  config
    let lp    = logP          config
    let ll    = logL          config

    let (a, b)    = head rstm0                           -- uses random stream    
    let z0 = head . map affineTransform $ take 1 rstm1   -- uses random stream
            where affineTransform a = 0.5 * (a + 1) ^ 2


    let proposal  = zipWith (+) r1 r2
            where r1    = map (*z0)     $ fromJust (IntMap.lookup a arr)
                  r2    = map (*(1-z0)) $ fromJust (IntMap.lookup b arr)

    let logA = if val > 0 then 0 else val
            where val = logP_proposal + logL_proposal - (lp `index` (a - 1)) - (ll `index` (a - 1)) + ((fromIntegral n - 1) * log z0)
                  logP_proposal = logPrior proposal
                  logL_proposal = logLikelihood proposal

    let cVal       = (rstm1 !! 1) <= exp logA            -- uses random stream

    let newConfig = SimConfig { simArray = if   cVal
                                           then IntMap.update (\_ -> Just proposal) a arr
                                           else arr
                              , numWalkers = n
                              , numDimensions = d
                              , pairStream   = drop 1 rstm0
                              , doubleStream = drop 2 rstm1
                              , logP = if   cVal
                                       then Seq.update (a - 1) (logPrior proposal) lp
                                       else lp
                              , logL = if   cVal
                                       then Seq.update (a - 1) (logLikelihood proposal) ll
                                       else ll
                              }

    put newConfig

main = do 
    -- (some stuff omitted)
    let sim = logL $ (`execState` initConfig) . replicateM 100000 $ simKernel
    print sim

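In terms of the heap, a profile seems to suggest that the System.Random functions, along with (,), are the memory culprits. I can't include an image directly, but you can see a heap profile here: http://i.imgur.com/5LKxX.png.

I have no idea how to reduce the presence of those things any further. The random variates are generated outside the State monad (to avoid splitting the generator on every iteration), and I believe the only instance of (,) inside simKernel arises when plucking a pair from the lazy list (pairStream) that is included in the simulation configuration.

The stats, including GC, are as follows: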

  1,220,911,360 bytes allocated in the heap
     787,192,920 bytes copied during GC
     186,821,752 bytes maximum residency (10 sample(s))
       1,030,400 bytes maximum slop
             449 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2159 colls,     0 par    0.80s    0.81s     0.0004s    0.0283s
  Gen  1        10 colls,     0 par    0.96s    1.09s     0.1094s    0.4354s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.95s  (  0.97s elapsed)
  GC      time    1.76s  (  1.91s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.72s  (  2.88s elapsed)

  %GC     time      64.9%  (66.2% elapsed)

  Alloc rate    1,278,074,521 bytes per MUT second

  Productivity  35.1% of total user, 33.1% of total elapsed

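And again, I have to bump up the maximum stack size just to run the simulation. I know there must be a big thunk building up somewhere, but I can't figure out where.

How can I improve the heap/stack allocation and GC in a problem like this? How can I identify where a thunk may be building up? Is the use of the State monad here misguided?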

--

UPDATE:

I neglected to look over the output of the profiler when compiling with -fprof-auto. Here is the head of that output:

COST CENTRE                       MODULE                             no.     entries  %time %alloc   %time %alloc

MAIN                              MAIN                                58           0    0.0    0.0   100.0  100.0
 main                             Main                               117           0    0.0    0.0   100.0  100.0
  main.randomList                 Main                               147           1   62.0   55.5    62.0   55.5
  main.arr                        Main                               142           1    0.0    0.0     0.0    0.0
   streamToAssocList              Main                               143           1    0.0    0.0     0.0    0.0
    streamToAssocList.go          Main                               146           5    0.0    0.0     0.0    0.0
  main.pairList                   Main                               137           1    0.0    0.0     9.5   16.5
   consPairStream                 Main                               138           1    0.7    0.9     9.5   16.5
    consPairStream.ys             Main                               140           1    4.3    7.8     4.3    7.8
    consPairStream.xs             Main                               139           1    4.5    7.8     4.5    7.8
  main.initConfig                 Main                               122           1    0.0    0.0     0.0    0.0
   logLikelihood                  Main                               163           0    0.0    0.0     0.0    0.0
   logPrior                       Main                               161           5    0.0    0.0     0.0    0.0
  main.sim                        Main                               118           1    1.0    2.2    28.6   28.1
   simKernel                      Main                               120           0    4.8    5.1    27.6   25.8

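I'm not sure how to interpret this exactly, but the lazy stream of random doubles, randomList, makes me wince. I have no idea how that could be improved.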

score 3 · Accepted Answer

I've updated the hpaste with a working example. It looks like the culprits are:

Missing strictness annotations in three SimConfig fields: simArray, logP and logL

    data SimConfig = SimConfig {
            numDimensions :: !Int                 -- strict
        ,   numWalkers    :: !Int                 -- strict
        ,   simArray      :: !(IntMap [Double])   -- strict spine
        ,   logP          :: !(Seq Double)        -- strict spine
        ,   logL          :: !(Seq Double)        -- strict spine
        ,   pairStream    :: [(Int, Int)]         -- lazy
        ,   doubleStream  :: [Double]             -- lazy
        } deriving Show

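newConfig was never evaluated in the simKernel loop because State is lazy. Forcing it when it is stored fixes that; another alternative would be to use the strict State monad instead.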
```
put $! newConfig
```
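execState ... replicateM also builds thunks. I originally replaced this with a foldl' and moved the execState into the fold, but I would think swapping in replicateM_ is equivalent and easier to read: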
```
let sim = logL $ execState (replicateM_ epochs simKernel) initConfig
--  sim = logL $ foldl' (const . execState simKernel) initConfig [1..epochs]
```
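To see the three changes working together in isolation, here is a minimal sketch. It is not the code from the hpaste: `Counter`, `step`, `runSteps` and `runStepsFold` are hypothetical stand-ins for `SimConfig`, `simKernel` and the `main` loop, and it assumes the `mtl` package for `Control.Monad.State.Strict`.

```haskell
module Main where

import Control.Monad (replicateM_)
import Control.Monad.State.Strict (State, execState, get, put)
import Data.List (foldl')

-- Hypothetical, cut-down stand-in for SimConfig. Both fields are strict,
-- so forcing a Counter to weak head normal form also forces its contents.
data Counter = Counter
    { steps :: !Int
    , total :: !Double
    } deriving (Eq, Show)

-- One step of a toy kernel. `put $!` forces the new record before it is
-- stored, so no chain of pending updates is carried into the next step.
step :: State Counter ()
step = do
    Counter n t <- get
    put $! Counter (n + 1) (t + fromIntegral n)

-- replicateM_ throws away the unit results instead of collecting a list.
runSteps :: Int -> Counter
runSteps k = execState (replicateM_ k step) (Counter 0 0)

-- The foldl' spelling from the commented-out line, for comparison.
runStepsFold :: Int -> Counter
runStepsFold k = foldl' (const . execState step) (Counter 0 0) [1 .. k]

main :: IO ()
main = do
    print (runSteps 100000)
    print (runSteps 100000 == runStepsFold 100000)
```

Building this with -rtsopts and running it with +RTS -s gives a quick way to compare the GC summary against a variant where the bangs or the `$!` are removed, though how much each change matters will depend on the real kernel.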

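A few calls to mapM .. replicate have also been replaced with replicateM. This is particularly noteworthy in consPairList, where it reduces memory usage quite a bit. There is still room for improvement, but the lowest-hanging fruit involves unsafeInterleaveST... so I stopped.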
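As a rough illustration of that rewrite, the sketch below puts the two spellings side by side. It does not reproduce consPairList: `nextInt` is a hypothetical toy generator standing in for the real random source, and the example only shows that both forms produce the same values, not how their memory behaviour compares.

```haskell
module Main where

import Control.Monad (replicateM)
import Control.Monad.State.Strict (State, evalState, state)

-- Hypothetical toy generator: one linear congruential step. It stands in
-- for whatever random source the real program threads through.
nextInt :: State Int Int
nextInt = state $ \s ->
    let s' = (1103515245 * s + 12345) `mod` 2147483648
    in  (s', s')

-- The longer spelling: build a list of dummy values, then mapM over it.
drawsViaMapM :: Int -> Int -> [Int]
drawsViaMapM n seed = evalState (mapM (const nextInt) (replicate n ())) seed

-- The shorter spelling: repeat the action n times directly.
drawsViaReplicateM :: Int -> Int -> [Int]
drawsViaReplicateM n seed = evalState (replicateM n nextInt) seed

main :: IO ()
main = print (drawsViaMapM 5 42 == drawsViaReplicateM 5 42)
```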

I have no idea whether the output results are what you want:

fromList [-4.287033457733427,-1.8000404912760795,-5.581988678626085,-0.9362372340483293,-5.267791907985331]

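But here are the stats:

     268,004,448 bytes allocated in the heap
      70,753,952 bytes copied during GC
      16,014,224 bytes maximum residency (7 sample(s))
       1,372,456 bytes maximum slop
              40 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       490 colls,     0 par    0.05s    0.05s     0.0001s    0.0012s
  Gen  1         7 colls,     0 par    0.04s    0.05s     0.0076s    0.0209s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.12s  (  0.12s elapsed)
  GC      time    0.09s  (  0.10s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.21s  (  0.22s elapsed)

  %GC     time      42.2%  (45.1% elapsed)

  Alloc rate    2,241,514,569 bytes per MUT second

  Productivity  57.8% of total user, 53.7% of total elapsed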
