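I have a 2.5 MB file containing floating-point numbers separated by spaces (the code below can generate it for you), and I want to parse it into an unboxed vector with attoparsec.
It is surprisingly slow: it takes almost a second and allocates a lot of memory: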
time ./Attoparsec-problem +RTS -sstderr < floats.txt
299999.0
956,647,344 bytes allocated in the heap
752,875,520 bytes copied during GC
166,485,416 bytes maximum residency (7 sample(s))
8,874,168 bytes maximum slop
337 MB total memory in use (0 MB lost due to fragmentation)
                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1604 colls,     0 par    0.21s    0.27s     0.0002s    0.0092s
  Gen  1         7 colls,     0 par    0.24s    0.36s     0.0520s    0.1783s
...
%GC time 65.5% (75.1% elapsed)
Alloc rate 3,985,781,488 bytes per MUT second
Productivity 34.5% of total user, 28.6% of total elapsed
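To put those figures in perspective, here is a rough per-number calculation. It is only a sketch: it assumes the 299,999 values that the program below parses, and the helper name perNumber is mine, not part of the program.

```haskell
-- Rough per-number cost, using the figures from the RTS output above.

-- Hypothetical helper: bytes per parsed number, rounded to the nearest byte.
-- Assumes 299,999 parsed values (the count used by the parser below).
perNumber :: Double -> Int
perNumber bytes = round (bytes / 299999)

main :: IO ()
main = do
  -- "bytes allocated in the heap"
  putStrLn ("allocated per number: " ++ show (perNumber 956647344) ++ " bytes")
  -- "bytes maximum residency"
  putStrLn ("residency per number: " ++ show (perNumber 166485416) ++ " bytes")
  -- prints:
  --   allocated per number: 3189 bytes
  --   residency per number: 555 bytes
```

So each 8-byte Double costs roughly 3 KB of allocation and about 555 bytes of peak residency, which is what I find surprising.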
My parser is very simple: essentially double <* skipSpace.
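To illustrate what that one combinator does on a tiny input, here is a minimal sketch. The name oneFloat and the sample string are made up for the illustration; double, skipSpace and parseOnly are the attoparsec functions.

```haskell
import Control.Applicative ((<$>), (<*>), (<*))
import Data.Attoparsec.ByteString.Char8 (Parser, double, skipSpace, parseOnly)
import qualified Data.ByteString.Char8 as B8

-- Hypothetical name: one float followed by any amount of whitespace.
oneFloat :: Parser Double
oneFloat = double <* skipSpace

main :: IO ()
main = do
  let input = B8.pack "1.5   2.5"
  -- parseOnly runs the parser on a complete input and ignores leftover bytes.
  print (parseOnly oneFloat input)
  -- prints: Right 1.5
  -- Running it twice in sequence consumes both numbers.
  print (parseOnly ((,) <$> oneFloat <*> oneFloat) input)
  -- prints: Right (1.5,2.5)
```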
Here is the full program:
import Control.Applicative
import Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as U

-- Compile with:
--   ghc --make -O2 -prof -auto-all -caf-all -rtsopts -fforce-recomp Attoparsec-problem.hs
-- Run:
--   time ./Attoparsec-problem +RTS -sstderr < floats.txt

main :: IO ()
main = do
  -- For creating the test file (2.5 MB)
  -- writeFile "floats.txt" (Prelude.unwords $ Prelude.map show [1.0,2.0..300000.0])
  r <- parse parser <$> BS.getContents
  case r of
    Done _ arr -> print $ U.last arr
    x -> print x
  where
    parser = do
      U.replicateM (300000-1) (double <* skipSpace)
      -- This gives surprisingly bad productivity (70% GC time) and 180 MB max residency
      -- for a 2.5 MB file!
Can you explain what is going on?