
I wrote a parser for a custom file format using attoparsec. The profiling report says that about 67% of the memory allocation is done in a function named tab, which is also where most of the time is spent. The tab function is quite simple:

tab :: Parser Char
tab = char '\t'

The full profiling report is as follows:

       ASnapshotParser +RTS -p -h -RTS

    total time  =       37.88 secs   (37882 ticks @ 1000 us, 1 processor)
    total alloc = 54,255,105,384 bytes  (excludes profiling overheads)

COST CENTRE    MODULE                %time %alloc

tab            Main                   83.1   67.7
main           Main                    6.4    4.2
readTextDevice Data.Text.IO.Internal   5.5   24.0
snapshotParser Main                    4.7    4.0


                                                             individual     inherited
COST CENTRE        MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN               MAIN                     75           0    0.0    0.0   100.0  100.0
 CAF               Main                    149           0    0.0    0.0   100.0  100.0
  tab              Main                    156           1    0.0    0.0     0.0    0.0
  snapshotParser   Main                    153           1    0.0    0.0     0.0    0.0
  main             Main                    150           1    6.4    4.2   100.0  100.0
   doStuff         Main                    152     1000398    0.3    0.0    88.1   71.8
    snapshotParser Main                    154           0    4.7    4.0    87.7   71.7
     tab           Main                    157           0   83.1   67.7    83.1   67.7
   readTextDevice  Data.Text.IO.Internal   151       40145    5.5   24.0     5.5   24.0
 CAF               Data.Text.Array         142           0    0.0    0.0     0.0    0.0
 CAF               Data.Text.Internal      140           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Handle.FD        122           0    0.0    0.0     0.0    0.0
 CAF               GHC.Conc.Signal         103           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding         101           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.FD               100           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding.Iconv    89           0    0.0    0.0     0.0    0.0
  main             Main                    155           0    0.0    0.0     0.0    0.0

How can I optimize this?

The full code of the parser is here. The file I am parsing is around 77 MB.


2 Answers


tab is a scapegoat. If you define boo :: Parser (); boo = return () and insert a boo before each bind in the definition of snapshotParser, the cost allocations become:

 main             Main                    255           0   11.8   13.8   100.0  100.0
  doStuff         Main                    258     2097153    1.1    0.5    86.2   86.2
   snapshotParser Main                    260           0    0.4    0.1    85.1   85.7
    boo           Main                    262           0   71.0   73.2    84.8   85.5
     tab          Main                    265           0   13.8   12.3    13.8   12.3
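
Concretely, the instrumentation amounts to something like the following minimal sketch; rowParser here is a hypothetical, shortened stand-in for snapshotParser, not code from the question:

```haskell
module Main where

import Data.Attoparsec.Text (Parser, parseOnly, char, decimal)
import qualified Data.Text as T

-- A parser that does nothing: a decoy cost centre for the profiler.
boo :: Parser ()
boo = return ()

-- Hypothetical shortened stand-in for snapshotParser, with a boo
-- inserted before every bind. Under profiling, much of the
-- allocation previously charged to tab lands on boo instead.
rowParser :: Parser (Int, Int)
rowParser = do
  boo
  a <- decimal
  boo
  _ <- char '\t'
  boo
  b <- decimal
  return (a, b)

main :: IO ()
main = print (parseOnly rowParser (T.pack "1\t2"))
```

The no-op binds do not change what the parser accepts; they only give the profiler alternative cost centres to charge.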

That suggests the profiler is spreading the blame for the allocation of the parse results, likely due to the extensive inlining of attoparsec code, as John L suggested in the comments.

As for the performance issue, the crucial point is that, since you are parsing a 77 MB text file to build a list with a million elements, you want the file processing to be lazy, and not strict. Once that is sorted out, decoupling I/O and parsing in doStuff and building the list of snapshots without an accumulator also help. Here is a modified version of your program with that in mind.

{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Maybe
import Data.Attoparsec.Text.Lazy
import Control.Applicative
import qualified Data.Text.Lazy.IO as TL
import Data.Text (Text)
import qualified Data.Text.Lazy as TL

buildStuff :: TL.Text -> [Snapshot]
buildStuff text = case maybeResult (parse endOfInput text) of
  Just _ -> []
  Nothing -> case parse snapshotParser text of
      Done !i !r -> r : buildStuff i
      Fail _ _ _ -> []

main :: IO ()
main = do
  text <- TL.readFile "./snap.dat"
  let ss = buildStuff text
  print $ listToMaybe ss
    >> Just (fromIntegral (length $ show ss) / fromIntegral (length ss))

newtype VehicleId = VehicleId Int deriving Show
newtype Time = Time Int deriving Show
newtype LinkID = LinkID Int deriving Show
newtype NodeID = NodeID Int deriving Show
newtype LaneID = LaneID Int deriving Show

tab :: Parser Char
tab = char '\t'

-- UNPACK pragmas. GHC 7.8 unboxes small strict fields automatically;
-- however, it seems we still need the pragmas while profiling. 
data Snapshot = Snapshot {
  vehicle :: {-# UNPACK #-} !VehicleId,
  time :: {-# UNPACK #-} !Time,
  link :: {-# UNPACK #-} !LinkID,
  node :: {-# UNPACK #-} !NodeID,
  lane :: {-# UNPACK #-} !LaneID,
  distance :: {-# UNPACK #-} !Double,
  velocity :: {-# UNPACK #-} !Double,
  vehtype :: {-# UNPACK #-} !Int,
  acceler :: {-# UNPACK #-} !Double,
  driver :: {-# UNPACK #-} !Int,
  passengers :: {-# UNPACK #-} !Int,
  easting :: {-# UNPACK #-} !Double,
  northing :: {-# UNPACK #-} !Double,
  elevation :: {-# UNPACK #-} !Double,
  azimuth :: {-# UNPACK #-} !Double,
  user :: {-# UNPACK #-} !Int
  } deriving (Show)

-- No need for bang patterns here.
snapshotParser :: Parser Snapshot
snapshotParser = do
  sveh <- decimal
  tab
  stime <- decimal
  tab
  slink <- decimal
  tab
  snode <- decimal
  tab
  slane <- decimal
  tab
  sdistance <- double
  tab
  svelocity <- double
  tab
  svehtype <- decimal
  tab
  sacceler <- double
  tab
  sdriver <- decimal
  tab
  spassengers <- decimal
  tab
  seasting <- double
  tab
  snorthing <- double
  tab
  selevation <- double
  tab
  sazimuth <- double
  tab
  suser <- decimal
  endOfLine <|> endOfInput
  return $ Snapshot
    (VehicleId sveh) (Time stime) (LinkID slink) (NodeID snode)
    (LaneID slane) sdistance svelocity svehtype sacceler sdriver
    spassengers seasting snorthing selevation sazimuth suser

This version should have acceptable performance even if you force the whole list of snapshots into memory, as I did in main here. To gauge what "acceptable" means, keep in mind that, given the sixteen (small, unboxed) fields in each Snapshot plus the overhead of the Snapshot and list constructors, we are talking about 152 bytes per list cell, which boils down to ~152 MB for your test data. In any case, this version is about as lazy as possible, as you can verify by removing the division in main, or replacing it with last ss.
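
To see why that shape is lazy, here is a base-only toy analogue of buildStuff; parseRow is a hypothetical stand-in for running snapshotParser on tab-separated integers. Each cons cell becomes available before the rest of the input is examined, so a consumer that inspects only a prefix never forces the whole input:

```haskell
module Main where

import Data.Char (isDigit)

-- Hypothetical stand-in for snapshotParser: read one tab-separated
-- integer from the front of the input.
parseRow :: String -> Maybe (Int, String)
parseRow s = case span isDigit (dropWhile (== '\t') s) of
  ("", _)    -> Nothing               -- analogous to the Fail branch
  (ds, rest) -> Just (read ds, rest)  -- analogous to Done i r

-- Toy analogue of buildStuff: emit one record, then lazily recurse
-- on the leftover input.
buildToy :: String -> [Int]
buildToy s = case parseRow s of
  Nothing        -> []
  Just (r, rest) -> r : buildToy rest

main :: IO ()
main =
  -- Only the first three records are ever parsed, even though the
  -- input here is conceptually unbounded.
  print (take 3 (buildToy (concatMap (\n -> show n ++ "\t") [1 ..])))
```

The same reasoning applies to buildStuff: because the result of parsing one Snapshot is consed onto a lazily computed tail, laziness of the overall pipeline is limited only by how much of the list the consumer demands.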

N.B.: My tests were done with attoparsec-0.12.

Answered 2014-06-06T08:34:50.183

After updating attoparsec to the latest version (0.12.0.0), the execution time went down from 38 seconds to 16 seconds. That is a speedup of more than 50%. Its memory consumption also decreased considerably. As @JohnL noted, the results differ significantly once profiling is enabled: when I profiled the program with the latest version of the attoparsec library, the whole run took about 64 seconds.

Answered 2014-06-06T08:17:34.803