修订摘要
好吧,看起来系统调用肯定与 GC 有关,而根本问题只是 GC 发生得太频繁了。这似乎与 splitWhen 和 pack 的使用有关,我可以通过分析来判断。
splitWhen 的实现将每个块从惰性文本转换为严格文本,并将它们全部连接起来,因为它建立了一个块缓冲区。这势必会分配很多。
pack,因为它正在从一种类型转换为另一种类型,所以必须分配,这在我的内部循环中,所以这也是有道理的。
原始问题
我在基于 haskell 枚举器的 IO 中偶然发现了一些令人惊讶的系统调用活动。希望有人能对此有所了解。
我一直在玩弄我曾经写了几个月的快速 perl 脚本的 haskell 版本,时断时续。该脚本从每一行读取一些 json,然后打印出一个特定字段(如果存在)。
这是 perl 版本,以及我如何运行它。
cat ~/sample_input | perl -lpe '($_) = grep(/type/, split(/,/))' > /dev/null
这是 haskell 版本(它的调用方式与 perl 版本类似)。
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Internal as EI
import qualified Data.Enumerator.Text as ET
import qualified Data.Enumerator.List as EL
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Data.Functor
import Control.Monad
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLI
import System.Environment
import System.IO (stdin, stdout)
import GHC.IO.Handle (hSetBuffering, BufferMode(BlockBuffering))
fieldEnumerator field = enumStdin E.$= splitOn [',','\n'] E.$= grabField field
enumStdin = ET.enumHandle stdin
splitOn :: [Char] -> EI.Enumeratee T.Text T.Text IO b
splitOn chars = (ET.splitWhen (`elem` chars))
grabField :: String -> EI.Enumeratee T.Text T.Text IO b
grabField = EL.filter . T.isInfixOf . T.pack
intercalateNewlines = EL.mapM_ (\field -> (TI.putStrLn field >> (putStr "\n\n")))
runE enum = E.run_ $ enum E.$$ intercalateNewlines
main = do
(field:_) <- getArgs
runE $ fieldEnumerator field
令人惊讶的是,haskell 版本的跟踪看起来像这样(实际的 JSON 被抑制,因为它是来自工作的数据),而 perl 版本则符合我的预期;一堆读取,然后是写入,重复。
55333/0x8816f5: 366125 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 366136 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 367209 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 367218 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 368449 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 368458 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 369525 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 369534 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 370610 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 370620 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 371735 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 371744 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 371798 5 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 371802 3 1 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 372907 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 372918 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 374063 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 374072 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 375147 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 375156 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 376283 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 376292 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 376809 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 376814 5 3 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 377378 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 377387 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 378537 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 378546 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 379598 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 379604 3 0 sigreturn(0x7FFF5FBFF9A0, 0x1E, 0x1) = 0 Err#-2
55333/0x8816f5: 379613 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 380667 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 380678 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 381862 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 381871 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 382032 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 382036 4 2 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 383064 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 383073 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 384118 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 384127 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 385206 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 385215 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 386348 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 386358 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 387468 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 387477 11 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 387614 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 387620 5 3 read(0x0, SOME_JSON, 0x1FA0) = 8096 0
55333/0x8816f5: 388597 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 388606 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 389707 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0
55333/0x8816f5: 389716 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0
55333/0x8816f5: 390261 7 3 select(0x2, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0
55333/0x8816f5: 390273 6 3 write(0x1, SOME_OUTPUT, 0x1FA0) = 8096 0