parsing - Converting normal attoparsec parser code to conduit/pipes-based

Question

I have written the following parsing code using attoparsec:

data Test = Test {
  a :: Int,
  b :: Int
  } deriving (Show)

testParser :: Parser Test
testParser = do
  a <- decimal
  tab
  b <- decimal
  return $ Test a b

tParser :: Parser [Test]
tParser =  many' $ testParser <* endOfLine

This works well for small files, and I run it like this:

main :: IO ()
main = do
  text <- TL.readFile "./testFile"
  let (Right a) = parseOnly (manyTill anyChar endOfLine *> tParser) text
  print a

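But when the file is larger than 70MB, it consumes a huge amount of memory. As a solution, I thought I would use attoparsec-conduit. After going through its API, I'm not sure how to make the two work together. My parser has the type Parser Test, but sinkParser actually accepts a parser of type Parser a b. How can I run this parser in constant memory? (A pipes-based solution is also acceptable, but I'm not used to the Pipes API.)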

score 5 · Accepted Answer

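The first type parameter of Parser is just the data type of the input (either Text or ByteString). You can pass your testParser function as the argument to sinkParser and it will work fine. Here's a short example (it uses conduitParser, which applies the parser repeatedly and streams each result downstream):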

{-# LANGUAGE OverloadedStrings #-}
import           Conduit                 (liftIO, mapM_C, runResourceT,
                                          sourceFile, ($$), (=$))
import           Data.Attoparsec.Text    (Parser, decimal, endOfLine, space)
import           Data.Conduit.Attoparsec (conduitParser)

data Test = Test {
  a :: Int,
  b :: Int
  } deriving (Show)

testParser :: Parser Test
testParser = do
  a <- decimal
  space
  b <- decimal
  endOfLine
  return $ Test a b

main :: IO ()
main = runResourceT
     $ sourceFile "foo.txt"
    $$ conduitParser testParser
    =$ mapM_C (liftIO . print)
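
For readers on a newer conduit release, here is a minimal sketch of the same pipeline. It assumes conduit 1.3 or later, where `$$` and `=$` are deprecated in favor of `.|` and `runConduitRes`, and where `sourceFile` yields ByteString chunks, so a `decodeUtf8C` stage is added to feed a Text parser. The UTF-8 encoding and the file name "foo.txt" are assumptions. Note that conduitParser emits (PositionRange, result) pairs, so the sketch keeps only the second component.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Conduit                 (decodeUtf8C, mapM_C, runConduitRes,
                                          sourceFile, (.|))
import           Control.Monad.IO.Class  (liftIO)
import           Data.Attoparsec.Text    (Parser, decimal, endOfLine, space)
import           Data.Conduit.Attoparsec (conduitParser)

data Test = Test {
  a :: Int,
  b :: Int
  } deriving (Show)

-- One record per line: two integers separated by a single whitespace character.
testParser :: Parser Test
testParser = do
  x <- decimal
  _ <- space
  y <- decimal
  endOfLine
  return $ Test x y

main :: IO ()
main = runConduitRes
     $ sourceFile "foo.txt"              -- ByteString chunks (assumed UTF-8)
    .| decodeUtf8C                       -- ByteString -> Text
    .| conduitParser testParser          -- yields (PositionRange, Test)
    .| mapM_C (liftIO . print . snd)     -- drop the position, print the record
```

Because each Test is printed and dropped as soon as it is parsed, the list of results is never built up in memory. A parse failure is thrown as an exception from the conduitParser stage.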

score 5 · Accepted Answer

Here is the pipes solution (assuming you are using a Text-based parser):

import Pipes
import Pipes.Text.IO (fromHandle)
import Pipes.Attoparsec (parsed)
import Data.Attoparsec.Text (endOfLine)
import qualified System.IO as IO

main = IO.withFile "./testfile" IO.ReadMode $ \handle -> runEffect $
    for (parsed (testParser <* endOfLine) (fromHandle handle)) (lift . print)
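
The snippet above discards the value returned by runEffect, so a parse failure partway through the file would go unnoticed. As a hedged sketch, the following self-contained variant inspects that result: `parsed` returns Left with a ParsingError and the unconsumed input when the parser fails, and Right when the input is exhausted. The testParser definition is repeated from the question, with `char '\t'` standing in for the question's `tab` helper, which is an assumption about how that helper was defined.

```haskell
import           Data.Attoparsec.Text (Parser, char, decimal, endOfLine)
import           Pipes
import           Pipes.Attoparsec     (parsed)
import           Pipes.Text.IO        (fromHandle)
import qualified System.IO            as IO

data Test = Test {
  a :: Int,
  b :: Int
  } deriving (Show)

-- Two integers separated by a tab (assumed meaning of the question's `tab`).
testParser :: Parser Test
testParser = do
  x <- decimal
  _ <- char '\t'
  y <- decimal
  return $ Test x y

main :: IO ()
main = IO.withFile "./testFile" IO.ReadMode $ \h -> do
  -- Stream records one at a time; the final result says how parsing ended.
  r <- runEffect $
         for (parsed (testParser <* endOfLine) (fromHandle h)) (lift . print)
  case r of
    -- The parser failed; _leftover is the Producer of unconsumed input.
    Left (err, _leftover) -> IO.hPutStrLn IO.stderr ("parse error: " ++ show err)
    -- The whole input was consumed successfully.
    Right ()              -> return ()
```

Records that parsed successfully before the failure have already been printed by the time the Left case is reached.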
