haskell - Conditional lookahead in attoparsec

Question

Suppose there is a data structure that represents text with comments embedded in it.

data TWC
  = T Text TWC -- text
  | C Text TWC -- comment
  | E -- end
  deriving Show

So a string like

"Text, {-comment-}, and something else"

can be encoded as

T "Text, " (C "comment" (T ", and something else" E))
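To make the encoding concrete, here is a minimal sketch of the inverse direction: turning a TWC value back into the string it came from. The name render is hypothetical (it is not part of the question), and the sketch assumes the OverloadedStrings extension and the text package.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

data TWC
  = T Text TWC -- text
  | C Text TWC -- comment
  | E -- end
  deriving Show

-- Hypothetical helper: rebuild the source string from its TWC encoding.
-- Text chunks are emitted as-is; comments get their {- -} delimiters back.
render :: TWC -> Text
render (T t rest) = t <> render rest
render (C c rest) = "{-" <> c <> "-}" <> render rest
render E = ""
```

Applied to the value above, render gives back "Text, {-comment-}, and something else", which is the round trip the parser is expected to invert.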

The parsers for comment blocks and for E are very simple:

twcP :: Parser TWC
twcP = eP <|> cP <|> tP

cP :: Parser TWC
cP = do
  _ <- string "{-"
  c <- manyTill anyChar (string "-}")
  rest <- cP <|> tP <|> eP
  return (C (pack c) rest)

eP :: Parser TWC
eP = do
  endOfInput
  return E

Implementing the parser for text blocks in the same simple way

tP :: Parser TWC
tP = do
  t <- many1 anyChar
  rest <- cP <|> eP
  return (T (pack t) rest)

makes it consume the comment section as text because of its greedy nature:

> parseOnly twcP "text{-comment-}"
Right (T "text{-comment-}" E)
it ∷ Either String TWC

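So the question is: how do I express the logic of parsing until the end of input or until a comment section? In other words, how do I implement a conditional lookahead parser?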

score 5 · Accepted Answer

You're right, the problematic code is the first line of tP, which parses text greedily without stopping at comments:

tP = do
  t <- many1 anyChar

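Before addressing that, I first want to refactor your code a bit to introduce helpers and use applicative style, with the problematic code isolated in the text helper: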

-- Like manyTill, but pack the result to Text.
textTill :: Alternative f => f Char -> f b -> f Text
textTill p end = pack <$> manyTill p end

-- Parse one comment string
comment :: Parser Text
comment = string "{-" *> textTill anyChar (string "-}")

-- Parse one non-comment text string (problematic implementation)
text :: Parser Text
text = pack <$> many1 anyChar

-- TWC parsers:

twcP :: Parser TWC
twcP = eP <|> cP <|> tP

cP :: Parser TWC
cP = C <$> comment <*> twcP

eP :: Parser TWC
eP = E <$ endOfInput

tP :: Parser TWC
tP = T <$> text <*> twcP

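To implement the lookahead, we can use the lookAhead combinator, which applies a parser without consuming any input. This lets us make text parse until it reaches either a comment (without consuming it) or endOfInput: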

-- Parse one non-comment text string (working implementation)
text :: Parser Text
text = textTill anyChar (void (lookAhead comment) <|> endOfInput)
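As a small illustration of the "without consuming input" property, the sketch below peeks at a comment opener and then returns everything that is still left to parse. The names peekThenRest and demo are made up for this example; lookAhead, string, takeText and parseOnly are taken from Data.Attoparsec.Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, lookAhead, parseOnly, string, takeText)
import Data.Text (Text)

-- Peek at "{-" without consuming it, then take all remaining input.
-- If lookAhead consumed anything, the second component would lack the "{-".
peekThenRest :: Parser (Text, Text)
peekThenRest = (,) <$> lookAhead (string "{-") <*> takeText

demo :: Either String (Text, Text)
demo = parseOnly peekThenRest "{-c-}"
-- demo == Right ("{-", "{-c-}")
```

The remaining input still starts with "{-", which is exactly what lets cP pick up the comment after text has stopped in front of it.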

With that implementation of text, twcP behaves as expected:

ghci> parseOnly twcP "text{-comment-} post"
Right (T "text" (C "comment" (T " post" E)))
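For anyone who wants to load this into ghci, here is the refactored code with the working text parser assembled into one file. The pragma, the import list, and the extra Eq instance are my assumptions (the snippets above do not show them); the parsers themselves are unchanged.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (Alternative, (<|>))
import Data.Attoparsec.Text
  (Parser, anyChar, endOfInput, lookAhead, manyTill, parseOnly, string)
import Data.Functor (void)
import Data.Text (Text, pack)

data TWC
  = T Text TWC -- text
  | C Text TWC -- comment
  | E -- end
  deriving (Show, Eq) -- Eq added here only so results can be compared

-- Like manyTill, but pack the result to Text.
textTill :: Alternative f => f Char -> f b -> f Text
textTill p end = pack <$> manyTill p end

-- Parse one comment string.
comment :: Parser Text
comment = string "{-" *> textTill anyChar (string "-}")

-- Parse text up to (but not including) the next comment, or to end of input.
text :: Parser Text
text = textTill anyChar (void (lookAhead comment) <|> endOfInput)

twcP :: Parser TWC
twcP = eP <|> cP <|> tP

cP :: Parser TWC
cP = C <$> comment <*> twcP

eP :: Parser TWC
eP = E <$ endOfInput

tP :: Parser TWC
tP = T <$> text <*> twcP

-- A few inputs to try in ghci:
--   parseOnly twcP "text{-comment-} post"
--     == Right (T "text" (C "comment" (T " post" E)))
--   parseOnly twcP ""           == Right E
--   parseOnly twcP "{-a-}{-b-}" == Right (C "a" (C "b" E))
```

One design note: lookAhead comment parses the whole comment just to check that one starts here, and cP then parses it again. That is fine for short inputs; if it ever matters, peeking only at the opening "{-" is an option, at the cost of treating an unterminated "{-" differently.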
