parsing - Using sepBy string in Attoparsec

Question

I'm trying to split a string on ",", ", and" or "and", and then return whatever is in between. An example of what I have so far:

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text

sepTestParser = nameSep ((takeWhile1 $ inClass "-'a-zA-Z") <* space)
nameSep p = p `sepBy` (string " and " <|> string ", and" <|> ", ")

main = do
  print $ parseOnly sepTestParser "This test and that test, this test particularly."

I want the output to be ["This test", "that test", "this test particularly."]. I have a vague feeling that what I'm doing is wrong, but I can't quite work out why.

score 4 · Accepted Answer

Note: this answer is written in literate Haskell. Save it as Example.lhs and load it into GHCi or similar.

The problem is that sepBy is implemented as:

sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure []

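This means that the second parser s is only called after the first parser p has succeeded. It also means that if you added whitespace to the character class, you would end up with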

["This test and that test","this test particularly"]

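because "and" can now be parsed by p. This isn't easy to fix: as soon as you hit a space you would need to look ahead and check whether an "and" follows after any number of spaces, and stop parsing if it does. Only then will a parser written with sepBy work.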
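Both failure modes can be reproduced in a small standalone program. This is a sketch, assuming the `attoparsec` and `text` packages are installed; `nameSep` is taken from the question, while `sentence`, `questionItem` and `greedyItem` are names made up for illustration. To get the output shown above, `greedyItem` also drops the trailing `<* space` from the question's parser, which is an assumption about what "adding whitespace to the class" meant.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
  (Parser, inClass, parseOnly, sepBy, space, string, takeWhile1)
import Data.Text (Text)

-- Separator combinator from the question.
nameSep :: Parser a -> Parser [a]
nameSep p = p `sepBy` (string " and " <|> string ", and" <|> string ", ")

-- Element parser from the question: one word followed by one space.
questionItem :: Parser Text
questionItem = takeWhile1 (inClass "-'a-zA-Z") <* space

-- Hypothetical variant: the space is part of the character class instead.
greedyItem :: Parser Text
greedyItem = takeWhile1 (inClass "-'a-zA-Z ")

sentence :: Text
sentence = "This test and that test, this test particularly."

main :: IO ()
main = do
  -- The item eats the space that " and " needs, so no separator matches.
  print (parseOnly (nameSep questionItem) sentence)
  -- prints: Right ["This"]

  -- The item swallows " and " itself; only ", " still acts as a separator.
  print (parseOnly (nameSep greedyItem) sentence)
  -- prints: Right ["This test and that test","this test particularly"]
```

In both runs `parseOnly` reports success because it does not require the whole input to be consumed; the unparsed remainder is silently dropped.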

So let's write a parser that takes words instead (the rest of this answer is literate Haskell):

> {-# LANGUAGE OverloadedStrings #-}
> import Control.Applicative
> import Data.Attoparsec.Text
> import qualified Data.Text as T
> import Control.Monad (mzero)

> word = takeWhile1 . inClass $ "-'a-zA-Z"
> 
> wordsP = fmap (T.intercalate " ") $ k `sepBy` many space
>   where k = do
>           a <- word
>           if (a == "and") then mzero
>                           else return a

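wordsP now takes words until it hits either something that isn't a word or a word equal to "and". Returning mzero signals a parse failure, at which point another parser can take over: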

> andP = many space *> "and" *> many1 space *> pure()
> 
> limiter = choice [
>     "," *> andP,
>     "," *> many1 space *> pure (),
>     andP
>   ]

limiter is mostly the same as the separator parser you already wrote; it corresponds roughly to the regex /,\s*and\s+|,\s+|\s*and\s+/.
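To check which strings limiter accepts as a complete separator, it can be run against a few samples. This is a sketch, assuming the `attoparsec` and `text` packages; `andP` and `limiter` are restated with explicit `string` calls and type signatures so the block stands alone, and `isSeparator` is a hypothetical helper that is not part of the answer.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative (many)
import Data.Attoparsec.Text
  (Parser, choice, endOfInput, many1, parseOnly, space, string)
import Data.Either (isRight)
import Data.Text (Text)

-- Optional spaces, the word "and", then at least one space.
andP :: Parser ()
andP = many space *> string "and" *> many1 space *> pure ()

-- The three separator forms, tried in order.
limiter :: Parser ()
limiter = choice
  [ string "," *> andP
  , string "," *> many1 space *> pure ()
  , andP
  ]

-- Hypothetical helper: True when the whole input is exactly one separator.
isSeparator :: Text -> Bool
isSeparator = isRight . parseOnly (limiter <* endOfInput)

main :: IO ()
main = mapM_ (print . isSeparator)
  [ ", and "  -- True
  , ",and "   -- True: the spaces before "and" are optional
  , ", "      -- True
  , " and "   -- True
  , "and "    -- True
  , ","       -- False: a bare comma needs trailing whitespace
  , " and"    -- False: "and" needs trailing whitespace
  ]
```

The order of the alternatives matters only in that the comma forms are tried first; when `"," *> andP` fails partway through, attoparsec backtracks and tries the next alternative from the same starting position.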

Now we can actually use sepBy, since our first parser no longer overlaps with the second:

> test = "This test and that test, this test particular, and even that test"
>
> main = print $ parseOnly (wordsP `sepBy` limiter) test

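The result is Right ["This test","that test","this test particular","even that test"], just as we wanted. Note that this particular parser doesn't preserve whitespace: the spaces between words are discarded and the words are rejoined with single spaces.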

So whenever you want to build a parser with sepBy, make sure that the two parsers don't overlap.
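For readers who prefer a plain .hs file, the pieces can be assembled and run against the sentence from the question. This is a sketch, assuming the `attoparsec` and `text` packages; the parser names follow the answer, while the type signatures and the top-level `phrases` and `sentence` are additions for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative (many)
import Control.Monad (mzero)
import Data.Attoparsec.Text
  (Parser, choice, inClass, many1, parseOnly, sepBy, space, string, takeWhile1)
import Data.Text (Text)
import qualified Data.Text as T

-- A run of letters, hyphens and apostrophes.
word :: Parser Text
word = takeWhile1 (inClass "-'a-zA-Z")

-- Words joined by single spaces; stops before the word "and".
wordsP :: Parser Text
wordsP = T.intercalate " " <$> (notAnd `sepBy` many space)
  where
    notAnd = do
      w <- word
      if w == "and" then mzero else pure w

andP :: Parser ()
andP = many space *> string "and" *> many1 space *> pure ()

limiter :: Parser ()
limiter = choice
  [ string "," *> andP
  , string "," *> many1 space *> pure ()
  , andP
  ]

-- Hypothetical top-level parser: phrases separated by limiter.
phrases :: Parser [Text]
phrases = wordsP `sepBy` limiter

sentence :: Text
sentence = "This test and that test, this test particularly."

main :: IO ()
main = print (parseOnly phrases sentence)
-- prints: Right ["This test","that test","this test particularly"]
```

The final "." is not part of the character class, so it is left unparsed rather than attached to the last element as the question asked; accepting it would require extending the class or handling sentence-final punctuation separately.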
