html - Haskell - Parsec: parsing <p> elements

Question

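I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse input like this:

This is the first paragraph example\n
with two lines\n
\n
And this is the second paragraph\n

My output should be: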

This is the first paragraph example\n with two lines\n And this is the second paragraph\n

I defined:


line= do{
        ;t<-manyTill (anyChar) newline
        ;return t
        }

paragraph = do{
        t<-many1 (line) 
        ;return ( p << t )
    }

But it returns:

This is the first paragraph example\n with two lines\n\n And this is the second paragraph\n

What's wrong? Any ideas?

Thanks!

score 5 · Accepted Answer

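According to the documentation for manyTill, it runs its first argument zero or more times, so two newlines in a row are still valid input and your line parser does not fail on the blank line.

You're probably looking for something like many1Till (by analogy with many1 vs. many), but it doesn't seem to exist in the Parsec library, so you may need to roll your own. (Warning: I don't have ghc on this machine, so this is completely untested.)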

many1Till p end = do
    first <- p
    rest  <- p `manyTill` end
    return (first : rest)

Or, more concisely (liftM2 comes from Control.Monad):

many1Till p end = liftM2 (:) p (p `manyTill` end)
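
To see how many1Till could slot into the question's parser, here is a minimal sketch rather than a definitive fix. It makes two assumptions that are not in the original post. First, the element parser is `noneOf "\n"` instead of `anyChar`: `anyChar` also matches a newline, so `many1Till anyChar newline` would swallow the blank line as its first character. Second, the hypothetical names `nonEmptyLine`, `paragraph` and `paragraphs` return plain strings instead of Text.XHtml values, to keep the example dependent on Parsec only.

```haskell
import Text.ParserCombinators.Parsec

-- Like manyTill, but p must succeed at least once before end.
many1Till :: Parser a -> Parser end -> Parser [a]
many1Till p end = do
    first <- p
    rest  <- p `manyTill` end
    return (first : rest)

-- One or more non-newline characters, then a newline (consumed).
-- Fails without consuming input on a blank line.
nonEmptyLine :: Parser String
nonEmptyLine = many1Till (noneOf "\n") newline

-- A paragraph is one or more non-empty lines.
paragraph :: Parser [String]
paragraph = many1 nonEmptyLine

-- Paragraphs are separated (and optionally ended) by blank lines.
paragraphs :: Parser [[String]]
paragraphs = paragraph `sepEndBy` many1 newline

main :: IO ()
main = print (parse (paragraphs <* eof) "" "first\nsecond\n\nthird\n")
-- prints: Right [["first","second"],["third"]]
```

Each inner list could then be wrapped with `p <<` as in the question to produce one element per paragraph.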

score 2 · Accepted Answer

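According to the docs, the manyTill combinator matches zero or more occurrences of its first argument, so line will happily accept a blank line. This means many1 line will consume everything up to the last newline in the file, rather than stopping at the double newline as you seem to have intended.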
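
The behaviour described above can be checked with a small sketch. The name `questionLine` is a hypothetical stand-in for the question's `line` parser, and the input string is made up for illustration.

```haskell
import Text.ParserCombinators.Parsec

-- The question's line parser: zero or more characters up to a newline.
questionLine :: Parser String
questionLine = manyTill anyChar newline

main :: IO ()
main = do
    -- A lone newline is accepted as an empty line.
    print (parse questionLine "" "\n")
    -- prints: Right ""
    -- So many1 does not stop at the blank line between paragraphs.
    print (parse (many1 questionLine) "" "first\nsecond\n\nthird\n")
    -- prints: Right ["first","second","","third"]
```

The empty string in the second result is the blank line, which is why both paragraphs end up in a single element.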
