parsing - Haskell: Parsing a Paragraph and em Elements with Parsec

Question

I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse input like this:

this is the beginning of the paragraph --this is an emphasized text-- and this is the end\n

My output should be:

<p>this is the beginning of the paragraph <em>this is an emphasized text</em> and this is the end\n</p>

This code parses and returns an emphasized element:


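import Text.ParserCombinators.Parsec
import Text.XHtml (Html, emphasize, (<<))

-- Parse "--text--" and wrap the text in an <em> element.
em :: Parser Html
em = do
  _ <- count 2 (char '-')                           -- opening "--"
  s <- manyTill anyChar (try (count 2 (char '-')))  -- text up to the closing "--"
  return (emphasize << s)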

But I don't know how to get the whole paragraph with the emphasized items in it.

Any ideas?

Thanks!

score 1 · Accepted Answer

This is a hack, but I think it does what you want:

-- assumes em :: Parser String (concat needs strings, not Html)
list = (:[])
text = many (try em <|> (anyChar >>= return . list))
       >>= return . ("<p>"++) . (++"</p>") . concat

(Each non-emphasized character is returned as its own string.)
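For reference, here is a self-contained sketch of the hack above. It assumes em is changed to return a plain String such as "<em>...</em>" instead of Text.XHtml's Html, since concat can only join strings; render is a hypothetical helper for running the parser.

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: em returns a String, not Html, so it can be concatenated
-- with the single-character strings produced for ordinary text.
em :: Parser String
em = do
  _ <- count 2 (char '-')                          -- opening "--"
  s <- manyTill anyChar (try (count 2 (char '-'))) -- text up to closing "--"
  return ("<em>" ++ s ++ "</em>")

list :: a -> [a]
list = (:[])

-- At each position, try em first; otherwise take one character.
text :: Parser String
text = many (try em <|> (anyChar >>= return . list))
       >>= return . ("<p>"++) . (++"</p>") . concat

-- Hypothetical helper: run the parser, turning a ParseError into a String.
render :: String -> Either String String
render = either (Left . show) Right . parse text ""

main :: IO ()
main = print (render "this is the beginning of the paragraph --this is an emphasized text-- and this is the end\n")
```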

Here's how it works:

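At each character, first try to parse em. That parser starts with two dashes. Since em can fail after consuming a single dash, as in "a-b", you need to put try in front of it. If dashes are not allowed anywhere else in the input, you don't need the try, but that probably isn't the case. Otherwise, use anyChar. But that has type Char, not String, so the character has to be wrapped in a list.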
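The role of try can be checked directly. This sketch (an em returning String is again an assumption; withoutTry, withTry, plainChar and succeeds are hypothetical names) parses "a-b" with and without try: without it, em consumes the first dash, fails on 'b', and <|> refuses to try the alternative because input was already consumed.

```haskell
import Text.ParserCombinators.Parsec

-- Same shape as em in the question, but returning a String (assumption).
em :: Parser String
em = do
  _ <- count 2 (char '-')
  s <- manyTill anyChar (try (count 2 (char '-')))
  return ("<em>" ++ s ++ "</em>")

-- One ordinary character, wrapped as a one-element String.
plainChar :: Parser String
plainChar = fmap (:[]) anyChar

-- Without try: a failed em that consumed a '-' aborts the whole parse.
withoutTry :: Parser String
withoutTry = fmap concat (many (em <|> plainChar))

-- With try: the failed em attempt is rolled back and plainChar takes the '-'.
withTry :: Parser String
withTry = fmap concat (many (try em <|> plainChar))

succeeds :: Parser String -> String -> Bool
succeeds p = either (const False) (const True) . parse p ""

main :: IO ()
main = do
  print (succeeds withoutTry "a-b") -- False
  print (succeeds withTry "a-b")    -- True
```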

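This returns a list of single-character strings with the emphasized parts interleaved. But you want one string surrounded by p tags, so you first concat, then add the opening tag to the start and the closing tag to the end. Then you return that value.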
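The post-processing step on its own is an ordinary function over a list of strings; wrapParagraph is a hypothetical name for the composition used in the answer.

```haskell
-- Join the pieces, then add the opening and closing <p> tags.
-- Composition applies right to left: concat first, then the suffix, then the prefix.
wrapParagraph :: [String] -> String
wrapParagraph = ("<p>" ++) . (++ "</p>") . concat

main :: IO ()
main = putStrLn (wrapParagraph ["a", " ", "<em>b</em>", " ", "c"])
-- prints: <p>a <em>b</em> c</p>
```

Note that `p >>= return . f` is equivalent to `fmap f p` by the monad laws, so the answer's second line could also be written as `fmap wrapParagraph (many (...))`.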

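There is probably a way to rewrite the whole parser so that it consumes input up to the next two dashes instead of using anyChar one character at a time. But I don't know how to write that down, so you get this hack, which is probably much less efficient.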
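One possible way to do what the last paragraph describes is sketched below; it is an assumption, not the answerer's code. Ordinary text is parsed in runs with many1, where a run may contain a lone dash but never the start of "--", and em is only attempted where "--" actually appears. The names singleDash, plainText, paragraphP and render are hypothetical, and em again returns a String.

```haskell
import Text.ParserCombinators.Parsec

-- A dash that does not begin "--"; try rolls back the '-' if it does.
singleDash :: Parser Char
singleDash = try (char '-' <* notFollowedBy (char '-'))

-- A run of ordinary text: non-dash characters and lone dashes.
plainText :: Parser String
plainText = many1 (noneOf "-" <|> singleDash)

-- Emphasized text between "--" delimiters (returns a String here).
em :: Parser String
em = do
  _ <- try (string "--")
  s <- manyTill anyChar (try (string "--"))
  return ("<em>" ++ s ++ "</em>")

-- Alternate text runs and em elements, then wrap the result in <p> tags.
paragraphP :: Parser String
paragraphP = do
  pieces <- many (plainText <|> em)
  eof
  return ("<p>" ++ concat pieces ++ "</p>")

render :: String -> Either String String
render = either (Left . show) Right . parse paragraphP ""

main :: IO ()
main = print (render "this is the beginning of the paragraph --this is an emphasized text-- and this is the end\n")
```

Because plainText never consumes the first '-' of "--", it fails without consuming input there, so <|> falls through to em without needing an extra try around it.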
