parsing - Parsing data with Parsec and omitting comments

Question

I am trying to write a Haskell Parsec parser that parses input data from a file into a LogLine datatype, as follows:

--Final parser that holds the individual parsers.
final :: Parser [LogLine]
final = do{ logLines <- sepBy1 logLine eol
        ; return logLines
        }


--The logline token declaration
logLine :: Parser LogLine
logLine = do
    name <- plainValue -- parse the name (identifier)
    many1 space -- parse and throw away a space
    args1 <- bracketedValue -- parse the first arguments
    many1 space -- throw away the second space
    args2 <- bracketedValue -- parse the second list of arguments
    many1 space -- throw away the whitespace
    constant <- plainValue -- parse the constant identifier
    space
    weighting <- plainValue -- parse the weighting double
    space
    return $ LogLine name args1 args2 constant weighting

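It parses everything fine, but now I need to add comments to the file, and I have to modify the parser so that it ignores them. It only needs to support single-line comments that start with "--" and end with "\n". I tried defining the comment token as follows: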

comments :: Parser String
comments = do 
    string "--"
    comment <- (manyTill anyChar newline)
    return ""

and then plugging it into the final parser like this:

final :: Parser [LogLine]
final = do 
        optional comments
        logLines <- sepBy1 logLine (comments<|>newline)
        optional comments
        return logLines

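It compiles fine, but it does not parse. I tried a few small modifications, but the best result was parsing everything up to the first comment, so I am starting to think this is not the way to do it. PS: I have seen this Similar Question, but it is slightly different from what I am trying to achieve.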

score 4 · Accepted Answer

If I understand your description of the format in the comments correctly, an example of your format would be

name arg1 arg2 c1 weight
-- comment goes here

optionally followed by further log lines and/or comments.

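Then your problem is that there is a newline between the log line and the comment line. That means the comments part of the separator parser fails without consuming input (comments must begin with "--"), so newline is tried and succeeds. The next line then begins with "--", which makes plainValue fail without consuming input, and that stops sepBy1.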
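To see this failure in isolation, here is a small sketch. It is not the asker's code: item is a hypothetical stand-in for logLine (one word per line), and the newline branch of the separator is given a String result so that both branches of <|> have the same type.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for logLine: a single word.
item :: Parser String
item = many1 letter

-- Single-line comment, as defined in the question.
comments :: Parser String
comments = string "--" >> manyTill anyChar newline >> return ""

-- Separator shaped like the one in the question: a comment OR a newline.
brokenSep :: Parser String
brokenSep = comments <|> (newline >> return "")

broken :: Parser [String]
broken = sepBy1 item brokenSep

-- After "foo", the separator takes the newline branch; the next item
-- then meets "-- note" and fails.
demo :: Either ParseError [String]
demo = parse broken "" "foo\n-- note\nbar\n"
```

In this sketch demo evaluates to a Left: the separator had already consumed the newline when item failed, so Parsec reports a parse error instead of quietly returning the lines parsed so far.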

The solution is to have the separator consume a newline first, and then as many comment lines as follow:

final = do
    skipMany comments
    sepEndBy1 logLine (newline >> skipMany comments)

By allowing the sequence to end with a separator (sepEndBy1 instead of sepBy1), any comment lines after the final LogLine are skipped automatically.
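For reference, here is a minimal self-contained sketch of that approach. The LogLine record, plainValue, bracketedValue and the hspaces1 helper are assumptions made up for illustration, since the question does not show them. Two further deviations from the question are deliberate: comments is wrapped in try so that a lone "-" does not consume input, and fields are separated by horizontal whitespace only, so a field separator never eats the newline that sepEndBy1 relies on.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record; the real LogLine fields are not shown in the question.
data LogLine = LogLine String String String String String
  deriving (Show, Eq)

-- Assumed: a plain value is a run of non-space, non-bracket characters.
plainValue :: Parser String
plainValue = many1 (noneOf " \t\n[]")

-- Assumed: a bracketed value is text enclosed in square brackets.
bracketedValue :: Parser String
bracketedValue = between (char '[') (char ']') (many (noneOf "]\n"))

-- One or more spaces or tabs, but never a newline.
hspaces1 :: Parser ()
hspaces1 = skipMany1 (oneOf " \t")

logLine :: Parser LogLine
logLine = do
  name <- plainValue
  hspaces1
  args1 <- bracketedValue
  hspaces1
  args2 <- bracketedValue
  hspaces1
  constant <- plainValue
  hspaces1
  weighting <- plainValue
  return (LogLine name args1 args2 constant weighting)

-- A single-line comment: "--" up to and including the newline.
comments :: Parser String
comments = try (string "--") >> manyTill anyChar newline >> return ""

final :: Parser [LogLine]
final = do
  skipMany comments                                   -- leading comment lines
  logLines <- sepEndBy1 logLine (newline >> skipMany comments)
  eof                                                 -- reject trailing garbage
  return logLines
```

With these assumed definitions, running parse final "" on input that has comment lines before, between and after the log lines should return only the LogLine values.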

score 0 · Accepted Answer

The way I understand your question, each line is either a comment or a log line. If so, final should look like this:

final :: Parser [LogLine]
final = do 
        logLines <- sepBy1 (comment<|>logLine) newline
        return logLines
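Note that for comment <|> logLine to type-check, both branches need the same result type. One possible way to arrange that, sketched here under assumptions, is to wrap each line in Maybe and drop the Nothing values afterwards. In this sketch entry is a hypothetical stand-in for logLine, and comment stops before the newline because newline is used as the separator.

```haskell
import Data.Maybe (catMaybes)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for logLine: a single word.
entry :: Parser String
entry = many1 letter

-- A comment body WITHOUT its trailing newline; the newline is the separator.
comment :: Parser ()
comment = try (string "--") >> skipMany (noneOf "\n")

-- Each line is either a comment (Nothing) or an entry (Just value).
line :: Parser (Maybe String)
line = (Nothing <$ comment) <|> (Just <$> entry)

-- Parse all lines, then keep only the entries.
final :: Parser [String]
final = catMaybes <$> sepEndBy1 line newline <* eof
```

sepEndBy1 is used instead of sepBy1 here so that a trailing newline at the end of the file is accepted.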
