考虑这个正则表达式:
^foo/[^=]+/baz=(.*),[^,]*$
如果我运行它foo/bar/baz=one,two
,它匹配并且子组捕获one
。如果我运行它foo/bar/baz/bar/baz=three,four,five
,它匹配并且子组捕获three,four
。
我知道如何把它变成regex-applicative
解析器或ReadP
解析器:
import Text.Regex.Applicative
match (string "foo/" *> some (psym (/= '=')) *> string "/baz=" *> many anySym <* sym ',' <* many (psym (/= ','))) <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Just "one",Just "three,four"]
import Text.ParserCombinators.ReadP
readP_to_S (string "foo/" *> many1 (satisfy (/= '=')) *> string "/baz=" *> many get <* char ',' <* many (satisfy (/= ',')) <* eof) <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [[("one","")],[("three,four","")]]
这两个都按照我想要的方式工作。但是当我尝试将其直接音译成 Megaparsec 时,它变得很糟糕:
import Text.Megaparsec
parse (chunk "foo/" *> some (anySingleBut '=') *> chunk "/baz=" *> many anySingle <* single ',' <* many (anySingleBut ',') <* eof) "" <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Left (ParseErrorBundle {bundleErrors = TrivialError 11 (Just (Tokens ('=' :| "one,"))) (fromList [Tokens ('/' :| "baz=")]) :| [], bundlePosState = PosState {pstateInput = "foo/bar/baz=one,two", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}}),Left (ParseErrorBundle {bundleErrors = TrivialError 19 (Just (Tokens ('=' :| "thre"))) (fromList [Tokens ('/' :| "baz=")]) :| [], bundlePosState = PosState {pstateInput = "foo/bar/baz/bar/baz=three,four,five", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})]
我知道这源于 Megaparsec 默认情况下不回溯。我试图通过坚持try
在一堆不同的地方来解决这个问题,但我无法让它发挥作用。最终,我得到了这个怪物notFollowedBy
:
import Text.Megaparsec
parse (chunk "foo/" *> some (noneOf "=/" <|> try (single '/' <* notFollowedBy (chunk "baz="))) *> chunk "/baz=" *> many (try (anySingle <* notFollowedBy (many (anySingleBut ',') <* eof))) <* single ',' <* many (anySingleBut ',') <* eof) "" <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Right "one",Right "three,four"]
但这看起来像一团糟!特别是,我不喜欢我实际上必须两次指定大部分模式。从技术上讲,这不等于 regex ^foo/(?:[^=/]|/(?!baz=))+/baz=((?:.(?![^,]*$))*),[^,]*$
,而不是我最初的 regex 吗?必须有更好的方法来编写该解析器。我该怎么做?
编辑:我也尝试过这种方式,它也有效(不,它错误地接受foo//baz=,
):
import Text.Megaparsec
parse (chunk "foo/" *> (some . try $ many (noneOf "=/") *> single '/') *> chunk "baz=" *> ((++) <$> many (anySingleBut ',') <*> (concat <$> manyTill ((:) <$> single ',' <*> many (anySingleBut ',')) (try $ single ',' *> many (anySingleBut ',') *> eof)))) "" <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Right "one",Right "three,four"]
不过,它看起来一样混乱,这manyTill
意味着它不再真正映射到任何正则表达式。