haskell - How do I correctly parse indented blocks with megaparsec?

Question

I am trying to build an indentation-based programming language, and I want to parse something like this:

expr1 :
  expr2
  expr3

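Here, : essentially marks the start of a new indented block, so expr1 is completely irrelevant. The idea is that : can appear anywhere in a line and must be the last token on that line.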

I have this code, which more or less works:

block :: Parser Value
block = dbg "block" $ do
  void $ symbol ":"
  void $ eol
  space1
  (L.indentBlock spaceConsumer indentedBlock)
  where
    indentedBlock = do
      e <- expr
      pure (L.IndentMany Nothing (\exprs -> pure $ Block () (e : exprs)) expr)

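The problem is that, in the example, only the first expression of the block is parsed at the proper indentation; the others have to be indented further, like this: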

expr1 :
  expr2
   expr3
   expr4
   expr5

score 1 · Accepted Answer

I ended up parsing expr1 in the same parser as the : token.

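Apparently indentBlock starts counting from the column where the parser passed as its last argument begins, so the idea is to start parsing from the beginning of the line (relative to the current indentation level). It ended up looking like this: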

block :: Parser Value
block =
  L.indentBlock spaceConsumer indentedBlock
  where
    indentedBlock = do
      caller <- callerExpression
      args <- parseApplicationArgs
      pure (L.IndentSome Nothing (exprsToAppBlock caller args) parse)
    exprsToAppBlock caller args exprs =
      pure (Application () caller (args <> [Block () exprs]))
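
For readers who want to try the idea in isolation, here is a minimal, self-contained sketch of the same approach: the header (the expression plus the trailing :) is parsed inside the parser handed to L.indentBlock, so the reference column is the start of the header line. The Expr type and the names scn, sc, lexeme, name, block and parseBlock are made up for this illustration and are not taken from the code above; the sketch assumes megaparsec 7 or later (for errorBundlePretty).

```haskell
import Control.Applicative (empty)
import Control.Monad (void)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical AST, for illustration only.
data Expr = Name String | Block Expr [Expr]
  deriving (Eq, Show)

-- Space consumer that also eats newlines; indentBlock uses it between items.
scn :: Parser ()
scn = L.space space1 empty empty

-- Space consumer that stays on the current line (spaces and tabs only).
sc :: Parser ()
sc = L.space (void (some (char ' ' <|> char '\t'))) empty empty

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

name :: Parser Expr
name = Name <$> lexeme (some alphaNumChar)

-- The header parser runs inside indentBlock, so the reference indentation
-- is the column where the header starts, not the column of the first child.
block :: Parser Expr
block = L.indentBlock scn header
  where
    header = do
      caller <- name
      void (lexeme (char ':'))
      -- Nothing: take the indentation of the first child as the block level.
      pure (L.IndentSome Nothing (pure . Block caller) name)

parseBlock :: String -> Either String Expr
parseBlock = either (Left . errorBundlePretty) Right . parse (block <* eof) "<input>"
```

With this sketch, parseBlock "expr1 :\n  expr2\n  expr3\n" should accept both children at the same column, while a third line indented one column deeper than expr2 is rejected as incorrectly indented.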

score 1 · Accepted Answer

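I can't offer megaparsec-specific advice, since I don't know that particular library, but I can share what I learned from writing a few parsers for indentation-sensitive languages: your life will be much easier if you lex and parse in separate steps and emit indent_begin and indent_end tokens during lexical analysis.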
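
As a rough illustration of that suggestion, the sketch below turns source lines into a token stream using a stack of indentation widths. It is not tied to megaparsec; the Token type and the layout function are hypothetical names. It deliberately simplifies: only spaces count as indentation, blank lines are skipped, and a dedent to a column that was never opened is not reported as an error.

```haskell
-- Hypothetical token type: each non-blank line becomes a Line token,
-- bracketed by IndentBegin / IndentEnd when the indentation changes.
data Token = IndentBegin | IndentEnd | Line String
  deriving (Eq, Show)

-- Walk the lines while keeping a stack of the currently open indentation
-- widths (innermost first, with the base level 0 at the bottom).
layout :: String -> [Token]
layout src = go [0] (filter (not . blank) (lines src))
  where
    blank = all (== ' ')

    -- End of input: close every block that is still open.
    go stack [] = replicate (length stack - 1) IndentEnd
    -- Unreachable in practice: the base level 0 is never popped.
    go [] _ = []
    go stack@(top : _) (l : ls)
      -- Deeper than the innermost open block: open a new one.
      | n > top = IndentBegin : Line body : go (n : stack) ls
      -- Same level or shallower: close every block deeper than this line.
      | otherwise = map (const IndentEnd) closed ++ Line body : go rest ls
      where
        (spaces, body) = span (== ' ') l
        n = length spaces
        (closed, rest) = span (> n) stack
```

For the example from the question, layout "expr1 :\n  expr2\n  expr3\n" gives [Line "expr1 :", IndentBegin, Line "expr2", Line "expr3", IndentEnd], and the parser can then treat IndentBegin and IndentEnd like ordinary brackets.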

score 0 · Accepted Answer

I usually add the following combinators:

import qualified Text.Megaparsec.Char.Lexer as L

indented :: Pos -> Parser a -> Parser (Pos, a)
indented ref p = do pos <- L.indentGuard space GT ref 
                    v <- p
                    pure (pos, v)
        

aligned :: Pos -> Parser a -> Parser a
aligned ref p = L.indentGuard space EQ ref *> p

You can then use L.indentLevel to obtain the reference indentation.
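
A small, self-contained sketch of how these pieces might fit together: the reference column is taken with L.indentLevel before the header, the first child must be strictly deeper than it, and the second child must line up with the first. The combinators indented and aligned are repeated from above so the block compiles on its own; word, triple and parseTriple are hypothetical names, and the sketch assumes megaparsec 7 or later.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Same combinators as above, repeated to keep the sketch self-contained.
indented :: Pos -> Parser a -> Parser (Pos, a)
indented ref p = do
  pos <- L.indentGuard space GT ref
  v <- p
  pure (pos, v)

aligned :: Pos -> Parser a -> Parser a
aligned ref p = L.indentGuard space EQ ref *> p

-- A bare word; it does not consume trailing whitespace, because
-- indentGuard skips whitespace itself before checking the column.
word :: Parser String
word = some letterChar

-- Parses a header followed by exactly two children at the same column.
triple :: Parser (String, String, String)
triple = do
  ref <- L.indentLevel              -- column where the header starts
  hdr <- word
  (col, c1) <- indented ref word    -- first child: strictly deeper than ref
  c2 <- aligned col word            -- second child: same column as the first
  pure (hdr, c1, c2)

parseTriple :: String -> Either String (String, String, String)
parseTriple = either (Left . errorBundlePretty) Right . parse (triple <* space <* eof) "<input>"
```

Here parseTriple "def\n  a\n  b\n" succeeds with ("def","a","b"), whereas moving b one column to the right makes aligned fail with an incorrect-indentation error.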

Here is an example of parsing a block of statements, including error handling:

blocked1 :: Pos -> Parser a -> Parser [a]
blocked1 ref p = do (pos, a) <- indented ref p
                    rest <- many (try $ helper pos)
                    fpos <- getPosition
                    rest' <- traverse (reportErrors pos) rest
                    setPosition fpos
                    pure (a : rest')
    where helper pos' = do pos <- getPosition
                           a <- p
                           when (sourceColumn pos <= ref) $ L.incorrectIndent EQ pos' (sourceColumn pos)
                           pure (pos, a)
          reportErrors ref (pos, v) = setPosition pos *>
            if ref /= sourceColumn pos
               then L.incorrectIndent EQ ref (sourceColumn pos)
               else pure v
                
blocked :: Pos -> Parser a -> Parser [a]
blocked ref p = blocked1 ref p <|> pure []

block :: Pos -> Parser (Block ParserAst)
block ref = do
       s <- blocked1 ref stmt
       pure $ Block s


funcDef :: Parser (FuncDef ParserAst)
funcDef = annotate $
    do pos <- L.indentLevel 
       symbol "def"
       h <- header
       l <- localDefs 
       b <- block pos
       pure $ FuncDef h l b
