ruby - 在 Ruby 中使用 Parslet 的缩进敏感解析器？

Question

我正在尝试使用 Ruby 中的Parslet库来解析一个简单的缩进敏感语法。

以下是我尝试解析的语法示例：

level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2

生成的树如下所示：

[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]

我现在拥有的解析器可以解析嵌套级别 0 和 1 节点，但无法解析：

require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  rule(:indent) { str('  ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }

  rule(:document) { node.repeat }

  root :document

end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree

rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  puts failure.cause.ascii_tree
end

__END__
user
  name
  age
recipe
  name
foo
bar

很明显，我需要一个动态计数器，它需要 3 个缩进节点来匹配嵌套级别 3 上的标识符。

如何以这种方式使用 Parslet 实现缩进敏感的语法解析器？可能吗？

score 14 · Accepted Answer

有几种方法。

通过将每一行识别为缩进和标识符的集合来解析文档，然后应用转换以根据缩进数量重建层次结构。
使用捕获来存储当前缩进并期望下一个节点包含该缩进加上更多以匹配作为一个孩子（我没有深入研究这种方法，因为我想到了下一个）
规则只是方法。所以你可以将'node'定义为一个方法，这意味着你可以传递参数！（如下）

这使您可以node(depth)根据node(depth+1). 然而，这种方法的问题在于该node方法与字符串不匹配，它会生成一个解析器。所以递归调用永远不会结束。

这就是dynamic存在的原因。它返回一个解析器，直到它尝试匹配它时才被解析，让你现在可以毫无问题地递归。

请参阅以下代码：

require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  def indent(depth)
    str('  '*depth)
  end

  rule(:newline) { str("\n") }

  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  def node(depth) 
    indent(depth) >> 
    identifier >> 
    newline.maybe >> 
    (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
  end 

  rule(:document) { node(0).repeat }

  root :document
end

这是我最喜欢的解决方案。

score 0 · Accepted Answer

我不喜欢将缩进过程的知识编织到整个语法中的想法。我宁愿只生成其他规则可以使用的 INDENT 和 DEDENT 标记，类似于匹配“{”和“}”字符。所以以下是我的解决方案。它是IndentParser任何解析器都可以扩展以获取nl、生成indent和decent生成的标记的类。

require 'parslet'

# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
  def try(source, context, consume_all)
    succ("")
  end
end
class NeverMatch < Parslet::Atoms::Base
  attr_accessor :msg
  def initialize(msg = "ignore")
    self.msg = msg
  end
  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end
class ErrorMatch < Parslet::Atoms::Base
  attr_accessor :msg
  def initialize(msg)
    self.msg = msg
  end
  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class IndentParser < Parslet::Parser

  ##
  # Indentation handling: when matching a newline we check the following indentation. If
  # that indicates an indent token or detent tokens (1+) then we stick these in a class
  # variable and the high-priority indent/dedent rules will match as long as these 
  # remain. The nl rule consumes the indentation itself.

  rule(:indent)  { dynamic {|s,c| 
    if @indent.nil?
      NeverMatch.new("Not an indent")
    else
      @indent = nil
      AlwaysMatch.new
    end
  }}
  rule(:dedent)  { dynamic {|s,c|
    if @dedents.nil? or @dedents.length == 0
      NeverMatch.new("Not a dedent")
    else
      @dedents.pop
      AlwaysMatch.new
    end
  }}

  def checkIndentation(source, ctx)
    # See if next line starts with indentation. If so, consume it and then process
    # whether it is an indent or some number of dedents.
    indent = ""
    while source.matches?(Regexp.new("[ \t]"))
      indent += source.consume(1).to_s #returns a Slice
    end

    if @indentStack.nil?
      @indentStack = [""]
    end

    currentInd = @indentStack[-1]
    return AlwaysMatch.new if currentInd == indent #no change, just match nl

    if indent.start_with?(currentInd)
      # Getting deeper
      @indentStack << indent
      @indent = indent #tells the indent rule to match one
      return AlwaysMatch.new
    else
      # Either some number of de-dents or an error

      # Find first match starting from back
      count = 0
      @indentStack.reverse.each do |level|
        break if indent == level #found it, 

        if level.start_with?(indent)
          # New indent is prefix, so we de-dented this level.
          count += 1
          next
        end

        # Not a match, not a valid prefix. So an error!
        return ErrorMatch.new("Mismatched indentation level")
      end

      @dedents = [] if @dedents.nil?
      count.times { @dedents << @indentStack.pop }
      return AlwaysMatch.new
    end
  end
  rule(:nl)         { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}

  rule(:unixnl)     { str("\n") }
  rule(:macnl)      { str("\r") }
  rule(:winnl)      { str("\r\n") }
  rule(:anynl)      { unixnl | macnl | winnl }

end

我确信可以改进很多，但这是我迄今为止提出的。

示例用法：

class MyParser < IndentParser
  rule(:colon)      { str(':') >> space? }

  rule(:space)      { match(' \t').repeat(1) }
  rule(:space?)     { space.maybe }

  rule(:number)     { match['0-9'].repeat(1).as(:num) >> space? }
  rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }

  rule(:block)      { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
  rule(:stmt)       { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
  rule(:testblock)  { identifier.as(:name) >> block }

  rule(:prgm)       { testblock >> nl.repeat }
  root :prgm
end

ruby - 在 Ruby 中使用 Parslet 的缩进敏感解析器？

2 回答 2

Related

Reference