parsing - LL(1) 用堆栈实现的解析器：如何构建 AST？

Question

我目前正在手动构建解析器。它是一个 LL(1) 解析器。目前，它是一个很好的识别器：它的函数 parse(List tokens) 决定 tokens 是否是语言的成员。

现在，我想为该输入构建相应的 AST。但是，我知道如何以递归下降的方式实现它（已经做到了）。也就是说，对于挑战，我使用带有经典算法的堆栈来实现我的堆栈：

next <- first token of the input
stack <- START_SYMBOL
do {
    top <- stack.pop()
    if (top is a terminal and top == next) {
        next <- next token of the input
    } else if (top is a non terminal and PARSING_TABLE[top, next] exists) {
        stack.push(PARSING_TABLE[top, next]);
    } else {
         return invalid input;
    }
} while (stack is not empty);
return valid input;

其中 PARSING_TABLE 是 LL(1) 表。但是，我想知道如何在这样的配置中实现构建 AST 的部分。我不要求完整的实施，更多的是实施理念。

谢谢！

score 2 · Accepted Answer

您的堆栈可以进行注释，使其包含 AST 条目引用（即规则编号 + 规则中的位置 + 目标数据的存储位置）+（终端/非终端）

您initial stack <- START_SYMBOL被注释以将其结果存储在 AST 根中。

基本上，您pop()选择当前的 AST 构造。然后next <- next token将值保存在您的 AST 中。打开一个新的stack.push(PARSING_TABLE[top, next]);AST 列表并将其写入对应的构造中top，并在堆栈的每个条目中生成“规则编号 + 位置 + 目标列表”。

解析完成后，您就拥有了整棵树。

在精确的 AST 中，您可能希望忽略一些标记。这可以通过 push() 部分期间堆栈集中的适当注释来完成。典型的方法是为每个规则（A -> BC）附加一些元信息，例如，要保留什么以及结果的性质是什么。

score 1 · Accepted Answer

困难的出现是因为用匹配规则的 rhs 替换堆栈上的非终结符的常用方法有效地忘记了预测时的语法结构。但是要生成 AST，您需要稍后在完成规则解析时使用该结构。

与其将非终结符替换为其匹配规则的 rhs 符号，不如将其留在原处并将匹配的符号作为列表对象推送。这样堆栈就保留了语法的层次结构。

解析使用最顶部列表中的符号。列表的耗尽对应于规则的完成。非终结符在其规则完成时从堆栈中删除，而不是在预测时。

随着堆栈的消耗，构建一个必然的 AST 结构，该结构记住相关规则并存储已解析的标记。因此，堆栈就像一个预测的 AST，流入解析的 AST。

您可以将此视为模拟递归下降解析器的调用层次结构，其中符号列表堆栈作为调用帧堆栈。

在红宝石中：

# g is the grammar; a list of rules
# s is a terminal sequence to parse
# note, this code does not tokenize EOF

def parse(g, s)

 tab = gen_table(g)
 stack = [[g.start_sym]]
 # intermediate ast node format: [rule-action, symbols...]
 ast = [[->(rhs){[:_S, rhs[0]]}]]

 loop do
  puts "PARSE\n #{s}\n #{stack}\n #{ast}"

  if stack.first.empty?
   raise "extraneous input" if not s.empty?
   break
  end

  if stack.last.empty? # rule complete
   stack.pop
   node = ast.pop
   # transform the node (eg to a class) using the associated rule-action
   node = node.first.(node.drop(1))
   ast.last.push(node)
   stack.last.shift # rm sym from stack after completing it
   next
  end

  raise "incomplete input" if s.empty?
  tok = s.first
  topsym = stack.last.first

  if topsym.is_a? String # terminal
   raise "mismatch #{tok} != #{topsym}" if tok != topsym
   stack.last.shift
   ast.last.push(s.shift)

  elsif topsym.is_a? Symbol # nonterminal
   ri = tab[topsym][tok]
   raise "no rule for #{topsym}, #{tok}" if ri.nil?
   stack.push(g[ri].rhs.clone)
   ast.push([g[ri].action])
  end

 end

 node = ast.first
 node.first.(node.drop(1))
end

parsing - LL(1) 用堆栈实现的解析器：如何构建 AST？

2 回答 2

Related

Reference