parsing - Parser implementations comparison: correctness confirmation and (perhaps) optimization

Question

I've implemented two expression parsers, one using recursive descent and one using operator precedence. Both are written in Object Pascal. Here's the recursive descent version:

function ParseFactor: PNode;
var
  Temp: PNode;
begin
  Result := ParsePrimary;
  if t.Kind in [tkDoubleAsterisks] then begin
    New(Temp);
    Temp^.Kind := nkPower;
    Temp^.LeftOperand := Result;
    // power operator is right associative
    Lexer.Get(t);
    Temp^.RightOperand := ParseFactor();
    Result := Temp;
  end;
end;

function ParseTerm: PNode;
var
  Temp: PNode;
begin
  Result := ParseFactor;
  while t.Kind in [tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan] do begin
    New(Temp);
    case t.Kind of
      tkAmpersand        : Temp^.Kind := nkAnd;
      tkAsterisk         : Temp^.Kind := nkMultiplication;
      tkSlash            : Temp^.Kind := nkDivision;
      tkDoubleLessThan   : Temp^.Kind := nkShiftLeft;
      tkDoubleGreaterThan: Temp^.Kind := nkShiftRight;
    end;
    Temp^.LeftOperand := Result;
    Lexer.Get(t);
    Temp^.RightOperand := ParseFactor;
    Result := Temp;
  end;
end;

function ParseExpression: PNode;
var
  Temp: PNode;
begin
  Result := ParseTerm;
  while t.Kind in [tkHorzBar,tkCaret,tkPlus,tkMinus] do begin
    New(Temp);
    case t.Kind of
      tkHorzBar: Temp^.Kind := nkOr;
      tkCaret  : Temp^.Kind := nkXor;
      tkPlus   : Temp^.Kind := nkAddition;
      tkMinus  : Temp^.Kind := nkSubstraction;
    end;
    Temp^.LeftOperand := Result;
    Lexer.Get(t);
    Temp^.RightOperand := ParseTerm;
    Result := Temp;
  end;
end;

And here's the operator precedence version:

function GetTokenPrecedence(const Kind: TTokenKind): Integer; inline;
begin
  case Kind of
    tkHorzBar,tkCaret,tkPlus,tkMinus:
      Result := 1;
    tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan:
      Result := 2;
    tkDoubleAsterisks:
      Result := 3;
    else
      Result := -1;
  end;
end;

function IsRightAssociative(const Kind: TTokenKind): Boolean; inline;
begin
  Result := Kind in [tkDoubleAsterisks];
end;

function ParseBinaryRHSExpression(LHS: PNode; const CurrentPrecedence: Integer): PNode;
var
  Op: TTokenKind;
  RHS: PNode;
begin
  while GetTokenPrecedence(t.Kind) >= CurrentPrecedence do begin
    Op := t.Kind;
    Lexer.Get(t);
    RHS := ParsePrimary;
    while (GetTokenPrecedence(t.Kind) > GetTokenPrecedence(Op))
      or (IsRightAssociative(t.Kind) and (GetTokenPrecedence(t.Kind) >= GetTokenPrecedence(Op)))
    do begin
      RHS := ParseBinaryRHSExpression(RHS,GetTokenPrecedence(t.Kind));
    end;
    New(Result);
    case Op of
      tkHorzBar          : Result^.Kind := nkOr;
      tkCaret            : Result^.Kind := nkXor;
      tkPlus             : Result^.Kind := nkAddition;
      tkMinus            : Result^.Kind := nkSubstraction;
      tkAmpersand        : Result^.Kind := nkAnd;
      tkAsterisk         : Result^.Kind := nkMultiplication;
      tkSlash            : Result^.Kind := nkDivision;
      tkDoubleLessThan   : Result^.Kind := nkShiftLeft;
      tkDoubleGreaterThan: Result^.Kind := nkShiftRight;
      tkDoubleAsterisks  : Result^.Kind := nkPower;
    end;
    Result^.LeftOperand := LHS;
    Result^.RightOperand := RHS;
    LHS := Result;
  end;
  Result := LHS;
end;

function ParseExpression: PNode;
begin
  Result := ParsePrimary;
  if Assigned(Result) then begin
    Result := ParseBinaryRHSExpression(Result,0);
  end;
end;

Those are the only essential differences between the two. Some simple tests show that both seem to parse correctly. However, I'm not sure about the operator precedence implementation, because this is the first time I've actually implemented one. What surprises and bothers me is that it runs slower than the recursive descent version (it takes about 1.5 times as long)! My compiler classes and all the articles I've read state that operator precedence parsing should be more efficient than recursive descent because it makes fewer function calls, and that's what I expected as well, since the code seems to show it. I've inlined the helper functions that look up operator precedence and test for right associativity, but this doesn't seem to help much. Please tell me whether I did something wrong. Feel free to ask for clarification on anything.

score 1 · Accepted Answer

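As with all things, tradeoffs matter. Recursive descent explicitly checks for each operator token, so if you have N operators it essentially has to do N tests. Operator precedence only needs to know that something is an operator token, and then uses a lookup table. So instead of N tests, it can get by with just a few lookups. Operator precedence should therefore be faster if you have lots of operators. If your grammar has only a few, it isn't clear that it wins, even when carefully tuned.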
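
The lookup table mentioned above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the `TTokenKind` enumeration here (its members, their order, and `tkEOF`/`tkNumber`) is an assumption made so the example compiles on its own, and the `Precedence` array is a hypothetical name. Whether it beats the `case`-based `GetTokenPrecedence` from the question depends on what the compiler generates for that `case`, so it is worth measuring rather than assuming.

```pascal
program PrecedenceTable;

{$mode objfpc}

type
  // Hypothetical token kinds; the operator names mirror those in the question.
  TTokenKind = (tkEOF, tkNumber,
                tkHorzBar, tkCaret, tkPlus, tkMinus,
                tkAmpersand, tkAsterisk, tkSlash,
                tkDoubleLessThan, tkDoubleGreaterThan,
                tkDoubleAsterisks);

const
  // One entry per token kind, in declaration order.
  // -1 marks "not a binary operator", as in the question's code.
  Precedence: array[TTokenKind] of Integer =
    (-1, -1,          // tkEOF, tkNumber
      1, 1, 1, 1,     // | ^ + -
      2, 2, 2, 2, 2,  // & * / << >>
      3);             // **

function GetTokenPrecedence(const Kind: TTokenKind): Integer; inline;
begin
  // A single indexed load instead of testing each operator in turn.
  Result := Precedence[Kind];
end;

begin
  WriteLn(GetTokenPrecedence(tkPlus));            // prints 1
  WriteLn(GetTokenPrecedence(tkDoubleAsterisks)); // prints 3
  WriteLn(GetTokenPrecedence(tkNumber));          // prints -1
end.
```

The same idea extends to mapping a token kind to its node kind, which would replace the ten-way `case` inside the parsing loop with another array lookup.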

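In the grand scheme of things, the speed of the parser probably doesn't matter. Whatever tool you are building on top of the parser will most likely spend far more effort somewhere else.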

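Operator precedence was an interesting idea when machines were tiny, because a complex operator hierarchy could be encoded in a table. For the most part, the tradeoffs it offers don't matter on a typical workstation (or even a handheld). I'd stick with recursive descent for simple grammars, and with a parser generator of whatever kind is convenient for anything else.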
