
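This seems to be my 1000th question today :) I'm nearly done with my grammar, but I run into a problem when a prefix operator and an infix operator share the same symbol. I'm parsing a markup language called MathML...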

grammar MathMLOperators;

options 
{
  output = AST;
  backtrack = true;
  memoize = true;
}

tokens
{
  DOCUMENT; // The root of the parsed document.
  GROUP;

  OP; // any operator
  PREFIX_OP; // a prefix operator.
  INFIX_OP; // an infix operator.
  POSTFIX_OP; // a postfix operator.
  NON_INFIX_OP; // a non-infix operator
}

// Start rule.
public document :  math+ -> ^(DOCUMENT math+);

inFixTag : TAG_START_OPEN MO  TAG_CLOSE ('-' | '+' | '=') TAG_END_OPEN MO TAG_CLOSE -> ^(INFIX_OP);

preFixTag : TAG_START_OPEN MO TAG_CLOSE ('+' | '-') TAG_END_OPEN MO TAG_CLOSE -> ^(PREFIX_OP);

// Use semantic predicate to only allow postfix expressions when at the end of an mrow.
postFixTag : TAG_START_OPEN MO TAG_CLOSE ('!' | '^') TAG_END_OPEN MO {input.LT(1).getType() == TAG_CLOSE && input.LT(2).getType() == TAG_END_OPEN && input.LT(3).getType() == MROW && input.LT(4).getType() == TAG_CLOSE}? TAG_CLOSE -> ^(POSTFIX_OP);

nonInfixTag : TAG_START_OPEN MO TAG_CLOSE ('!' | '^') TAG_END_OPEN MO TAG_CLOSE {$expressionList::count++;} -> ^(OP);

opTag: TAG_START_OPEN MO TAG_CLOSE  ('-' | '+' | '^' |'=')  TAG_END_OPEN MO TAG_CLOSE -> ^(NON_INFIX_OP);

//Expressions

infixExpression:  grouping (inFixTag^ grouping)*;
grouping : nestedExpression+ -> ^(GROUP nestedExpression+);

prefixExpression : /* check that it's the first in the mrow*/ {$expressionList::count == 0}? (preFixTag^ (primaryExpression | nonInfixTag)) {$expressionList::count++;};

postfixExpression : (primaryExpression | prefixExpression| nonInfixTag) (postFixTag^)? ;

expressionList scope {int count} @init{$expressionList::count = 0;} :  (infixExpression | opTag)+;

nestedExpression :  postfixExpression;

primaryExpression : mrow | mn;

math : TAG_START_OPEN root=MATH TAG_CLOSE expressionList TAG_END_OPEN MATH TAG_CLOSE -> ^($root expressionList);

mrow : TAG_START_OPEN root=MROW TAG_CLOSE expressionList? TAG_END_OPEN MROW TAG_CLOSE -> ^($root expressionList?);

mn: TAG_START_OPEN root=MN TAG_CLOSE INT TAG_END_OPEN MN TAG_CLOSE -> ^($root INT);

MATH : 'math'; // root tag
MROW : 'mrow'; // row
MO   : 'mo'; // operator
MN   : 'mn'; // number

TAG_START_OPEN : '<';
TAG_END_OPEN : '</' ;
TAG_CLOSE : '>';
TAG_EMPTY_CLOSE : '/>';

INT :   '0'..'9'+;

WS  :  (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

This works fine...

<math>
<mrow>
<mo>-</mo>
<mn>7</mn>
<mo>=</mo>
<mn>8</mn>
</mrow>
</math>

But this fails...

<math>
<mrow>
<mo>-</mo>
<mn>7</mn>
<mo>-</mo>
<mn>8</mn>
</mrow>
</math>

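The first '-' should be a prefix operator and the second one an infix operator. In the debugger, the rule grouping seems to keep looping and never returns to its parent rule infixExpression, even when it can no longer match.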

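I'm sure I have a wrong EBNF operator somewhere, but I can't figure out which one. I tried to follow the standard expression-nesting pattern used for languages such as C, but this is an unusual language to parse.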
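The positional rule the grammar is trying to express can be sketched outside ANTLR. Below is a minimal Java sketch (Java because the grammar's actions target ANTLR's Java runtime). It assumes a row has already been flattened into operator symbols and number texts, and the class OperatorForms and its method classify are hypothetical names, not part of the grammar. It only captures the intent stated in the grammar's comments: a '-' or '+' is prefix when it is first in the row, a '!' or '^' is postfix when it is last, and an operator anywhere else is infix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the grammar): labels each item of one
// flattened row by position, following the intent of the two predicates:
// a prefix operator must be first in the row, a postfix operator last.
public class OperatorForms {
    static final List<String> PREFIX = List.of("-", "+");
    static final List<String> INFIX = List.of("-", "+", "=");
    static final List<String> POSTFIX = List.of("!", "^");

    public static List<String> classify(List<String> row) {
        List<String> forms = new ArrayList<>();
        int last = row.size() - 1;
        for (int i = 0; i <= last; i++) {
            String item = row.get(i);
            if (item.matches("[0-9]+")) {
                forms.add("OPERAND");
            } else if (i == 0 && PREFIX.contains(item)) {
                forms.add("PREFIX_OP");   // first in the row
            } else if (i == last && POSTFIX.contains(item)) {
                forms.add("POSTFIX_OP");  // last item, i.e. right before </mrow>
            } else if (INFIX.contains(item)) {
                forms.add("INFIX_OP");
            } else {
                forms.add("OP");          // anything else, e.g. '^' mid-row
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        // The failing input from the question, flattened.
        System.out.println(classify(List.of("-", "7", "-", "8")));
        // [PREFIX_OP, OPERAND, INFIX_OP, OPERAND]
    }
}
```

Nested mrow elements are out of scope here; the sketch only shows that position alone is enough to tell the two '-' signs apart.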


1 Answer

Would you mind sending me the grammar...

Here is my cleaned-up grammar.

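Warning: I couldn't explain why the parser from the question behaves the way it does (which is why I cleaned it up), so I don't know what I may have inadvertently broken or fixed compared with the original. Telling me that the AST or the parser is now wrong won't mean much to me, because it already looked wrong to me from the start. ;)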

grammar MathMLOperators;

options 
{
  output = AST;
  backtrack = true;
  memoize = true;
}

tokens
{
  DOCUMENT; // The root of the parsed document.
  GROUP;
  OP; // any operator
  PREFIX_OP; // a prefix operator.
  INFIX_OP; // an infix operator.
  POSTFIX_OP; // a postfix operator.
  NON_INFIX_OP; // a non-infix operator
}

// Start rule.
public document :  math+ EOF -> ^(DOCUMENT math+);

inFixTag        : (op=MINUS | op=PLUS | op=EQ)  -> INFIX_OP[$op.text];
preFixTag       : (op=MINUS | op=PLUS)          -> PREFIX_OP[$op.text]; 

// Use semantic predicate to only allow postfix expressions when at the end of an mrow.
postFixTag      : (op=BANG | op=CARET) {input.LA(1) == CMROW}?      -> POSTFIX_OP[$op.text];
nonInfixTag     : (op=BANG | op=CARET) {$expressionList::count++;}  -> NON_INFIX_OP[$op.text];
opTag           : (op=MINUS | op=PLUS | op=CARET | op=EQ)           -> OP[$op.text];

//Expressions

infixExpression     : grouping (inFixTag^ grouping)*;
grouping            : nestedExpression+     -> ^(GROUP nestedExpression+);

prefixExpression    : /* check that it's the first in the mrow*/ 
                    {$expressionList::count == 0}? 
                        (preFixTag^ (primaryExpression | nonInfixTag)) 
                        {$expressionList::count++;}
                    ;

postfixExpression   : (primaryExpression | prefixExpression| nonInfixTag) (postFixTag^)? ;

expressionList scope {int count} @init{$expressionList::count = 0;} :  (infixExpression | opTag)+;

nestedExpression    :  postfixExpression;

primaryExpression   : mrow | NUM;

math    : MATH expressionList CMATH -> ^(MATH expressionList);

mrow    : MROW expressionList? CMROW -> ^(MROW expressionList?);

///////   LEXER   ///////

MATH    : TAG_START_OPEN WS* 'math' WS* TAG_CLOSE; // root tag
CMATH   : TAG_END_OPEN WS* 'math' WS* TAG_CLOSE;

MROW    : TAG_START_OPEN WS* 'mrow' WS* TAG_CLOSE; // row
CMROW   : TAG_END_OPEN WS* 'mrow' WS* TAG_CLOSE;

fragment OMO    : TAG_START_OPEN WS* 'mo' WS* TAG_CLOSE; // operator
fragment CMO    : TAG_END_OPEN WS* 'mo' WS* TAG_CLOSE; 

MINUS   : OMO '-' CMO {setText("-");};
PLUS    : OMO '+' CMO {setText("+");};
EQ      : OMO '=' CMO {setText("=");};
BANG    : OMO '!' CMO {setText("!");};
CARET   : OMO '^' CMO {setText("^");};


NUM     : TAG_START_OPEN WS* 'mn' WS* TAG_CLOSE 
            INT 
          TAG_END_OPEN WS* 'mn' WS* TAG_CLOSE 
          {setText($INT.text);}
        ;

fragment TAG_START_OPEN : '<';
fragment TAG_END_OPEN   : '</' ;
fragment TAG_CLOSE      : '>';
fragment TAG_EMPTY_CLOSE: '/>';

INT     :   '0'..'9'+;

WS      :  (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
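The biggest change in this grammar is lexical: every <mo>...</mo> and <mn>...</mn> element becomes a single token (MINUS, PLUS, NUM and so on), so the parser never sees the raw tag pieces. A rough Java sketch of the same idea using java.util.regex follows, assuming a flat list of token texts is enough. MathMLTokens and tokenize are hypothetical names, and the whitespace handling only approximates the grammar's WS* (Java's \s also matches a vertical tab).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the lexer idea: each <mo>op</mo> and <mn>int</mn>
// collapses into one token carrying only its text, much like
// MINUS {setText("-");} and NUM {setText($INT.text);} in the grammar.
public class MathMLTokens {
    private static final Pattern TOKEN = Pattern.compile(
        "<(/?)\\s*(math|mrow)\\s*>"              // groups 1-2: MATH/CMATH, MROW/CMROW
        + "|<\\s*mo\\s*>([-+=!^])</\\s*mo\\s*>"  // group 3: operator text
        + "|<\\s*mn\\s*>([0-9]+)</\\s*mn\\s*>"   // group 4: number text
        + "|\\s+");                              // whitespace between tags

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Each token must start exactly where the previous one ended.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            if (m.group(2) != null) {
                String name = m.group(2).toUpperCase();
                // A leading '/' marks the closing tag, e.g. CMROW.
                tokens.add(m.group(1).isEmpty() ? name : "C" + name);
            } else if (m.group(3) != null) {
                tokens.add(m.group(3));
            } else if (m.group(4) != null) {
                tokens.add(m.group(4));
            }
            // Whitespace matches add nothing, similar to $channel=HIDDEN.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String testCase2 = "<math>\n<mrow>\n<mo>-</mo>\n<mn>7</mn>\n"
            + "<mo>-</mo>\n<mn>8</mn>\n</mrow>\n</math>";
        System.out.println(tokenize(testCase2));
        // [MATH, MROW, -, 7, -, 8, CMROW, CMATH]
    }
}
```

Once the operators arrive as single tokens, the parser rules shrink to choices between MINUS, PLUS, EQ, BANG and CARET, which is what the rewritten inFixTag, preFixTag and postFixTag rules do.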

Test case 1: infix '=' with MROW

Input:

    <math>
    <mrow>
    <mo>-</mo>
    <mn>7</mn>
    <mo>=</mo>
    <mn>8</mn>
    </mrow>
    </math>

Output:

[Parse tree image: infix = with mrow]

Test case 2: infix '-' with MROW

Input:

    <math>
    <mrow>
    <mo>-</mo>
    <mn>7</mn>
    <mo>-</mo>
    <mn>8</mn>
    </mrow>
    </math>

Output:

[Parse tree image: infix - with mrow]

Test case 3: infix '-' without MROW

Input:

    <math>
    <mo>-</mo>
    <mn>7</mn>
    <mo>-</mo>
    <mn>8</mn>
    </math>

Output:

[Parse tree image: infix - without mrow]
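All three test cases rely on one observation: within a row, a '-' or '+' can only be a prefix operator before the first operand. Once an operand has been read, the infix loop consumes the next operator. The hypothetical Java sketch below (RowParser is an illustrative name, not an ANTLR class) applies that rule with plain recursive descent and no backtracking, over the token texts produced by a lexer like the one in the grammar above. It deliberately leaves out postfix operators, standalone operators (opTag) and juxtaposed groups. It also treats all infix operators as one left-associative level, like the single inFixTag loop, and prints an S-expression that only illustrates the structure. It is not the AST shown in the images.

```java
import java.util.List;

// Hypothetical sketch (not the answer's ANTLR parser): recursive descent over
// already-lexed tokens. A '-' or '+' is read as prefix only at the start of a
// row; after an operand the infix loop takes it, so no backtracking is needed.
public class RowParser {
    private final List<String> tokens;
    private int pos = 0;

    RowParser(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String type) {
        if (!peek().equals(type)) {
            throw new IllegalStateException("Expected " + type + " but found " + peek());
        }
        pos++;
    }

    // math : MATH expressionList CMATH EOF
    private String parseMath() {
        expect("MATH");
        String body = parseList();
        expect("CMATH");
        if (pos != tokens.size()) {
            throw new IllegalStateException("Trailing input at token " + pos);
        }
        return "(math " + body + ")";
    }

    // Simplified expressionList : term (infixOp term)*, left-associative.
    private String parseList() {
        String left = parseTerm(true);  // only the first term may take a prefix
        while (List.of("-", "+", "=").contains(peek())) {
            String op = tokens.get(pos++);
            String right = parseTerm(false);
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    private String parseTerm(boolean firstInRow) {
        if (firstInRow && (peek().equals("-") || peek().equals("+"))) {
            String op = tokens.get(pos++);
            return "(" + op + " " + parsePrimary() + ")";
        }
        return parsePrimary();
    }

    // primaryExpression : mrow | NUM
    private String parsePrimary() {
        String t = peek();
        if (t.equals("MROW")) {
            pos++;
            if (peek().equals("CMROW")) {  // empty <mrow></mrow>
                pos++;
                return "(mrow)";
            }
            String body = parseList();
            expect("CMROW");
            return "(mrow " + body + ")";
        }
        if (t.matches("[0-9]+")) {
            pos++;
            return t;
        }
        throw new IllegalStateException("Unexpected token " + t);
    }

    public static String parse(List<String> tokens) {
        return new RowParser(tokens).parseMath();
    }

    public static void main(String[] args) {
        // Test case 2: prefix '-' then infix '-', inside an mrow.
        System.out.println(parse(List.of("MATH", "MROW", "-", "7", "-", "8", "CMROW", "CMATH")));
        // (math (mrow (- (- 7) 8)))
        // Test case 3: the same operators without the mrow.
        System.out.println(parse(List.of("MATH", "-", "7", "-", "8", "CMATH")));
        // (math (- (- 7) 8))
    }
}
```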

answered 2012-12-22T02:14:12.087