
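I have a lexer rule (Integer) that uses a few fragments. In a parser rule (parse), I want to rewrite my tree differently depending on which fragment produced the token. I made a small grammar to show what I'm trying to do: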

grammar subrange;

options {
    output=AST;
}

tokens {
    NumberNode;
    DecimalNode;
    BinaryNode;
    HexNode;
    OctalNode;
}

parse
    : Integer+ -> ^(NumberNode Integer)+
    ;

Integer
    : DECIMAL_LITERAL
    | BINARY_LITERAL
    | HEX_LITERAL
    | OCTAL_LITERAL
    ;

fragment BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

fragment HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

fragment HEX_DIGIT
    : (DIGIT|'a'..'f'|'A'..'F')
    ;

fragment DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

fragment OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

fragment DIGIT
    : '0'..'9'
    ;

SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};

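I'd like the parse rule to rewrite a DECIMAL_LITERAL under the imaginary DecimalNode, but a BINARY_LITERAL under BinaryNode (instead of putting everything under NumberNode).

I tried to do this by changing the token type in the lexer rule, so that I could rewrite accordingly in the parse rule.

I think I should be able to do this with an action, but I haven't been able to figure out how to get hold of the token being returned in order to change its type. http://www.antlr.org/wiki/display/ANTLR3/Special+symbols+in+actions seems to suggest that $tokenref should work, but it isn't translated at all.

Or is there another way to do this?

Thanks in advance.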


It seems a bit odd to me to group all these literals under a single Integer token and then, in a parser rule, want to separate them again.

Why not just remove Integer and do:

integer
    : BINARY_LITERAL // when output=AST, this creates a CommonTree with type 'BINARY_LITERAL'
    | HEX_LITERAL    // ...
    | DECIMAL_LITERAL
    | OCTAL_LITERAL 
    ;

BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

?
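To see what each of those four token rules accepts on its own, here is a rough plain-Java sketch that hand-translates the rule bodies into java.util.regex patterns. The class LiteralTokens and the patterns are my own translation, not anything ANTLR generates. It only classifies whole, whitespace-separated words; it does not model how the generated lexer predicts and splits a character stream (for example, ANTLR would lex "012" as two tokens, while this sketch just reports that no single rule matches it).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class LiteralTokens {
    // Hand-written regex equivalents of the four lexer rules (assumption: not ANTLR output).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BINARY_LITERAL", Pattern.compile("2#[01]+"));
        RULES.put("HEX_LITERAL", Pattern.compile("(16#|0[xX])[0-9a-fA-F]+"));
        RULES.put("DECIMAL_LITERAL", Pattern.compile("0|[1-9][0-9]*"));
        RULES.put("OCTAL_LITERAL", Pattern.compile("8#[0-7]+"));
    }

    /** Returns the name of the rule whose pattern matches the whole word, or null if none does. */
    public static String tokenType(String word) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(word).matches()) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String word : "2#1111 8#77 0xff 16#ff 123".split("\\s+")) {
            System.out.println(word + " -> " + tokenType(word));
        }
    }
}
```

Because every literal form gets its own token type, the parser (and the CommonTree it builds) already carries the distinction, so no text inspection is needed later.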

Or you could keep the Int(eger) rule but replace each literal's text with its decimal value:

Int
@init{int skip = 0, base = 10;}
    : ( DECIMAL_LITERAL
      | BINARY_LITERAL  {base = 2;  skip = 2;} 
      | OCTAL_LITERAL   {base = 8;  skip = 2;} 
      | HEX_LITERAL     {base = 16; skip = $text.contains("#") ? 3 : 2;} 
      )
      {
        setText(String.valueOf(Integer.parseInt($text.substring(skip), base)));
      }
    ;

fragment BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

fragment HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

fragment DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

fragment OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

Be careful not to give rules a name that clashes with a class, object or reserved word of the target language (Integer, in the case of Java).
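The action in the Int rule boils down to "pick a base and a prefix length, strip the prefix, parse the rest". Below is a minimal plain-Java sketch of that conversion outside ANTLR, so it can be tried on its own. One difference to keep in mind: the grammar knows the base from which fragment matched, whereas this sketch (hypothetical class IntLiteralValue, method value) has to infer it from the prefix.

```java
public class IntLiteralValue {
    /** Mirrors the Int rule's action: choose base and prefix length, then parse the digits. */
    public static int value(String text) {
        int skip = 0, base = 10;                     // defaults: DECIMAL_LITERAL, no prefix
        if (text.startsWith("2#")) {                 // BINARY_LITERAL
            base = 2;  skip = 2;
        } else if (text.startsWith("8#")) {          // OCTAL_LITERAL
            base = 8;  skip = 2;
        } else if (text.startsWith("16#")) {         // HEX_LITERAL, '16#' form
            base = 16; skip = 3;
        } else if (text.startsWith("0x") || text.startsWith("0X")) { // HEX_LITERAL, '0x' form
            base = 16; skip = 2;
        }
        return Integer.parseInt(text.substring(skip), base);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"2#1111", "8#77", "0xff", "16#ff", "123"}) {
            System.out.println(t + " -> " + value(t));
        }
    }
}
```

Like the grammar's action, this relies on Integer.parseInt, so a literal larger than Integer.MAX_VALUE throws a NumberFormatException; Long.parseLong or BigInteger would be needed for wider values.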


EDIT

Okay. I'll leave my other answer there in case passers-by are wondering why on earth I'm proposing this... :)

Here's what (I think) you're after:

grammar T;

options {
  output=AST;
}

tokens {
  BinaryNode;
  OctalNode;
  HexNode;
  DecimalNode;
}

parse
 : integer+
 ;

integer
 : i=Integer -> {$i.text.startsWith("2#")}?         ^(BinaryNode Integer)
             -> {$i.text.startsWith("8#")}?         ^(OctalNode Integer)
             -> {$i.text.matches("(16#|0[xX]).*")}? ^(HexNode Integer)
             ->                                     ^(DecimalNode Integer)
 ;

Integer
 : DECIMAL_LITERAL
 | BINARY_LITERAL
 | HEX_LITERAL
 | OCTAL_LITERAL
 ;

fragment BINARY_LITERAL
 : '2#' ('0' | '1')+
 ;

fragment HEX_LITERAL 
 : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
 ;

fragment HEX_DIGIT
 : (DIGIT|'a'..'f'|'A'..'F')
 ;

fragment DECIMAL_LITERAL 
 : ('0' | '1'..'9' DIGIT*)
 ;

fragment OCTAL_LITERAL 
 : '8#' ('0'..'7')+
 ;

fragment DIGIT
 : '0'..'9'
 ;

SPACE 
 : (' ' | '\t' | '\r' | '\n')+ {skip();}
 ;

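Parsing the input "2#1111 8#77 0xff 16#ff 123" results in an AST with one imaginary root node per literal, in input order: BinaryNode, OctalNode, HexNode, HexNode and DecimalNode, each with the original Integer token as its only child.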

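Since the lexer has already discarded the information about which kind of literal each Integer token is, you have to recover it in the integer rule by inspecting the token's text. That is what the -> {boolean-expression}? ... parts do: each semantic predicate guards the rewrite that follows it, the predicates are tried in order, and the last, unguarded rewrite is used when none of them is true.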
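A rough plain-Java mirror of that decision, assuming the input has already been split into Integer token texts (splitting on whitespace stands in for the lexer here; IntegerRewrite, nodeFor and tree are hypothetical names). It prints each subtree in a LISP-like "(root child)" notation, which I believe resembles what ANTLR 3's toStringTree() gives for such a flat list, but treat that resemblance as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class IntegerRewrite {
    /** Same checks, in the same order, as the predicates in the integer rule. */
    public static String nodeFor(String text) {
        if (text.startsWith("2#")) return "BinaryNode";
        if (text.startsWith("8#")) return "OctalNode";
        if (text.matches("(16#|0[xX]).*")) return "HexNode"; // matches() must cover the whole string, hence ".*"
        return "DecimalNode";                                 // unguarded last rewrite = default
    }

    /** Builds one "(Node literal)" subtree per whitespace-separated literal. */
    public static String tree(String input) {
        List<String> subtrees = new ArrayList<>();
        for (String text : input.trim().split("\\s+")) {
            subtrees.add("(" + nodeFor(text) + " " + text + ")");
        }
        return String.join(" ", subtrees);
    }

    public static void main(String[] args) {
        // prints: (BinaryNode 2#1111) (OctalNode 8#77) (HexNode 0xff) (HexNode 16#ff) (DecimalNode 123)
        System.out.println(tree("2#1111 8#77 0xff 16#ff 123"));
    }
}
```

Note that "0" on its own falls through to DecimalNode, because the hex predicate requires an x or X after the leading zero.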

answered 2012-07-18T16:09:38.823