It seems a bit odd to me: grouping all such literals under a single Integer token and then wanting to separate them again in a parser rule.
Why not just remove Integer and do:
integer
  : BINARY_LITERAL  // when output=AST, this creates a CommonTree with type 'BINARY_LITERAL'
  | HEX_LITERAL     // ...
  | DECIMAL_LITERAL
  | OCTAL_LITERAL
  ;

BINARY_LITERAL
  : '2#' ('0' | '1')+
  ;

HEX_LITERAL
  : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
  ;

DECIMAL_LITERAL
  : ('0' | '1'..'9' DIGIT*)
  ;

OCTAL_LITERAL
  : '8#' ('0'..'7')+
  ;
?
Or you could keep the Int(eger) rule but set the numerical value of the various int-literals by doing:
Int
@init{int skip = 0, base = 10;}
  : ( DECIMAL_LITERAL
    | BINARY_LITERAL  {base = 2;  skip = 2;}
    | OCTAL_LITERAL   {base = 8;  skip = 2;}
    | HEX_LITERAL     {base = 16; skip = $text.contains("#") ? 3 : 2;}
    )
    {
      setText(String.valueOf(Integer.parseInt($text.substring(skip), base)));
    }
  ;

fragment BINARY_LITERAL
  : '2#' ('0' | '1')+
  ;

fragment HEX_LITERAL
  : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
  ;

fragment DECIMAL_LITERAL
  : ('0' | '1'..'9' DIGIT*)
  ;

fragment OCTAL_LITERAL
  : '8#' ('0'..'7')+
  ;
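To see what the action in the Int rule does to a token's text, here is a minimal plain-Java sketch of the same skip/base logic outside of ANTLR. The class and method names (IntLiteralText, toDecimalText) are made up for illustration and are not part of the grammar or of ANTLR. Unlike the lexer rule, which knows which alternative matched, this helper assumes the input is already a valid literal and picks the base by inspecting the prefix.

```java
// Hypothetical helper (not part of ANTLR): mirrors the skip/base logic
// of the Int lexer rule's action for an already-valid literal.
final class IntLiteralText {
    private IntLiteralText() {}

    /** Returns the decimal text for literals such as "2#1111", "8#77", "0xff", "16#ff" or "123". */
    static String toDecimalText(String text) {
        int base = 10; // default: plain decimal literal, nothing to skip
        int skip = 0;
        if (text.startsWith("2#")) {
            base = 2;
            skip = 2;
        } else if (text.startsWith("8#")) {
            base = 8;
            skip = 2;
        } else if (text.startsWith("16#")) {
            base = 16;
            skip = 3; // "16#" is three characters long
        } else if (text.startsWith("0x") || text.startsWith("0X")) {
            base = 16;
            skip = 2;
        }
        // Drop the prefix, parse the digits in the chosen base, print in decimal.
        return String.valueOf(Integer.parseInt(text.substring(skip), base));
    }
}
```

With this, toDecimalText("2#1111") gives "15" and both toDecimalText("0xff") and toDecimalText("16#ff") give "255". Note that Integer.parseInt throws a NumberFormatException for values that do not fit in an int, so the same would happen inside the lexer action for very long literals.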
Be careful when naming rules: a rule name can clash with a class, object or reserved word of the target language (Integer in the case of Java).
EDIT
Okay. I'll leave my other answer there in case passers-by are wondering why on earth I'm proposing this... :)
Here's what (I think) you're after:
grammar T;

options {
  output=AST;
}

tokens {
  BinaryNode;
  OctalNode;
  HexNode;
  DecimalNode;
}

parse
  : integer+
  ;

integer
  : i=Integer -> {$i.text.startsWith("2#")}?          ^(BinaryNode Integer)
              -> {$i.text.startsWith("8#")}?          ^(OctalNode Integer)
              -> {$i.text.matches("(16#|0[xX]).*")}?  ^(HexNode Integer)
              ->                                      ^(DecimalNode Integer)
  ;

Integer
  : DECIMAL_LITERAL
  | BINARY_LITERAL
  | HEX_LITERAL
  | OCTAL_LITERAL
  ;

fragment BINARY_LITERAL
  : '2#' ('0' | '1')+
  ;

fragment HEX_LITERAL
  : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
  ;

fragment HEX_DIGIT
  : (DIGIT|'a'..'f'|'A'..'F')
  ;

fragment DECIMAL_LITERAL
  : ('0' | '1'..'9' DIGIT*)
  ;

fragment OCTAL_LITERAL
  : '8#' ('0'..'7')+
  ;

fragment DIGIT
  : '0'..'9'
  ;

SPACE
  : (' ' | '\t' | '\r' | '\n')+ {skip();}
  ;
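Parsing the input "2#1111 8#77 0xff 16#ff 123" will result in the following AST (image not preserved; going by the rewrite rules, it is a flat list of five subtrees): ^(BinaryNode 2#1111) ^(OctalNode 8#77) ^(HexNode 0xff) ^(HexNode 16#ff) ^(DecimalNode 123)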
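Since the lexer emits every literal as a single Integer token, you've lost the information about what kind of literal each one is, so you will have to do this check in the integer rule (the {boolean-expression}? predicates that follow each -> in the rewrite rules). The alternatives are tried in order, and the last one, without a predicate, acts as the default.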
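The predicates in the integer rule are ordinary Java boolean expressions on the token text, so the decision they make can be sketched in plain Java as below. The class and method names (IntegerNodeKind, nodeTypeFor) are hypothetical and only for illustration; in the grammar itself ANTLR evaluates the predicates in the order they are written and builds the tree node for the first one that holds.

```java
// Hypothetical illustration (not generated by ANTLR): the same checks the
// rewrite predicates perform, returning the name of the imaginary root token.
final class IntegerNodeKind {
    private IntegerNodeKind() {}

    static String nodeTypeFor(String text) {
        if (text.startsWith("2#")) {
            return "BinaryNode";
        }
        if (text.startsWith("8#")) {
            return "OctalNode";
        }
        // String.matches must match the whole text, hence the trailing ".*".
        if (text.matches("(16#|0[xX]).*")) {
            return "HexNode";
        }
        // No prefix matched: fall through to the default, as in the last rewrite.
        return "DecimalNode";
    }
}
```

For the sample input, nodeTypeFor returns "BinaryNode" for 2#1111, "OctalNode" for 8#77, "HexNode" for both 0xff and 16#ff, and "DecimalNode" for 123.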