Suppose we have input that is a sequence of simple English statements, each on a separate line, like these:
Alice checks
Bob bets 100
Charlie raises 100
Alice folds
Let's try parsing it with this grammar:
actions: action* EOF;
action: player=name (check | bet | call | raise | fold) NEWLINE;
check: 'checks';
bet: 'bets' amount;
call: 'calls' amount;
raise: 'raises' amount;
fold: 'folds';
name: /* The subject of this question */;
amount: '$'? INT;
INT: ('0'..'9')+;
NEWLINE: '\r'? '\n';
The number of different verbs is fixed, but what's interesting is that the name we are trying to match can have spaces in it, and verbs can potentially be part of it, too! So the following input is valid:
Guy who always bets 100 checks
Guy who always checks bets 100
Guy who always calls folds
Guy who always folds raises 100
Guy who always checks and then raises bets by others calls $100
So the question is: how do we define name so that it is greedy enough to eat spaces and words that we usually treat as verbs, but not so greedy that the verbs can no longer be matched by the action rule?
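To make the intended split concrete, here is a minimal sketch in Python rather than ANTLR. It relies on an observation about the sample input, not on anything ANTLR provides: the real verb always sits at the end of the line, so anchoring the verb (and its amount) at the end leaves everything before it as the name. The helper name split_action and the hard-coded verb list are assumptions made for illustration only.

```python
import re

# Illustration only, not ANTLR. The verb list mirrors the literals in the
# sample input; split_action is a hypothetical helper name.
LINE = re.compile(
    r"^(?P<name>.+) "
    r"(?:(?P<plain>checks|folds)"
    r"|(?P<verb>bets|calls|raises) \$?(?P<amount>[0-9]+))$"
)


def split_action(line):
    """Split one line into (name, verb, amount); amount is None for checks/folds."""
    m = LINE.match(line)
    if m is None:
        return None
    verb = m.group("plain") or m.group("verb")
    amount = int(m.group("amount")) if m.group("amount") else None
    return m.group("name"), verb, amount
```

With this, split_action("Guy who always bets 100 checks") yields the name "Guy who always bets 100" and the verb "checks". It shows the behaviour wanted from name, but does not answer how to express it as an ANTLR rule.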
My first attempt at solving this looked like this:
name: WORD (S WORD)*;
WORD: ('a'..'z'|'A'..'Z'|'0'..'9')+; // Yes, 1234 is a WORD, too...
S: ' '; // We have to keep spaces in names
Unfortunately, this will not match 'Guy who always bets', since bets is not a WORD but a different token, implicitly defined by the 'bets' literal. I wanted to get around that by creating a rule like keyword[String word] and making the other rules match, say, keyword["bets"] instead of a literal, but that's where I got stuck. (I guess I could just list all my verbs as valid alternatives within a name, but that just feels wrong.)
What's more, all the names are declared before they are used, so I can read them before I start parsing actions. Also, they can't be longer than MAX_NAME_LENGTH characters. Can this be of any help here?
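As a rough sketch of how the pre-declared names might help (again plain Python, not ANTLR), one option is to match the longest declared name that prefixes the line and hand the remainder to the verb rules. The function name match_declared_name and its parameters are hypothetical, and the length limit is passed in as an argument because its actual value is not stated here.

```python
def match_declared_name(line, declared_names, max_name_length):
    """Return (name, rest) for the longest declared name that prefixes the line.

    Hypothetical helper: a name only counts if it is followed by a space,
    so that something is left over for the verb.
    """
    candidates = [
        name
        for name in declared_names
        if len(name) <= max_name_length and line.startswith(name + " ")
    ]
    if not candidates:
        return None
    name = max(candidates, key=len)  # prefer the longest declared name
    return name, line[len(name) + 1:]
```

Taking the longest match is only a heuristic. A complete solution would also have to check that the remainder is a valid action and fall back to a shorter name if it is not.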
Maybe I'm doing it wrong, anyway. ANTLR gurus, can I hear from you?