I am currently working on a poker hand history parser as part of my bachelor project. I've been doing some research over the past couple of days and came across a few nice parser generators, of which I chose JavaCC, since the project itself will be coded in Java.
Although the hand history grammar is pretty basic and straightforward, there's an ambiguity problem caused by the set of characters allowed in a player's nickname.
Suppose we have a line in the following format:
Seat 5: myNickname (1500 in chips)
The token myNickname can contain any character, including whitespace. This means that both "(1500 in chip" and "Seat 5:" are valid nicknames, which ultimately leads to an ambiguity problem. There are no restrictions on a player's nickname except for length (4-12 characters).
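To make the conflict concrete, here is a minimal sketch of what a longest-match tokenizer would do with such a line. It assumes the nickname token is modelled as "any 4 to 12 characters"; the class name, the method name, and the regex are hypothetical stand-ins for illustration, not actual JavaCC output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NicknameAmbiguityDemo {
    // Hypothetical stand-in for a NICKNAME token: any 4 to 12 characters.
    static final Pattern NICKNAME = Pattern.compile(".{4,12}");

    // Returns the longest prefix of the input accepted by the NICKNAME pattern,
    // or null if the input does not start with a valid nickname.
    static String longestNicknamePrefix(String input) {
        Matcher m = NICKNAME.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "Seat 5: myNickname (1500 in chips)";
        // The greedy match swallows the "Seat 5: " prefix as part of a nickname.
        System.out.println("[" + longestNicknamePrefix(line) + "]");
    }
}
```

With this pattern the very first token on the line already matches as a nickname, so a tokenizer that only looks at characters (and not at the position within the line) has no way to prefer the literal "Seat" over it.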
I need to parse and store several pieces of data along with the player's nickname (e.g. the seat position and the amount of chips in this particular case), so my question is: what are my options here?
I would love to do it using JavaCC, with something along these lines:
SeatRecord seat() :
{ Token seatPos, nickname, chipStack; }
{
    "Seat" seatPos=<INTEGER> ":" nickname=<NICKNAME> "(" chipStack=<INTEGER>
    "in chips)"
    {
        return new SeatRecord(seatPos.image, nickname.image, chipStack.image);
    }
}
Right now this doesn't work, because of the ambiguity described above.
I also searched around for GLR parsers (which apparently handle ambiguous grammars), but they mostly seem to be abandoned or poorly documented. The exception is Bison, but it doesn't support GLR parsers for Java, and it might be too complex to work with anyway (aside from the ambiguity problem, the grammar itself is pretty basic, as I mentioned).
Or should I stick to tokenizing the string myself and use indexOf(), lastIndexOf(), etc. to parse the data I need? I would go for that only as a last resort, since it would be too ugly IMHO and I might miss some cases (which would lead to incorrect parsing).
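For comparison, a minimal sketch of that manual approach could look like the following. It assumes every seat line has exactly the shape "Seat N: nickname (N in chips)" with single spaces as shown above; the class SeatLineParser and its nested SeatRecord are hypothetical names, and the sketch does not enforce the 4-12 character length rule.

```java
public class SeatLineParser {
    // Hypothetical record type; the fields mirror the SeatRecord used in the grammar.
    public static final class SeatRecord {
        public final int seatPos;
        public final String nickname;
        public final int chipStack;

        SeatRecord(int seatPos, String nickname, int chipStack) {
            this.seatPos = seatPos;
            this.nickname = nickname;
            this.chipStack = chipStack;
        }
    }

    private static final String PREFIX = "Seat ";
    private static final String SUFFIX = " in chips)";

    public static SeatRecord parse(String line) {
        if (!line.startsWith(PREFIX) || !line.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Not a seat line: " + line);
        }
        // The seat number is digits only, so the FIRST ": " always ends it.
        int colon = line.indexOf(": ");
        // The chip count is digits only, so the LAST " (" always starts it.
        int paren = line.lastIndexOf(" (");
        if (colon < 0 || paren < colon + 2) {
            throw new IllegalArgumentException("Malformed seat line: " + line);
        }
        int seatPos = Integer.parseInt(line.substring(PREFIX.length(), colon));
        String nickname = line.substring(colon + 2, paren);
        int chipStack = Integer.parseInt(
                line.substring(paren + 2, line.length() - SUFFIX.length()));
        return new SeatRecord(seatPos, nickname, chipStack);
    }

    public static void main(String[] args) {
        SeatRecord r = parse("Seat 5: Seat 5: (1500 in chips)");
        System.out.println(r.seatPos + " | " + r.nickname + " | " + r.chipStack);
    }
}
```

The sketch works from both ends of the line inwards and treats whatever is left in the middle as the nickname, so the nickname's contents never need to be tokenized. Whether that holds for the other line types in a hand history is something I have not checked.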