I'm wondering if Scalas/Haskells parser combinators are sufficient for parsing a programming language. More specifically the language MiniJava. I'm currently reading compiller construction and jflex and java cup is quite painful to work with so I'm wondering if I could/should use parser combinators instead. The MiniJava syntax is very small. MiniJavas BNF: http://www.cambridge.org/us/features/052182060X/grammar.html
6 回答
I've never used Scala, but the existence of a definitive BNF makes this easy.
Trivially translated into Haskell's Text.ParserCombinators.Parsec:
goal = do c <- mainClass
cs <- many classDeclaration
eof
return $ c:cs
mainClass = do token "class"
name <- identifier
...
etc. The PArrows translation is pretty trivial too. You'll probably find it easier to have a distinct lexing phase before the parser, but you can do without too.
I'm using Scala's parser combinators to parse PL/SQL code, it works like a charm.
At least Parsec has built-in lexer for Java-like languages:
lexer = makeTokenParser javaStyle
You have to define the reserved words yourself.
Scala's parser is a backtracking parser, so it can deal with pretty much any BNF or EBNF. It also means, though, that there are edge cases where input can be painfully slow to be read.
If the grammar can be changed into an LL(1) grammar, you can use the ~! operator to keep backtracking to a minimum.
The grammar probably CAN be turned into LL(1), but, as written, it is not. See, for instance, that Expression and Statement have First/First conflicts (look this up at the end of the linked article).
Anyway, for an academic project, it is enough. For real life compiler stuff, you'll need faster parsers.
Programming in Scala (p. 647) says:
It [Scala's parser combinator framework] is much easier to understand and to adapt than a parser generator, and the difference in speed would often not matter in practice, unless you want to parse very large inputs.
As I would not classify source code as very large input (ideally), it should be sufficient.
I haven't dealt with the Scala or Haskell parser combinator libraries, but it looks like the grammar should be fine.