I have to parse a large number of different file formats: at least 50, possibly more than 100.
I've played around with Antlr in the past. However, I'm not sure that Antlr would be suitable for this project, for a few reasons:
- it's difficult to combine and reuse grammars and/or pieces of grammars
- Antlr relies on code generation: changing an existing parser means going back to the grammar, making the change, regenerating the code, integrating it back into the codebase, and re-running the unit tests
- building and processing trees requires working in yet another language inside Antlr, which could be a problem for future developers
Basically, I like Antlr, but I think that it may be better suited for creating one or two parsers for complex languages, rather than 100 parsers for somewhat simpler languages/formats.
An alternative to Antlr-like parser generators is parser combinators. Their advantage is that the parsers are written directly in the host language, which makes reuse, testing, and further abstraction very easy. Future developers also wouldn't have to learn a new tool. The downside is that I don't know of any heavy-duty parser combinator libraries for Java.
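To make the reuse argument concrete, here is a minimal hand-rolled sketch of what I mean by parser combinators in plain Java. Every name in it (`Combinators`, `Parser`, `Result`, `takeWhile1`, `ch`, `sepBy`) is made up for illustration and does not come from any existing library; it has no error reporting and is only meant to show how one small parser can be shared between two formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.IntPredicate;

/** Hypothetical sketch of parser combinators; not a real library API. */
final class Combinators {

    /** A successful parse: the value produced and the index just past it. */
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    /** A parser reads input starting at pos; an empty Optional means failure. */
    interface Parser<T> {
        Optional<Result<T>> parse(String input, int pos);

        /** Builds a new parser that transforms the parsed value with f. */
        default <R> Parser<R> map(Function<? super T, ? extends R> f) {
            return (in, pos) -> parse(in, pos).map(r -> new Result<R>(f.apply(r.value), r.next));
        }
    }

    /** Matches one or more consecutive characters accepted by test. */
    static Parser<String> takeWhile1(IntPredicate test) {
        return (in, pos) -> {
            int end = pos;
            while (end < in.length() && test.test(in.charAt(end))) end++;
            if (end == pos) return Optional.empty();
            return Optional.of(new Result<String>(in.substring(pos, end), end));
        };
    }

    /** Matches exactly the character c. */
    static Parser<Character> ch(char c) {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) == c) {
                return Optional.of(new Result<Character>(c, pos + 1));
            }
            return Optional.empty();
        };
    }

    /** Matches zero or more items separated by sep; never fails. */
    static <T, S> Parser<List<T>> sepBy(Parser<T> item, Parser<S> sep) {
        return (in, pos) -> {
            List<T> out = new ArrayList<>();
            Optional<Result<T>> first = item.parse(in, pos);
            if (!first.isPresent()) return Optional.of(new Result<List<T>>(out, pos));
            out.add(first.get().value);
            int cur = first.get().next;
            while (true) {
                Optional<Result<S>> s = sep.parse(in, cur);
                if (!s.isPresent()) break;
                Optional<Result<T>> nextItem = item.parse(in, s.get().next);
                if (!nextItem.isPresent()) break; // trailing separator is left unconsumed
                out.add(nextItem.get().value);
                cur = nextItem.get().next;
            }
            return Optional.of(new Result<List<T>>(out, cur));
        };
    }

    // One small parser, reused for two formats.
    // Assumption: the numbers fit in an int (Integer.parseInt throws otherwise).
    static final Parser<Integer> INTEGER = takeWhile1(Character::isDigit).map(Integer::parseInt);
    static final Parser<List<Integer>> CSV_INTS = sepBy(INTEGER, ch(','));
    static final Parser<List<Integer>> TSV_INTS = sepBy(INTEGER, ch('\t'));

    public static void main(String[] args) {
        System.out.println(CSV_INTS.parse("1,22,333", 0).get().value); // prints [1, 22, 333]
        System.out.println(TSV_INTS.parse("4\t5", 0).get().value);     // prints [4, 5]
    }
}
```

Since `INTEGER`, `CSV_INTS`, and `TSV_INTS` are ordinary Java values, each can be unit-tested on its own and combined further without a separate generation step. What this sketch lacks, and what I'd want from a real library, is error messages, backtracking control, and decent performance on large inputs.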
So the questions are:
- Is Antlr suitable/intended for such a massive parsing project?
- What are other options for large-scale parsing in Java?
Note: some of the file formats are CSV or tab-delimited, some are somewhat more complex, and some are as complex as Java. Their semantics can also be quite complicated, though not in every case.