The best kittens, technology, and video games blog in the world.

Tuesday, June 20, 2006

ANTLR


Everybody hates writing parsers, right ? Just admit it, you'd much rather write 1000 lines of cryptic regular expressions than a "real" parser (and if you don't know yet how to use regular expressions, drop everything you're doing and learn them right now).

Well, there's a good reason to hate writing parsers. If you write them by hand they're going to take a lot of time to write, be very hard to debug, and the error messages are going to suck. If you use some parser generator tool like lex+yacc, it's going to be slightly faster, slightly easier to debug, and the error messages are going to suck even more. All you're going to get is "Syntax error, go away". Oh, and that's assuming the grammar is actually LALR. If not, better just write the whole parser by hand.

Now here's a tool that makes parser writing suck a bit less (ANTLR). How does it differ from lex+yacc ? The most important thing, it is top-down parser, not a bottom-up one. And because it works from the top, it always know what was it expecting. So instead of "Syntax error, go away", you're getting error messages like: syntax error: unexpected symbol at line 2 (column 12): ";" for free. And you can place semantic actions pretty much everywhere. Unfortunately there's a price, grammars for ANTLR are slightly more difficult to write than for yacc. It also needs some serious getting used to and Python documentation is somewhat lacking compared to (great) Java docs.

ANTLR 2 is available for Java and Python. There's also an experimental version ANTLR 3, but it's for Java-only so far (a few ports are in progress). The Python version feel somewhat Java-ish. Apparently ANTLR 3 is supposed to be less Java-ish overall, like using actual arrays instead of silly linked lists in its default AST, but we'll have to wait for that one.

So maybe ANTLR doesn't make parser writing a particularly fun activity, but at least it sucks a lot less than lex+yacc and generates sane error messages for free. You may want to take a look at it the next time you have to parse something.

4 comments:

Anonymous said...

I highly reccommend you to investigate parser combinators like Parsec.

taw said...

Yeah, monadic parsers look kinda cool, but Haskell isn't really my thing.

Is it possible to use them from languages like Ruby or Python ? :-)

Martin DeMello said...

http://jparsec.codehaus.org/Ruby+Parsec

me22 said...

If you just add %error-verbose to your bison file, you get "syntax error, unexpected OPERATOR_ADD" for free, and adding line numbers is mostly solved by %locations ...