Java Parsing Made Easy
By Al Williams
I've always been interested in writing compilers and interpreters. I once wrote a Basic compiler in Basic just to prove it could be done (not to mention win a bet involving a case of a refreshing cold beverage). At the heart of each of these language translation programs is parser code that analyzes input according to a specific set of rules, known as a grammar.
Grammar-based parsers aren't limited to programming languages. In fact, some of the most sophisticated parsers of their day were the ones in the text-based adventure games popular around the time the PC made its debut. Other parsers might handle mathematical equations, natural-language input, or search engine queries.
So why would a Web developer want to parse input? One reason might be to create offline tools to generate or maintain programs or Web pages. Or you might want to handle incoming text from a Web form (such as a natural-language query). In either situation, a good parser is invaluable.
Tools of the Trade
At first glance, you might think that parsing a language is trivial: You just need to write several if statements that use String.equals to spot keywords. That approach can handle some simple grammars, but complex ones require a lot more work. You might want to skip certain words, or deal with a word differently depending on which words follow it. Parsers can become so complex that the people who write them often seem to speak their own peculiar jargon, expounding on LALR grammars, lexing, and recursive descent.
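To make the naive approach concrete, here is a minimal sketch of an if-and-String.equals interpreter for a few adventure-game-style commands. The class name NaiveParser, the interpret method, and the commands themselves are hypothetical, invented only for illustration; they are not taken from any particular tool.

```java
import java.util.Locale;

// Hypothetical example: a toy command interpreter built from nothing
// but if statements and String.equals.
public class NaiveParser {

    static String interpret(String input) {
        // Split the input into lowercase words on runs of whitespace.
        String[] words = input.trim().toLowerCase(Locale.ROOT).split("\\s+");
        int i = 0;
        String verb = words[i++];

        if (verb.equals("look")) {
            return "LOOK";
        }
        if (verb.equals("take")) {
            // Skip one optional article so "take the lamp" works.
            if (i < words.length && words[i].equals("the")) {
                i++;
            }
            if (i < words.length) {
                return "TAKE " + words[i];
            }
            return "ERROR: take what?";
        }
        if (verb.equals("go")) {
            if (i < words.length) {
                return "MOVE " + words[i];
            }
            return "ERROR: go where?";
        }
        return "ERROR: unknown command";
    }

    public static void main(String[] args) {
        System.out.println(interpret("take the lamp"));
        System.out.println(interpret("go north"));
        // The adjective is mistaken for the object here.
        System.out.println(interpret("take the red lamp"));
    }
}
```

The first two commands come out as you would hope, but the third yields TAKE red, because nothing in the code knows that an adjective may sit between the article and the noun. Every such wrinkle means another special case in another if statement, which is where this style starts to break down.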