Parsing XML in IE4 with JScript
By Michael Floyd
One bit of fallout from the Artificial Intelligence craze of the mid-1980s was accelerated research into natural-language processing. During that period, I wrote several natural-language parsers that took English sentences and converted them into commands that computers could understand. Most of the parsers were used in database applications and accepted English queries like "Give me all the dish on the 1995 Microsoft consent decree." The system would dutifully return all records related to the landmark case between Microsoft and the Department of Justice.
While it may seem like smoke and mirrors to some, such parsers are actually quite simple to write. They take a string (a sentence, in this case) and break it up into a list of tokens. The tokens are placed into a tree structure, which an application can then traverse. This process is simplified in database applications by the fact that many tokens -- words like "and," "the," "me," and "dish" -- are unnecessary for the query and can be tossed out. Parsers are used in many kinds of applications, such as compilers, interpreters, and other language processors, including browsers.
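To make the tokenize, discard, and build-a-tree steps concrete, here is a minimal sketch in JavaScript. It is not the code from my original parsers. The function names (tokenize, removeStopWords, buildTree, walk) and the flat "query"/"term" tree shape are assumptions made for illustration, and the stop-word list holds only the four words named above.

```javascript
// Words to discard. Only the four examples named in the text are listed;
// a real query parser would need a longer, application-specific list.
var STOP_WORDS = { "and": true, "the": true, "me": true, "dish": true };

// Step 1: break the sentence into lowercase tokens.
// Runs of letters and digits become tokens; punctuation is dropped.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Step 2: toss out tokens that add nothing to the query.
function removeStopWords(tokens) {
  var kept = [];
  for (var i = 0; i < tokens.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(STOP_WORDS, tokens[i])) {
      kept.push(tokens[i]);
    }
  }
  return kept;
}

// Step 3: place the surviving tokens in a tree.
// This sketch uses the simplest possible shape (an assumption):
// one root "query" node with one "term" child per token.
function buildTree(tokens) {
  var root = { type: "query", children: [] };
  for (var i = 0; i < tokens.length; i++) {
    root.children.push({ type: "term", value: tokens[i], children: [] });
  }
  return root;
}

// Step 4: depth-first traversal, calling visit(node) on every node.
function walk(node, visit) {
  visit(node);
  for (var i = 0; i < node.children.length; i++) {
    walk(node.children[i], visit);
  }
}
```

An application would call buildTree(removeStopWords(tokenize(query))) and then pass the result to walk with a callback that collects the value of each "term" node. A real parser would build a deeper tree that reflects the grammar of the sentence; the flat shape here only shows the mechanics.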
Last month I mentioned that several XML parsers are beginning to appear around the Net. Most of these parsers are written in C++ or Java, and will likely be used by programmers to create the next generation of XML tools and applications. They often incorporate a command-line interface, and in most cases are poorly documented. The good news is that you don't have to be a hard-core C++ or Java developer to use XML on your Web site.