A Tale of Two Parsers
By Kurt Cagle
XML is an open standard, and that openness puts an interesting twist on how people approach the language. In a number of respects, the W3C functions as a legislative body that establishes the "laws" that govern XML. W3C members represent various interests: large corporations, universities, and an increasing number of watchdog groups. The debate among the participants can get rancorous, but that's not really surprising.
The role of the XML parser lies at the core of this struggle. IBM and Microsoft have taken different approaches to their versions of this key program; each tried to develop a tool useful both to its product users and to its own in-house development efforts. Knowing the strengths and weaknesses of these parsers will help you choose the right one for your programming needs.
A Primer on Parsers
A parser is a program that scans a sequence of characters and generates some kind of programmatic structure. Most program development tools use parsers either to generate machine code (as in a compiler) or to direct the operation of the program (as in an interpreter). The purpose of an XML parser is to convert the text input from an XML file into a set of hierarchical structures, and then to provide a convenient way to access those structures. Converting XML tags into nodes is a relatively simple task; a JavaScript function could do this conversion if efficiency weren't important. There are two different ways to build nodes: build the elements one at a time, forgetting about each node once it's complete, or build a tree that retains information about the structure over time.
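The two node-building strategies can be sketched with Python's standard library, used here purely as an illustration and not as a stand-in for either vendor's parser. The sample document, the `ElementNameCollector` class, and both helper functions are hypothetical names invented for this sketch. The first helper handles elements one at a time as the parser reports them; the second builds a complete tree that can be queried afterward.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for this sketch.
SAMPLE = "<order><item>pen</item><item>ink</item></order>"


class ElementNameCollector(xml.sax.ContentHandler):
    """One-at-a-time style: the parser reports each element as it is
    read and keeps no tree; the handler stores only what it wants."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        self.names.append(name)


def stream_element_names(text):
    """Return element names in the order the parser encounters them."""
    handler = ElementNameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


def tree_item_texts(text):
    """Tree style: build the whole document in memory, then query it."""
    doc = minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("item")]


if __name__ == "__main__":
    print(stream_element_names(SAMPLE))
    print(tree_item_texts(SAMPLE))
```

The streaming version needs memory only for what the handler chooses to keep, while the tree version holds the entire document but lets a program revisit any node after parsing has finished.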