Automating PDF Objects for Interactive Publishing
An Introduction to pj's Object Framework
Nassib Nassar
The Adobe Portable Document Format (PDF) has become a common way to represent documents with rich textual and graphical layouts. Many Web pages are now augmented with PDF files that contain detailed technical specifications and fancy marketing brochures. PDF provides continuity between the traditional office and the Internet, allowing complex documents to be published in unaltered condition both on the Web and on paper. Using PDF, it is possible to phase in a new information system to replace existing procedures that are based on printed forms. By merging accurate electronic versions of the original forms on the fly with data that are input through a standard Web browser, users can make the transition from a paper-based process to fully electronic document management gradually, seamlessly, and without loss of data integrity.
Surprisingly, there are few inexpensive software tools for manipulating PDF files comparable to those available for processing HTML. Most of the existing tools are designed for interactive use and aren't well suited to automated applications like dynamic document generation. This article discusses some of the issues related to understanding PDF documents. We explore the PDF format in the context of an extensible framework for modeling PDF data, rather than concentrating on parsing issues and ad hoc implementations. Finally, we introduce pj, a Java class library and object framework that can be used to make any Web site PDF-enabled.
In a Nutshell
The overall structure of a PDF file is straightforward.