Transplanting Absolute and Relative Links
By Randal L. Schwartz
Someone once asked me for a tar archive of a portion of my Web site, so that it could be read from a local disk. At first, I said "no problem, I'll just launch tar right here." But then I realized that while the files would be intact, the links would be a mess. Like most people, I had three kinds of links: absolute; relative, pointing within the tree to be moved; and relative, pointing outside the tree to be moved. I needed a file tree in which all internal links (links to other parts of this tree) were relative, so that it didn't matter where the tree ended up on my friend's server (or even as a tree of local files accessed with file: URLs). All links pointing outside that tree had to be absolute, so that my friend's browser would transparently fetch them from the Web.
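The rule described above (internal links become relative, everything else becomes absolute) can be sketched in a few lines. This is an illustration in Python, not the Perl program discussed in this column. The function name rewrite_link, its parameter names, and the www.example.com URLs are all assumptions made for the example, and the sketch assumes that tree_root ends with a slash and that the page being rewritten lives somewhere under it.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def rewrite_link(href, page_url, tree_root):
    """Rewrite one link found on the page at page_url.

    Links that resolve to somewhere under tree_root come back relative to
    the page's own directory; all other links come back absolute.
    (Hypothetical helper; tree_root is assumed to end with "/".)
    """
    # Resolve the link against the page it appears on.
    absolute = urljoin(page_url, href)

    # Outside the tree being moved: keep the absolute form.
    if not absolute.startswith(tree_root):
        return absolute

    # Inside the tree: express the target relative to the page's directory.
    target = urlsplit(absolute)
    page_dir = posixpath.dirname(urlsplit(page_url).path)
    relative = posixpath.relpath(target.path, start=page_dir)

    # relpath drops a trailing slash, so restore it for directory URLs.
    if target.path.endswith("/"):
        relative += "/"

    # Carry the query string and fragment over unchanged.
    if target.query:
        relative += "?" + target.query
    if target.fragment:
        relative += "#" + target.fragment
    return relative
```

For a page at http://www.example.com/perl/docs/intro.html inside a tree rooted at http://www.example.com/perl/, a link written as /perl/images/logo.gif would come back as ../images/logo.gif, while ../../cgi-bin/search, which escapes the tree, would come back as http://www.example.com/cgi-bin/search.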
I started hacking out something with a few regular expressions, but quickly realized I was rebuilding the HTML::Parse module from the wonderful LWP package. I scuttled my earlier effort and decided to make a very powerful and robust program using that module as a base.
The trick with HTML::Parse was to construct a series of "callbacks." As the module parses an HTML file, it recognizes start and end tags, comments, and so on, and calls the matching callback for each one. Although I was interested only in the start tags (for the URLs in the HREF attribute of <A> and the SRC attribute of <IMG>), I still needed to construct a "passthrough" callback for all the rest.
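The callback arrangement just described can be illustrated with a short sketch. It uses Python's standard html.parser module as a stand-in for the Perl module, so none of the names below come from LWP: the class LinkRewriter, the table LINK_ATTRS, and the rewrite argument (any function that maps an old URL to a new one) are assumptions made for the example. Only the start-tag callback does real work; every other callback copies its input to the output.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose URLs we care about, and the attribute that holds the URL.
LINK_ATTRS = {"a": "href", "img": "src"}


class LinkRewriter(HTMLParser):
    """Copy HTML through unchanged except for <A HREF> and <IMG SRC> URLs."""

    def __init__(self, rewrite):
        # convert_charrefs=False keeps entities such as &amp; as separate
        # events, so they can be passed through verbatim.
        super().__init__(convert_charrefs=False)
        self.rewrite = rewrite  # hypothetical callable: old URL -> new URL
        self.out = []

    def handle_starttag(self, tag, attrs):
        link_attr = LINK_ATTRS.get(tag)
        if link_attr is None or all(name != link_attr for name, _ in attrs):
            # Not a tag we rewrite: emit the original text untouched.
            self.out.append(self.get_starttag_text())
            return
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute with no value
                parts.append(name)
                continue
            if name == link_attr:
                value = self.rewrite(value)
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        # Treat <img .../> like <img ...>; a rebuilt tag loses its "/".
        self.handle_starttag(tag, attrs)

    # Everything below is a passthrough callback.
    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def rewrite_html(html_text, rewrite):
    """Run html_text through LinkRewriter and return the rewritten text."""
    parser = LinkRewriter(rewrite)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

One trade-off in this sketch: a rewritten tag is rebuilt from its parsed attributes, so its tag name and attribute names come out in lowercase and its values are always double-quoted, while tags that are not rewritten keep their original spelling.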