Mirror, Mirror
By Randal L. Schwartz
A popular thing to do on the Web is to create a "mirror" -- a duplicate of a collection of Web files in another location, either for personal use, or to provide the information on a second Web server.
Many general-purpose solutions exist for mirroring entire Web hierarchies. Two that come immediately to mind are wget (from the GNU project at www.gnu.org) and w3mir (found in the CPAN at www.perl.com/CPAN).
But both require a lot of setup and careful thought. That effort is justified if you have a major project to duplicate, but what about when you're mirroring only a few files, or just one directory?
For example, let's look at a small task. The Internet Relay Chat (IRC) EF-NET channel called #perl is frequented by Perl hackers such as myself. Now, the channel is active 24 hours a day, but I can't stick around all the time. Luckily, there's an IRC bot that logs all the channel traffic into files that are accessible on the Web.
The files are all in one directory, and the names are all predictable (being constructed from the date), so there's no point in parsing the HTML returned from the directory just to find out the name of the latest file. I can compute that.
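To make "I can compute that" concrete, here is a minimal sketch of building date-based file names instead of parsing a directory listing. It is written in Python purely for illustration. The base URL and the name pattern are hypothetical placeholders, since the actual server and naming scheme are not given here.

```python
from datetime import date, timedelta

# Hypothetical values: neither the real URL nor the real file-name
# pattern is stated in the text, so these are placeholders.
BASE_URL = "http://www.example.com/perl-logs/"
NAME_FORMAT = "%Y-%m-%d.log"


def log_name(day):
    """Return the (assumed) log file name for the given date."""
    return day.strftime(NAME_FORMAT)


def recent_log_urls(today, count):
    """Return URLs for the `count` most recent daily logs, newest first."""
    return [BASE_URL + log_name(today - timedelta(days=n)) for n in range(count)]


if __name__ == "__main__":
    # Print the URLs for today's log and the two days before it.
    for url in recent_log_urls(date.today(), 3):
        print(url)
```

Because the names follow from the calendar alone, no request to the directory index is needed to learn which file is newest.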
Don't Try This at Home
But if I fetched every file from the current log down to the oldest file every few hours, I would probably be banned from the server.