Web Wandering for Broken Links
By Randal L. Schwartz
One of the problems in maintaining a good Web site is making sure it has valid links to places that offer further relevant information, or perhaps just to some nifty thing you've discovered. Discovering the links usually isn't very difficult. After all, any of the big Web search engines or indexing services can probably give you more links on a given topic than you can visit in a lifetime.
The concern is that once you've copied that URL faithfully into your "hey, cool links here" page, things tend to move around, or even go away. Then you end up with a bad link. How do you discover this bad link? Well, you could spend a lot of time browsing your own pages, following all the links to verify that they're still good. Or, you could just sit back and wait for a visitor to email you, telling you that "this link is broken." (Be sure your email address is prominent on the page...I've visited too many pages with no apparent owners, and it's frustrating trying to report a bad link.)
However, you're reading this column, so I presume you'd like to hear about a simple tool I've written to follow these links automatically. With the easy-to-use LWP library (by Gisle Aas), you can write a program that fetches a page, looks for all its links, then tries each one. In fact, having noticed those links, the program can also look at the content of the pages they point to, finding additional links, and so on. By recursively traversing the links this way, you'll end up visiting every page reachable from the starting point.
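To make the fetch, extract, and follow cycle concrete, here is a minimal sketch of the idea. It is not the tool described in this column. The sub names (extract_links, check_links, lwp_fetcher) are hypothetical, and so are two design choices: the sketch uses a work queue with a "seen" table in place of literal recursion, and it only looks for further links on pages that live on the starting host, so it checks off-site links without wandering into them. It assumes the LWP, HTML::Parser (for HTML::LinkExtor), and URI distributions are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Return the absolute http(s) URLs found in <a href="..."> tags of $html,
# resolved against $base, with any #fragment removed.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && defined $attr{href};
        my $uri = URI->new_abs("$attr{href}", $base);
        return unless ($uri->scheme // '') =~ /^https?$/;  # skip mailto: and friends
        $uri->fragment(undef);
        push @links, $uri->canonical->as_string;
    });
    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Visit every page reachable from $start and return the URLs that failed.
# $fetch is a code ref taking a URL and returning ($ok, $html); $html is
# undef for anything that is not HTML.
sub check_links {
    my ($start, $fetch) = @_;
    $start = URI->new($start)->canonical->as_string;
    my $home = URI->new($start)->host_port;   # assumption: stay on this host
    my %seen  = ($start => 1);
    my @queue = ($start);
    my @bad;
    while (defined(my $url = shift @queue)) {
        my ($ok, $html) = $fetch->($url);
        unless ($ok) {
            push @bad, $url;
            next;
        }
        next unless defined $html;
        # Off-site pages are checked but not searched for more links.
        next unless URI->new($url)->host_port eq $home;
        for my $link (extract_links($html, $url)) {
            push @queue, $link unless $seen{$link}++;
        }
    }
    return @bad;
}

# A fetcher built on LWP::UserAgent. Relative links are later resolved
# against the requested URL, which ignores redirects (a simplification).
sub lwp_fetcher {
    my $ua = LWP::UserAgent->new(timeout => 20);
    return sub {
        my ($url) = @_;
        my $res = $ua->get($url);
        return (0, undef) unless $res->is_success;
        my $html = $res->content_type eq 'text/html'
                 ? $res->decoded_content
                 : undef;
        return (1, $html);
    };
}

# Run only when a starting URL is given on the command line.
if (@ARGV) {
    print "BAD: $_\n" for check_links($ARGV[0], lwp_fetcher());
}
```

Invoked with a starting URL as its only argument, the sketch prints one "BAD:" line per link that could not be fetched. Passing the fetcher in as a code ref keeps the traversal logic separate from the network, so the same check_links sub can be exercised against a canned set of pages.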