Improving Link Verification
By Randal L. Schwartz
One of the major hassles of maintaining a cool Web site is verifying that links are still valid. In my October 1996 column, I introduced hverify, a program I use to ensure the validity of outbound links. I've been tweaking the program recently, and this month I am happy to announce version 2. Besides improved methods for parsing and following links, this update includes a full cross-reference that shows link paths and anchor-line locations. I've found that the line number really helps in locating bad links.
Listing One presents the new, improved hverify. The first two lines turn on taint-checking, warnings, and compiler restrictions, while lines 4 through 6 pull in the LWP::UserAgent library (to allow me to fetch Web pages), the HTML::Parser library (to locate references), and the URI::URL library (to make relative links absolute, and vice versa).
Lines 10 through 21 define three configuration parameters that set the scope of the verification. Lines 10 and 11 give the list of top-level URLs that will be examined. Here, I've pointed the list to the top of my virtual Web server.
Parsing
Lines 12 through 16 define the subroutine PARSE, which will be repeatedly passed a URL and will return 1 if the URL should be fetched and examined for further Web links, or 0 if not.
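To make the shape of these two parameters concrete, here is a minimal sketch of a starting-point list and a PARSE-style predicate. It is not the code from Listing One: the variable name @CHECK, the host www.example.com, and the /autoindex/ exclusion are all assumptions chosen for illustration. The only thing taken from the description above is the contract, namely that the subroutine is handed a URL and returns 1 if the page should be fetched and scanned for more links, or 0 if not.

```perl
use strict;
use warnings;

# Hypothetical list of top-level URLs to start from (an assumption for
# this sketch; the real list lives in Listing One).
my @CHECK = qw(http://www.example.com/);

# Return 1 if the URL should be fetched and examined for further links,
# 0 otherwise. The policy below is an assumed one, not the column's.
sub PARSE {
    my ($url) = @_;

    # Only descend into pages on our own (hypothetical) server.
    return 0 unless $url =~ m{^http://www\.example\.com/};

    # Skip a hypothetical area of machine-generated pages.
    return 0 if $url =~ m{/autoindex/};

    return 1;
}

# Show the decision for a few sample URLs.
for my $url (@CHECK,
             'http://www.example.com/autoindex/a.html',
             'http://elsewhere.example.org/') {
    printf "%-42s %s\n", $url, PARSE($url) ? 'parse' : 'skip';
}
```

Keeping the policy in one small subroutine means the scope of a run can be changed by editing a couple of patterns, without touching the code that fetches pages and follows links.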