 New Architect > Archives > 1999 > 12 > Programming with Perl

Search This Site

Back in my April 1997 column ("A Web-Search CGI"), I provided a simple script that searched the text of the programs I've written for this column over the years. Recently, I've been hacking on my overall Web site design, and thought it would be cool to be able to search my entire site. The program from that column could do the trick, but only if I never planned on getting anything else done with my Web server box again, because searching everything would be expensive.

But I thought to myself, hey, the big search engines have already come to my site, fetched all the pages I want to have searched, and indexed them for me. Furthermore, they have more spare CPU cycles than I have, and it'd be nice to take advantage of that.

And then I remembered that many of the search engines provide a way to restrict the returned hits to a specific URL or site. I could use this to my advantage to create a wrapper that uses the big search engine to return hits only on my site!
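As a rough illustration of that idea, here is a minimal sketch of building a site-restricted query URL using nothing but core Perl. Everything specific in it is an assumption made for the example: the endpoint `http://search.example.com/query`, the `q` parameter, the `host:` restriction operator, and the site name are placeholders, so substitute whatever your chosen engine actually documents.

```perl
use strict;
use warnings;

# Hypothetical values: none of these come from a real search engine.
my $ENGINE = "http://search.example.com/query";
my $SITE   = "www.example.com";

# Percent-encode every byte outside the URI "unreserved" set.
# (Assumes plain ASCII input, which is enough for this sketch.)
sub escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf "%%%02X", ord $1/ge;
    return $text;
}

# Build a query URL whose search terms are confined to $SITE.
sub site_query_url {
    my ($terms) = @_;
    my $query = "host:$SITE $terms";   # assumed restriction syntax
    return "$ENGINE?q=" . escape($query);
}

print site_query_url("perl cgi"), "\n";
```

The wrapper's only job at this stage is to prepend the restriction to whatever the visitor typed, so the remote engine does all the searching.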

The upside of this approach is that I leverage existing work, and someone else's disk and CPU. The downside is that the spiders don't visit very often, so new material is likely to be missed in such an index. But for mostly static or old pages, the trade-off is often worth debating.

Of course, Perl can pass the proper values to the search engine's form-response CGI programs, but the answer comes back as HTML. It's a mess to figure out which part of that HTML is a link to some hit, and which part is simply a link to an ad or something.
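To make the problem concrete, here is a minimal sketch of one crude way to separate hits from other links: keep only the anchors whose target lives on your own host. The site name and the sample HTML are invented for the example, and the regular expression is a stand-in for a real HTML parser, so treat this as an illustration of the difficulty rather than a recommended solution.

```perl
use strict;
use warnings;

# Hypothetical site name; a "hit" is any link that points back at it.
my $SITE = "www.example.com";

# Return the href targets found in $html whose host is $SITE.
# A regex like this is fragile; real result pages call for an HTML parser.
sub site_hits {
    my ($html) = @_;
    my @hits;
    while ($html =~ /<a\s[^>]*?href\s*=\s*(["'])(.*?)\1/gis) {
        my $url = $2;
        push @hits, $url if $url =~ m{^https?://\Q$SITE\E(?:[:/]|$)}i;
    }
    return @hits;
}

# Invented sample of what a result page might contain.
my $page = <<'HTML';
<a href="http://ads.example.net/click?id=1">Buy now</a>
<a href="http://www.example.com/col01.html">Column 1</a>
<A HREF='http://www.example.com/col02.html'>Column 2</A>
HTML

print "$_\n" for site_hits($page);
```

Filtering by host sidesteps the question of where the ads sit in the page layout, but it still breaks as soon as the engine routes its hit links through a redirect of its own.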






Copyright © 2006 CMP Media, LLC