The World's Information Desk
A Discussion with Google's Craig Silverstein
With an index of 1.3 billion documents that refreshes every 28 days, few companies can say they handle more data on the Web than Google. Web Techniques talked to Craig Silverstein, Google's director of technology, to learn how it's done and where Google is going.
Web Techniques: If any task truly resembles looking for the proverbial needle in a haystack, searching the Web is it. How did Google's engineers approach that problem?
Craig Silverstein: We owe a huge debt to the large body of research in information retrieval that's been developed since the 1960s. But we added two elements that weren't yet in wide use: significant HTML analysis (that is, we looked not only at the text itself but also at the markup used on the text) and link analysis. The link analysis research, which became the PageRank algorithm, is what really drove the new company.
WT: So how does PageRank work?
CS: It takes advantage of the fact that the Web has links. We can use the Web's link structure to get a quality score for every page on the Web. If a lot of high-PageRank pages point to your site, then your site also gets a high PageRank. PageRank wasn't developed for Web search, actually. But when Larry Page, its developer, started studying it, he discovered that the PageRank of a page corresponded closely to his intuitive idea of the quality or importance of a Web page. Intuitively, if Yahoo, the New York Times, and the maintainer of the most popular Barbie Doll site all link to your Web page (I won't try to guess what your Web page might be about), that reflects well on your page.
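The idea Silverstein describes, where a page's score comes from the scores of the pages linking to it, can be illustrated with the textbook power-iteration form of PageRank. This is a minimal sketch under stated assumptions, not Google's production system: the damping factor of 0.85 is a commonly cited value, the choice to spread the score of pages with no outgoing links evenly across all pages is one common convention, and the page names in the example graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed, commonly cited value.
    """
    # Collect every page that appears as a source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with a uniform score over all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumed convention: score held by pages with no outgoing
        # links is redistributed evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its score, split evenly, to the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Stop once the scores stop changing appreciably.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical link graph echoing the interview's example.
web = {
    "yahoo": ["your_page", "nytimes"],
    "nytimes": ["your_page"],
    "barbie_fan_site": ["your_page"],
    "your_page": [],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:16s} {score:.4f}")
```

In this toy graph, `your_page` is linked to by all three other pages and ends up with roughly half of the total score, while the scores across all pages still sum to 1. Iterating until the scores settle, rather than using a closed-form solution, mirrors how the computation is usually described for graphs far too large to handle with direct matrix methods.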