Commentary

In Search, How Close Is Close Enough?

In mathematics, an asymptote is a line that a function's curve comes awfully close to, but never quite reaches. I like to think of the search experience along the same lines--search results will never be perfectly relevant to a query, but they'll come very close.

If you remember what search was like before Google, you'll recall that keywords and meta tags were hugely important. That is, the algorithms that determined how relevant a particular page was to a particular topic relied heavily on how many times the page mentioned a relevant keyword, and on the content of its meta and title tags.
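
To make that concrete, here's a minimal sketch of that pre-Google style of scoring. It's purely illustrative--the field names, weights, and scoring function are my own assumptions, not any engine's actual algorithm.

```python
import re
from collections import Counter

def keyword_score(page, query_terms, title_weight=3.0, meta_weight=2.0):
    """Naive keyword-based relevance: count query-term hits in the body,
    title, and meta keywords, weighting title and meta matches more heavily.
    Field names and weights are illustrative assumptions."""
    def term_counts(text):
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    body = term_counts(page.get("body", ""))
    title = term_counts(page.get("title", ""))
    meta = term_counts(page.get("meta_keywords", ""))

    score = 0.0
    for term in query_terms:
        score += body[term]                    # each body mention counts once
        score += title_weight * title[term]    # title mentions count extra
        score += meta_weight * meta[term]      # meta-tag mentions count extra
    return score

# A keyword-stuffed page easily outranks a genuinely useful one.
pages = [
    {"title": "Chocolate chip cookie recipe", "meta_keywords": "cookies baking",
     "body": "Cream the butter and sugar, then fold in the chocolate chips."},
    {"title": "cookies cookies cookies", "meta_keywords": "cookies " * 50,
     "body": "cookies " * 200 + "and now a word from our sponsor"},
]
for page in sorted(pages, key=lambda p: -keyword_score(p, ["cookies"])):
    print(keyword_score(page, ["cookies"]), page["title"])
```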

These algorithms served us well for a while. By and large, search users found what they were looking for when they searched for things like product specifications or cookie recipes.

But then folks learned how to cheat the system. Page code could be loaded up with irrelevant keywords. Pretty soon, search relevance became something of a problem, as searches for product specs and cookie recipes turned up a bunch of irrelevant stuff, too, like solicitations from porn sites. Keyword-based algorithms were a decent surrogate for relevance for a while, but eventually the cheaters caught up and they stopped working well.

Many of you may remember that when Google arrived on the scene, it was like a breath of fresh air. One of the most significant things Google brought to the party was a new currency--inbound links. That is, if Page A linked to Page B, Google treated that link as a vote by Page A for Page B. As it turned out, this was a much better surrogate for relevance than keyword-based algorithms. It also had the rather desirable effect of democratizing the Web, giving hyperlinks an inherent value they never possessed before.
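
The "links as votes" idea can be sketched in a few lines. What follows is a simplified, PageRank-style illustration--the damping factor, iteration count, and toy link graph are assumptions for demonstration purposes, not Google's actual implementation.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-based scoring in the spirit of PageRank: each page
    splits its score evenly among the pages it links to, so every inbound
    link acts as a weighted vote. 'links' maps a page to its outbound links."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share   # each outbound link casts a vote
        rank = new_rank
    return rank

# Toy graph: pages A and B both "vote" for C, so C ends up ranked highest.
graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
print(sorted(link_rank(graph).items(), key=lambda kv: -kv[1]))
```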

Just as keyword-based algorithms got us by for years, so have Google's. But I'd suggest that cracks have begun to appear in this approach's armor. Just as folks learned to cheat the system in the first go-round, they've done it again--this time with some unintended consequences that pose a very real threat.

Google's mantra is "Don't be evil," and many of us like to think of it as a white-hat organization. For the most part, it is. Consider this, though: when the U.S. government changes a tax law to help one economic class, that change often has unintended consequences for another. Unintended consequences are precisely what we see with Google's creation of an inbound-link-based currency.

Don't get me wrong. Google's link-based currency may be the best thing ever to happen to hyperlinks, and the company may have done a great job of advancing search by democratizing the Web in this fashion. What it doesn't do such a great job of is catching the ballot-box stuffers. For those of us who maintain blogs, social networking sites, message boards, or any other Web-based community that lets users create hyperlinks, the ballot-box stuffers are the link spammers--folks who use scripts to churn out thousands of machine-generated links to artificially inflate relevance. And if you blog or run an online community, you're likely fighting a losing battle against them.

Simple content management systems gave rise to citizen publishing and the blogging movement. (Bloggers--would you still blog if you had to hand-code every page?) Sometimes we forget that the ease of publishing is what turns many of us into such prolific publishers. When we're spending more time filtering out link spam than we are publishing our material, it's not difficult to see problems on the horizon.

A recent article by Eric Ward on the MarketingProfs blog starts out by mentioning the proposals he's been getting lately from folks who want him to obtain thousands of inbound links for them. Some of you out there may have some visibility into the marketplace for machine-generated links. A thriving market such as this one is, on its face, an indication that the cheaters have once again caught up with the technology that makes up our surrogate for relevance.

You can see this market in action when you use popular blog search engines to find some of the latest posts in the blogosphere on a given topic. I used Google Blog Search to search for some information on Sprint the other day. The first page of results was composed almost entirely of splogs--blogs created solely for the purpose of generating bogus links.

We may never achieve perfect relevance, and that's okay. But it's clear that search engines' relevance algorithms need to stay a few steps ahead of the cheaters if they're to deliver anything of value to the searcher. Google should be able to tell the difference between a site with valuable content and a splog, just as our human brains can. And it should quickly and decisively de-list such spam from its index.
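
As a thought experiment, here's a rough sketch of the kind of crude heuristics a splog filter might start with. The signals and thresholds are assumptions chosen purely for illustration; real spam detection would lean on far richer evidence--link-graph analysis, duplicate-content detection, posting patterns--than a couple of rules like these.

```python
import re

def looks_like_splog(post_html, max_link_density=0.02, max_repeat_ratio=0.3):
    """Crude splog heuristics: flag a post whose text is mostly links or
    mostly the same word repeated. Thresholds are illustrative assumptions."""
    links = re.findall(r"<a\s", post_html, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", post_html.lower())
    words = re.findall(r"[a-z0-9']+", text)
    if not words:
        return True  # no readable text at all

    link_density = len(links) / len(words)    # links per word of visible text
    most_common = max(words.count(w) for w in set(words))
    repeat_ratio = most_common / len(words)   # share held by the most common word

    return link_density > max_link_density or repeat_ratio > max_repeat_ratio

# A keyword-stuffed, link-heavy page trips the heuristics; a real post doesn't.
spam = '<a href="http://example.com">sprint</a> sprint phones cheap ' * 40
real = "<p>Sprint announced a new handset today, and early reviews look promising.</p>"
print(looks_like_splog(spam), looks_like_splog(real))
```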

If Google doesn't, bloggers, message board admins, and the other folks who facilitate the conversation that keeps the Web thriving will spend more time filtering spam and less time publishing and moderating discussions. When it takes more time to keep the channel clear of irrelevant noise than to publish content, these facilitators may start to wonder whether the effort is worth it.
