Deconstructing Search Engine Bias
Over my last two columns I've looked at activist-initiated campaigns to gain deeper insight into the way people use search engines, both in terms of the impact on searchers and from the perspective of those who see search as an effective communications tool. Beyond the trust searchers place in their search engine, one additional insight I gained from talking with activists and reviewing their campaigns is the growing importance of search as editorial content, in both natural and paid search. Given that trust, and given that search results can be viewed as editorial content, I thought it would be interesting to switch gears and explore the topic of search engine bias, particularly to help the average searcher better understand, and think more critically about, his or her search engine results.
For a thorough discussion of the major issues surrounding algorithmic bias, I highly recommend reading Eric Goldman's "Search Engine Bias and the End of Search Engine Utopianism," first published in the Yale Journal of Law and Technology in Spring 2006. In it he discusses why bias exists, why bias is desirable and necessary, how market forces limit the scope of bias, and how engines are accountable to "fickle searchers." Goldman also asserts that engines are beholden to "majority interests," and that PageRank's "non-egalitarian voting structure causes search results to be biased towards websites with economic power because these websites get lots of links due to their marketing expenditures and general prominence."
While I don't agree with the paper's conclusion that personalization will eventually render bias moot, it does offer a thorough and logical presentation of the premise of search engine bias and the issues around it. Building on Goldman's description of PageRank as a type of bias, the following list highlights additional elements of search bias (note that there are many more considerations than those listed here):
Anti-spam bias (real or perceived spam). If a site appears to be spam, as defined by the engine, then that site or the offending document might not rank as well as it would otherwise, or could be banned from the engine's index altogether. Meta refresh, and even text set in the same color as its background, are examples of tactics that spammers have used in the past. Adopting these tactics might create a lasting bias against your site, even if your intentions were good and the site is "legitimate" (the term Google uses for such sites in the patent document cited under domain bias below).
Big site / authority bias. Simply put, a bigger site with unique content, years of domain trust and a healthy backlink structure has a greater chance of getting a new page ranked across a wide variety of terms and phrases than a much smaller site with few or no links and a narrower thematic scope.
Blog / buzz bias. Blogs have hit prime time in Google Web search, and a blog with the authority characteristics described above can get ranked in minutes -- and sometimes even stay in position for months or longer.
Bold text bias. Bias also appears on a search engine results page (SERP) when a keyword or phrase matching the query is bolded or highlighted. Bolded text in the title, description and even the URL can draw the eye, give someone a reason to click, or give them a reason to bypass non-bolded listings.
Domain bias. A trusted domain is given credence and higher visibility in the search engine results, while a newer domain has to prove itself on myriad factors. Google patent application #20050071741 details many ways in which a "legitimate" domain may be considered in its algorithm (see claims 38-40). Be aware that just because something is written in a patent document doesn't necessarily mean the engine is using it. The same document also describes many other possible sources of Google bias.
Feed and submission bias. Paid and free feeds now permeate the first page for certain result sets. Yahoo intersperses paid listings into its natural results (Search Submit Pro), and Google Base provides top Web listings for maps, product listings and more. To get in, you have to either pay or submit directly for free.
Link bias. Links are the cornerstone of most popular search engine algorithms, and the difference between having a lot of quality links and having no links at all is the difference between being found and not being found.
Image / video bias. As Hotchkiss's eye-tracking research found, images visible above the fold can prompt someone to scan quickly to your image or video listing ahead of the text listings on the search results page.
Textual bias. As simple and obvious as it sounds, at this point in search history, results are heavily weighted toward text. Building a site in Flash or around other image-based elements can make it fall victim to this bias, unless you make other provisions for text.
Paid search bias. Like it or not, the top search results page is biased towards paid search. This is a simple bias to overcome -- just break out your credit card.
Personalization bias. Personalization bias occurs when the search engine shows customized results based on a user's previous search history, sites visited, subscribed feeds, geographic or IP location, and other factors.
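To make the link and authority biases in this list more concrete, here is a minimal sketch of the textbook PageRank idea, in which each page's "vote" is weighted by its own score. It is an illustration only: the page names and link graph are hypothetical, the damping factor of 0.85 is the commonly cited textbook value, and nothing here should be read as how any engine actually ranks pages today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: most sites link to the prominent brand,
# and nobody links to the small shop.
toy_web = {
    "big_brand": ["fan_blog"],
    "fan_blog": ["big_brand"],
    "news_site": ["big_brand"],
    "review_site": ["big_brand"],
    "small_shop": ["big_brand"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

In this toy graph the heavily linked site ends up with the highest score and the unlinked shop with the lowest, and the one blog the big site links to inherits much of its weight, which is the "non-egalitarian voting structure" Goldman describes.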
Hopefully this list illustrates that bias is often the reason we choose one search engine over another, but that it doesn't negate the need to think critically about search results. If you have any thoughts or additions to the list, post them to the Search Insider blog.