Blekko, Google Line Up Hunted Spam Sites


Ready? Aim and fire. Blekko reported Tuesday that it has banned more than 1.1 million spam-laden domains, which support millions of pages, from queries. The ban relies on AdSpam, the search engine's machine-learning algorithm that examines pages for spam signals. The move blocks hundreds of millions of Web sites filled with ads and thin to no editorial content from appearing in its search engine results pages. Previously, the search engine had banned nearly two dozen sites such as eHow, owned by Demand Media.

A post on the Blekko site describes a new search algorithm that finds spam rather than ranking results. The algorithm is specifically designed to recognize spam-related pages and eliminate them before they ever appear in search results.

The engine will determine relevance and identify high-quality sites by reviewing the content -- deeming it amateurish or something of substance -- as well as the length of the article. The Blekko community also identifies important content with slashtags, creating them for searches that are deemed significant.



The clean-up is similar to the move Google made last week when it changed its algorithm so that sites filled with spam rank deeper in the results. Not sure if you're aware, but some of Blekko's geeky cofounders and engineering team have roots at Google, California Institute of Technology (Caltech) or AOL. Take Tom Annau, vice president of engineering, for instance.

Slashtag this: Prior to Blekko, Annau spent four years at Google running projects in Web search and online advertising, from MapReduce distributed computing infrastructure to applying massive-scale machine-learning models to search indexing, ad serving, spelling correction and document content analysis, according to his profile on the site.

It's interesting that the stakes of search engine ranking have become so important for marketing campaigns that Google, Blekko and others have had to change or develop algorithms to rid results of spam and sneaky black hat practices.

Google recently followed up with Panda, another algorithm change hitting low-quality content. It pushes down the rankings of sites that Google deems low-quality or spam. Since the algorithm has only rolled out in the U.S., the U.K. search marketing agency Greenlight is urging businesses to step up content audits to ensure that rankings don't slip. Greenlight believes Panda is a combination of increased emphasis on "user-click data and a revised document level classifier."

In an email, the company explains: "Google can track click through rates (CTRs) on natural search results easily. It can also track the length of time a user spends on a site, either by picking up users who immediately hit the back button and go back to the SERPs, or by collating data from the Google Toolbar or any third party toolbar that contains a PageRank meter. This collective in all probability provides enough data to draw conclusions about user behavior."
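To make the two signals Greenlight describes concrete, here is a minimal Python sketch of how click-through rate and "quick back" visits might be computed from a log of search impressions. It is an illustration only, not Google's method: the ResultImpression record, the click_signals function and the 10-second threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ResultImpression:
    """One appearance of a URL in a results page (hypothetical record)."""
    url: str
    clicked: bool
    # Seconds before the user returned to the results page; None if unknown.
    dwell_seconds: Optional[float] = None


def click_signals(impressions: List[ResultImpression],
                  quick_back_seconds: float = 10.0) -> Tuple[float, float]:
    """Return (click-through rate, share of timed clicks that bounced back quickly)."""
    shown = len(impressions)
    clicks = [i for i in impressions if i.clicked]
    ctr = len(clicks) / shown if shown else 0.0

    # Only clicks with a known dwell time can count toward the quick-back rate.
    timed = [i for i in clicks if i.dwell_seconds is not None]
    quick = [i for i in timed if i.dwell_seconds < quick_back_seconds]
    quick_back_rate = len(quick) / len(timed) if timed else 0.0
    return ctr, quick_back_rate


# Four impressions of the same (made-up) URL: two clicks, one of them a fast bounce.
log = [
    ResultImpression("example.com/a", True, 3.0),
    ResultImpression("example.com/a", True, 95.0),
    ResultImpression("example.com/a", False),
    ResultImpression("example.com/a", False),
]
ctr, quick_back_rate = click_signals(log)
```

A low click-through rate paired with a high quick-back rate is the kind of pattern that, on Greenlight's reading, could mark a page as low quality.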

Not paying attention to detail could land a company deep in the rankings. First it was J.C. Penney, and now Hessam Lavi -- an SEO and Web analytics consultant, and former member of Google's search quality spam team -- points to the German fashion brand Hugo Boss. He explains that rather than ditch its Flash Web site so search engines can find content and rank its pages for relevant keywords, the company has ventured into shady SEO.

Lavi calls it the "classic case of sneaky JavaScript redirection." Showing a screenshot of the code, he explains that these hidden doorways have links pointing to them in the page footers with a noscript tag.
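For readers who want to see what that pattern looks like in practice, here is a rough Python sketch that flags the two ingredients Lavi describes: links tucked inside a noscript tag, and a script that redirects the visitor. It is a simplified heuristic written for this article, not the code in Lavi's screenshot and not how Google detects the practice. The function names, the regular expression and the sample markup are all assumptions.

```python
import re
from html.parser import HTMLParser
from typing import List

# Common ways a script sends the browser elsewhere (an assumed, incomplete list).
REDIRECT_PATTERN = re.compile(
    r"(?:window\.|document\.)?location(?:\.href)?\s*=(?!=)"
    r"|location\.(?:replace|assign)\s*\(",
    re.IGNORECASE,
)


class NoscriptLinkCollector(HTMLParser):
    """Collect href values of <a> tags that sit inside <noscript> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth > 0:
            self.depth -= 1


def noscript_links(html: str) -> List[str]:
    """Links a crawler can follow but a script-enabled browser never shows."""
    collector = NoscriptLinkCollector()
    collector.feed(html)
    return collector.links


def has_js_redirect(html: str) -> bool:
    """True if the markup contains a script-style location change."""
    return REDIRECT_PATTERN.search(html) is not None


# Made-up markup: a footer with a noscript link, and the doorway page it points to.
home_page = ('<footer><noscript><a href="/doorway-shoes.html">shoes</a></noscript>'
             '<a href="/about">About</a></footer>')
doorway_page = '<script>window.location.href = "/flash-home";</script><p>shoes</p>'

hidden = noscript_links(home_page)
redirects = has_js_redirect(doorway_page)
```

A page reachable only through noscript links that immediately redirects script-enabled visitors matches the doorway setup described above, though a real audit would need to rule out legitimate uses of both techniques.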

At least it appears that Google is on the hunt for companies that are not adhering to its terms and conditions. What company will fall from the grace of first-page rankings next?
