A couple of columns ago I pondered aloud why, faced with the task of building a real-time search engine, Google should buy Twitter. I received more email, phone calls, and tweets on that column than on many others in recent memory, and I could tell very quickly that some of the paragraphs were being interpreted in widely different ways. While most commenters seemed to have a positive outlook on a Twitter and Google merger, many were skeptical of the search possibilities of Twitter, and others were just sick and tired of hearing about it. Still others said Twitter is just a fad that will die, like many other digital upstarts before it, without ever producing a dime.
Leaving the monetization aspect of Twitter aside for the moment, there is still a strong case for an algorithmic approach to mining Twitter data in order to produce more useful results. When considering how Twitter can enhance results, we're not referring to simply running keyword-matching feeds alongside mainline search results. That alone is not very useful: it would be prone to spam and duplicate results, among many other problems. In fact, spam is Twitter's biggest enemy as a search engine. Since I wrote that last column, the problem seems to be turning into an avalanche, ranging from people entering giveaways via hash tags to hash tag bombing, paid tweeting, hash tag spamming, and outright misdirection of links from the trusted people you follow.
Trust, authority, and relevancy
So in order for a Twitter search layer to be successful, it needs a good algorithm to ensure a few key ingredients that are well-known to the major search engines: trust, authority and relevancy. What you get is a way to leverage the base of Twitter users -- primarily a heavily info-seeking and info-sharing crowd -- to report on "what is happening now." Trust, authority and relevancy are what the engines are good at, and by focusing on these aspects, spam is minimized to a great degree.
Just like the crawlers, human-driven social search engines will
live or die by the same principles. Ultimately, an algorithmic social search layer better serves the real-time mainstream search experience, as opposed to a standalone reference, which is why I made
the case for a major crawler and Twitter joining together. Google and other crawlers have the technology that Twitter needs, and Twitter has a massive network of info junkies. Neither would be able
to replicate what the other has in any reasonably short period of time.
A few ways that Twitter could borrow from the crawlers to increase trust, authority, and relevancy (and decrease spam)
Since tweets, as a measure of search relevance, come from humans (in fact, even algorithmic search is ultimately human-driven), we should expect the algorithm to focus very heavily on its users, and also on the trust and authority of the sources to which they link.
For fun, I have paralleled and adapted some of the well-known functions of algorithmic search to the Twitter network setting -- or, in other words, how a Google-, Yahoo-, or Bing-driven Twitter search algorithm might look if it started acting more like a crawler engine:
Domain authority / username authority. A Twitter account username is a tertiary-level domain, and would function in social search much like a secondary-level Web site domain or tertiary subdomain (e.g., site.com, sub.site.com). Age of account, number of inbound links to the user's address, and the authority of the user's followers could be used to calculate an authority score for the user (some of these authority recommendations continue below). But the key is that it will take authority and trust at the domain/account level to make the data trustworthy enough to determine its relevance for real-time search. As a sign of possible things to come for social search, Twitter has rolled out "verified accounts," validating the fact that reputation and trust do matter, even if it is the Wild West right now.
Duplicate content / re-tweeplicate content. While press releases and catalog product info can create a duplicate content issue for Google and other engines, "duplicate" retweets from trusted accounts in Twitter actually provide a strong indicator of relevance around links in network search, which is the primary basis for OneRiot's search algorithm now. Identifying the original source of a heavily retweeted tweet could also improve a user's trust score, in the same way that producers of original content get more authority and higher visibility than content syndicators in the crawler-based engines.
Blogging freshness / content freshness / microblogging freshness. Anyone who has spent five minutes doing SEO knows that well-maintained blogs with timely content get good traction in natural search, mainly because they are a source of freshly updated content. For that reason, a well-maintained Twitter account pointing to (and receiving links from) other authoritative sources of info will gain more trust as a social signal to a search engine.
Links / number of followers. In a way, the number of authoritative followers a user has could be a signal of authority and trust, much in the way that a single trusted link from NYTimes.com can be more valuable than 1,000 less-trusted links.
Inbound links vs. outbound links / ratio of followers in contrast to the number of people followed. When a user is heavily followed but follows few accounts in return (for example, John Mayer's Twitter ratio is 1,618,214:56), it means that people are listening to what that user is saying, and they don't want anything else in return. A large following is also a strong indicator that this is a "real" Twitter account, and not spam.
Linking to bad / good neighborhoods. A tried and true indicator of spam in search engines is the quality of the sites being linked to. Link to spammy ephemeral Viagra affiliate site = bad. Link to quality sites that host engaging or useful content = good.
Tweet themes of users. The theme of who someone is, or what they're into, is often apparent from viewing their streams. SEOs post a lot of SEO-related links; they talk about Google, Bing, Yahoo, search conferences, DAO, link development, etc.; the link to their homepage provides an even stronger signal of who they are. So just as search engines create authority around certain types of site themes (for example, an authoritative automotive blog might not rank as well as an authoritative finance blog for finance-related terms), it would be a natural extension for a social-search algorithm to emphasize themes around certain users.
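For the more technically inclined, here is a rough sketch in Python of how a few of the signals above (account age, authoritative followers, follower ratio, bad neighborhoods, and retweets from trusted accounts) might be rolled into a score. Everything in it is an assumption made for illustration: the field names, the weights, the two-year age cutoff, and the example accounts are hypothetical, and none of it reflects how Twitter, Google, or any other engine actually works.

```python
import math
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical per-account signals; these are not fields from any real API."""
    username: str
    age_days: int            # age of the account
    followers: int           # accounts following this user
    following: int           # accounts this user follows
    trusted_followers: int   # followers already judged authoritative
    bad_link_share: float    # fraction of links pointing to spammy sites, 0.0 to 1.0


def authority_score(a: Account) -> float:
    """Combine account-level signals into a rough score between 0 and 1."""
    # Older accounts earn more trust, capped after about two years (assumed cutoff).
    age = min(a.age_days / 730.0, 1.0)
    # Log scale: trusted followers count, but with diminishing returns.
    trusted = min(math.log10(1 + a.trusted_followers) / 4.0, 1.0)
    # Share of the follower-plus-following total that is made up of followers.
    total = a.followers + a.following
    ratio = a.followers / total if total else 0.0
    # Illustrative weights, then a penalty for linking into bad neighborhoods.
    base = 0.2 * age + 0.5 * trusted + 0.3 * ratio
    return base * (1.0 - a.bad_link_share)


def rank_links(tweets, accounts):
    """Rank links by the summed authority of the accounts that shared them.

    `tweets` is an iterable of (username, url) pairs. Each account counts
    once per link, so repeating the same tweet adds nothing.
    """
    scores = {}
    seen = set()
    for username, url in tweets:
        if (username, url) in seen or username not in accounts:
            continue
        seen.add((username, url))
        scores[url] = scores.get(url, 0.0) + authority_score(accounts[username])
    return sorted(scores, key=scores.get, reverse=True)


# Made-up accounts and links, purely for demonstration.
accounts = {
    "veteran": Account("veteran", 1200, 50000, 200, 900, 0.0),
    "analyst": Account("analyst", 800, 4000, 500, 150, 0.05),
    "spambot": Account("spambot", 10, 30, 2000, 0, 0.9),
}
tweets = [
    ("spambot", "http://spam.example/pills"),
    ("spambot", "http://spam.example/pills"),
    ("veteran", "http://news.example/story"),
    ("analyst", "http://news.example/story"),
]
print(rank_links(tweets, accounts))
```

In this toy run, the story shared by two established accounts ranks above the link pushed twice by the brand-new account that follows far more people than follow it. Summing authority per link, rather than counting raw tweets, is just one way to treat retweets from trusted accounts as a relevance signal while giving a spammer nothing for repetition.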
The above list is not intended to suggest that these elements can't or won't be spammed, because they can and they will be. Whatever the outcome, the fact remains that Twitter search will provide a useful, new "in-the-moment" search element if trust, relevancy and authority are given full consideration. The many aspects of how we use it, and how we optimize for it, will be entirely different from what we do now.
If you've read this far, I invite you to continue the discussion with me below, and on Twitter at http://www.twitter.com/robgarner.