What An Algorithmic Approach To Twitter's Social-Search Layer Might Look Like

A couple of columns ago I pondered aloud on the reasons why, in the face of the task of building a real-time search engine, Google should buy Twitter. I received more email, phone calls, and tweets on that column than on many others in recent memory, and I could tell very quickly that some of the paragraphs were interpreted in widely different ways. While most commenters seemed to have a positive outlook about a Twitter and Google merger, many are skeptical of the search possibilities of Twitter, and others are just sick and tired of hearing about it. Others say Twitter is just a fad, and will die like many other digital upstarts in the past without ever producing a dime.

Leaving the monetization argument aside for the moment, there is still a strong case for an algorithmic approach to mining Twitter data in order to produce a more useful result. When considering how Twitter can enhance results, we're not referring to simply running keyword-matching feeds alongside mainline search results. That alone is not very useful, and would be prone to spam and duplicate results, among many other problems. In fact, spam is Twitter's biggest enemy as a search engine. Since I wrote that last column, the problem seems to be turning into an avalanche, ranging from people entering giveaways via hash tags, to hash tag bombing, paid tweeting, hash tag spamming, and outright misdirection of links from the trusted people you follow.

Trust, authority, and relevancy

So in order for a Twitter search layer to be successful, it needs a good algorithm to ensure a few key ingredients that are well-known to the major search engines: Trust, authority and relevancy. What you get is a way to leverage the base of Twitter users -- primarily a high info-seeking and sharing crowd -- as a way to inform about "what is happening now." Trust, authority and relevancy are what the engines are good at, and by focusing on these aspects, spam is minimized to a great degree.



Just like the crawlers, human-driven social search engines will live or die by the same principles. Ultimately, an algorithmic social search layer better serves the real-time mainstream search experience, as opposed to a standalone reference, which is why I made the case for a major crawler and Twitter joining together. Google and other crawlers have the technology that Twitter needs, and Twitter has a massive network of info junkies. Neither would be able to replicate what the other has in any reasonably short period of time.

A few ways that Twitter could borrow from the crawlers to increase trust, authority, and relevancy (and decrease spam)

Since the source of tweets as a measure of search relevance is human-driven (in fact, even algorithmic search is human-driven), we should expect the algorithm to focus very heavily on its users, and also the trust and authority of the sources to which they're being linked.

For fun, I have paralleled and adapted some of the well-known functions of algorithmic search to the Twitter network setting -- or, in other words, how a Google-, Yahoo-, or Bing-driven Twitter search algorithm might look if it starts acting more like a crawler engine:

Domain authority / username authority. A Twitter account username is a tertiary-level domain, and would function in social search much like a secondary-level Web site domain or tertiary subdomain. Age of account, number of inbound links to the user address, and the authority of followers could be used to calculate an authority score for the user (some of these authority recommendations continue below). But the key is that it will take authority and trust at the domain/account level to make the data trustworthy enough to determine its relevance for real-time search. As a sign of the possible things to come for social search, Twitter has rolled out "verified accounts," validating the fact that reputation and trust do matter, even if it is the Wild West right now.

Duplicate content / re-tweeplicate content. While press releases and catalog product info can create a duplicate content issue for Google and other engines, "duplicate" retweets from trusted accounts in Twitter actually provide a strong indicator of relevance around links in network-search, which is the primary basis for One Riot's search algorithm now. Identifying the original source of a tweet that is greatly re-tweeted could also improve a trust score of a user, in the same way that other producers of original content get more authority and higher visibility than content syndicators in the crawler-based engines.

Blogging freshness / content freshness / microblogging freshness. Anyone who has spent five minutes doing SEO knows that well-maintained blogs with timely content get good traction in natural search, mainly because they are a source of freshly updated content. For that reason, a well-maintained Twitter account pointing to (and receiving links from) other authoritative sources of info will gain more trust as a social signal to a search engine.

Links / number of followers. In a way, the number of authoritative followers who are following another user could be a signal of authority and trust, much in the way that a single trusted link can be more valuable than 1,000 less-trusted links.

Inbound links vs. outbound links / ratio of followers in contrast to the number of people followed. When a user is highly followed (for example, John Mayer's Twitter ratio is 1,618,214:56), this means that people are listening to what he is saying, and they don't want anything else in return. A large following is also a strong indicator that this is a "real" Twitter account, and not spam.

Linking to bad / good neighborhoods. A tried-and-true indicator of spam in search engines is the quality of the site being linked to. Link to spammy ephemeral Viagra affiliate site = bad. Link to quality sites that host engaging or useful content = good.

Tweet themes of users. The theme of who someone is, or what they're into, is often apparent from viewing their streams. SEOs post a lot of SEO-related links; they talk about Google, Bing, Yahoo, search conferences, DAO, link development, etc.; the link to their homepage provides an even stronger signal of who they are. So as search engines create authority around certain types of site themes (for example, an authoritative automotive blog might not rank as well as an authoritative finance blog for finance-related terms), it would be a natural extension for a social-search algorithm to emphasize themes around certain users.
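For the curious, here is a minimal Python sketch of how a few of the signals above (account age, follower ratio, trusted followers, link neighborhoods, and retweets from trusted accounts) might be folded into a score. Everything in it is an assumption made for illustration: the `Account` fields, the weights, the saturation points, and the example usernames and URLs are hypothetical, and none of it reflects how Twitter or any search engine actually ranks anything.

```python
import math
from dataclasses import dataclass

# All field names, weights, and cutoffs are illustrative assumptions.


@dataclass
class Account:
    username: str
    age_days: int           # age of the account
    followers: int          # "inbound links"
    following: int          # "outbound links"
    trusted_followers: int  # followers already considered authoritative
    good_links: int         # links posted to quality sites
    bad_links: int          # links posted to spammy sites


def authority_score(a: Account) -> float:
    """Combine four signals, each scaled to 0..1, into a 0..1 score."""
    # Account age, saturating after two years (assumed cutoff).
    age = min(a.age_days / 730, 1.0)
    # Follower-to-following ratio, log-damped; saturates near 10,000:1.
    ratio = a.followers / max(a.following, 1)
    ratio_signal = min(math.log10(1 + ratio) / 4, 1.0)
    # Trusted followers, log-damped; saturates near 1,000 of them.
    trusted = min(math.log10(1 + a.trusted_followers) / 3, 1.0)
    # Link neighborhood: share of posted links that point to good sites.
    total = a.good_links + a.bad_links
    neighborhood = a.good_links / total if total else 0.5
    # Assumed weights; they sum to 1.0 so the result stays in 0..1.
    return 0.2 * age + 0.3 * ratio_signal + 0.3 * trusted + 0.2 * neighborhood


def rank_links(tweets, scores):
    """Rank URLs by the summed authority of the distinct accounts sharing them.

    tweets is an iterable of (username, url) pairs. Each account counts once
    per URL, so one account repeating a link does not inflate its rank.
    """
    totals = {}
    seen = set()
    for username, url in tweets:
        if (username, url) in seen:
            continue
        seen.add((username, url))
        totals[url] = totals.get(url, 0.0) + scores.get(username, 0.0)
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical accounts and links, for demonstration only.
veteran = Account("veteran", 900, 50_000, 200, 500, 95, 5)
spammer = Account("spammer", 10, 20, 2_000, 0, 1, 99)
scores = {acct.username: authority_score(acct) for acct in (veteran, spammer)}

tweets = [
    ("spammer", "http://spam.example"),
    ("spammer", "http://spam.example"),
    ("veteran", "http://news.example"),
]
ranked = rank_links(tweets, scores)
print(scores)
print(ranked)
```

The log damping is a design choice, not a requirement: it keeps a celebrity-sized follower count from swamping every other signal, in the same spirit as one trusted link outweighing many less-trusted ones.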

The above list is not intended to suggest that these elements can't or won't be spammed, because they can and they will be. Whatever the outcome, the fact remains that Twitter search will provide a useful, new "in-the-moment" search element if trust, relevancy and authority are given full consideration. The many aspects of how we use it, and how we optimize for it, will be entirely different from what we do now.

If you've read this far, I invite you to continue the discussion with me below, and on Twitter at



6 comments about "What An Algorithmic Approach To Twitter's Social-Search Layer Might Look Like".
  1. Michael Kilgore from Tampa Bay Performing Arts Center, July 8, 2009 at 10:52 a.m.

    This story on Bing and Twitter searches ran earlier this week in The New York Times. I don't recall whether you referenced this in the original column, so fyi.

  2. Lisa Young from Outrider, July 8, 2009 at 1:54 p.m.


    I agree completely that real-time search cannot succeed without trust and improved relevancy factors. Google's new Options can produce results with recency but I'd question whether it produces authoritative, relevant sources.
    In my post today on SearchFuel I hoped to illustrate that credible trusted brands lose search visibility when you filter for recency. I believe that just as much as "real-time" search needs to look at the signals you have outlined ... those trusted sources need to step up and embrace the channels of real-time search. Your web presence can no longer be one-dimensional and limited to a corporate or brand site.

  3. Rob Garner, author of "Search and Social: The Definitive Guide to Real-Time Content Marketing" (Wiley/Sybex 2013), July 8, 2009 at 2:38 p.m.

    Thanks Michael and Lisa for your comments.

    Michael - yes, I was aware of the Bing implementation, though it appears to be very limited at the moment, and they seem to be determining trust and authority manually.

    Lisa - based on the searches I have performed in Google options (hundreds, if not a thousand plus, over many months) I would generally agree. It seems a lot of that has to do with their approach to content strategy, and frequency as a publisher. Agreed, overall, there is going to be a segment of searches that are not provided with authoritative searches, as certain topics simply aren't as applicable to "what is happening now". Thanks

  4. Rob Griffin from Almighty, July 8, 2009 at 3:16 p.m.

    In addition to a good algo to make better use of this data, there needs to be a big filter. Not just for spam, but for non-useful posts. I'm specifically pointing fingers at the people that listen too literally to the "What are you doing?" request. Most of the valuable comments, tweets if I must use that term, are not people saying hi or telling me it's Wednesday already, but people with real good insights to share. I'd love for algo search here that will filter out the personal status updates and allow me to sort real insight and opinion in real time.

  5. Rob Garner, author of "Search and Social: The Definitive Guide to Real-Time Content Marketing" (Wiley/Sybex 2013), July 8, 2009 at 3:46 p.m.

    Rob - totally agree. It would be great if people were able to label and partition tweets between "personal" and "thought leadership" for example, along with the ability to follow a segment of someone's tweets, and skip the mundane stuff if they want to.

  6. Gregory Martin from TipTop Technologies, July 9, 2009 at 10:37 a.m.

    Thanks for the article. Great points Rob.
    Using parallel criteria that traditional search engines employ seems applicable to social search engines, but when you are dealing with stream of consciousness versus archived information retrieval there are a lot of moving parts. I think you have done well to identify what they are, but I'd like to think that users are ready to move away from traditional search results towards results that have embedded value. By grouping search results into Tips (content with positive to-do statements), Tips (content with negative don't-do statements), and Remaining messages (messages with neutral or factual content) TipTop search provides both consumer and enterprise users with real life choice and decision making value. Capturing the experiences and opinions of users and being able to respond and share these things across a platform like Twitter is amazing. Being able to utilize and share this content through search results in a relevant way is what achieves.

    TipTop's algorithmic social search engine utilizes a unique universal natural language processing technology to deliver high quality messages and content with minimal junk or spam. The messages are weighted so quality posts rise to the top through a combination of temporal and semantic relevancy ranking.
