They call themselves "The World's Only Open Internet Ratings Service." Quantcast's big departure from past practice is that they are a "census-based" service rather than a sample-based one. What does this mean? It means that they can rate far more sites than commercial services such as comScore and NetRatings. In fact, they "see" over 20 million sites, compared with the low tens of thousands for other services. In a digital world where audiences may be small and we still want to slice and dice them for targeting, census-based sources may be the only way to go.
In a Google-like move, Quantcast at this point is a free service to all, both for publishers and for agency planners and buyers. More about that below.
Quantcast starts with a methodology similar to other rating services: fielding a panel of several million self-selected respondents. They then use statistical techniques to project to the total U.S. That's when it gets interesting. They can also place their tags on publisher sites (with publisher permission) to get much more granular information. They then combine the two methodologies and use their "secret sauce," a Mass Inference algorithm, to produce refined Web audience profiles. These algorithms were ported over from the finance industry and refined by mathematicians recruited from NASA and other high-end organizations.
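The post does not describe the Mass Inference algorithm, so the sketch below shows only the conventional step it mentions first, projecting a panel to the total population, using simple cell weighting (post-stratification). This is a textbook technique, not Quantcast's method; the cells, population totals, panel counts and function names are all hypothetical.

```python
def cell_weights(panel: dict[str, int], population: dict[str, int]) -> dict[str, float]:
    """Expansion weight per cell: how many people each panelist stands for."""
    return {cell: population[cell] / panel[cell] for cell in panel}


def project_audience(site_visitors: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted count of panelists seen on a site, projected to the population."""
    return sum(site_visitors.get(cell, 0) * w for cell, w in weights.items())


# Hypothetical numbers: the panel over-represents men relative to the population.
population = {"men": 100_000_000, "women": 110_000_000}
panel = {"men": 600_000, "women": 400_000}
site_visitors = {"men": 1_200, "women": 2_000}  # panelists seen on one site

weights = cell_weights(panel, population)
weighted = project_audience(site_visitors, weights)
# Unweighted projection: treat every panelist as equally representative.
naive = sum(site_visitors.values()) / sum(panel.values()) * sum(population.values())
print(f"weighted: {weighted:,.0f}  unweighted: {naive:,.0f}")
# → weighted: 750,000  unweighted: 672,000
```

Weighting only corrects skew along the cells you weight on; it cannot fix bias on traits those cells don't capture.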
As I mentioned above, publishers must put Quantcast tags on their site to "Get Quantified" and ensure that the best possible information about their site is presented. In discussing this process with publishers, I have found that most find that their Quantcast audience to be higher than that reported by either comScore or NetRatings and closer to their log file reports. When publishers ask me if they should do this, my reaction is always, "Why wouldn't you do it?" It's free, and Quantcast is going to report on the site anyway. So why not do everything they can to make the data as representative of their true audience as possible?
Agency planners and buyers are using the tool, too. At Mediasmith, our planners have found it a quick way to determine whether an unfamiliar site skews toward a client's target and is a candidate for further consideration. It also uses Domain Tools thumbnails to give a quick snapshot of each site. More powerful agency planning tools are due shortly. In time, they will be able to tell us who is engaging with our ads and how those consumers' demographics differ from those of the sites they are on.
Their end game is not yet clear. But it is clear that this is more than just a rating service. Quantcast's ability to measure targets puts them in the position of being a next-generation Tacoda, with or without a network attached. They should be able to provide us with much more than behavioral information, given that they have an increasing amount of demographic and sociographic information on consumer web patterns.
To get to the big time, Quantcast needs to achieve greater distribution among top sites in their Quantified Site program. They are making great progress, though, with top sites like Newsweek, People.com, Digg and NBC already participating. They have an active business development program and are in conversation with many top sites.
So check it out today and keep an eye on their progress. This is a tool that will only have greater and greater utility as time goes on.



Scott Oliver’s concern about an approach that marries data from a “small number of sites that have tagged” with panel data to develop audience profiles is a valid one. The above figures clearly demonstrate that Quantcast’s direct-measurement universe is not “small” and that a significant and rapidly growing number of publishers on the internet have adopted the Quantcast solution (including major media companies such as Time Inc., CBS, and IDG).
Scott also questioned, rightly so, the value of panels, especially if not randomly selected. So that no confusion persists: Quantcast’s panel is not self-selected. More importantly, we only use our panel as a reference point in a much more sophisticated inference model that combines directly measured traffic information and panel data sources. Our panel is not directly linked to pixel data to generate demographic audience projections; this simply doesn’t work in a fragmented media environment.
John Grono identified one of the most important issues of the day with his comments about cookie deletion. The extent to which it affects any given site’s audience numbers depends on a number of factors. The good news is that the comprehensive view of the Internet we enjoy (via direct measurement) lets us model its impact. Our Quantified Publishers already get detailed reporting on this factor (in addition to others, such as work/home use), and shortly we will provide more details for everyone. With regard to traffic counts, we’re aiming to give the marketplace a clearer picture of cookie, machine, and people counts. Today’s tension is a direct function of the fact that log-file data and audience data are based on two completely different metrics that we all refer to as “unique”: cookies and people. They are like apples and oranges. As David pointed out (and John questioned), our traffic data for Quantified Publishers does align more closely with publisher log file data. But that is because it is cookie-based data. There is no conspiracy! Stay tuned for the public launch of our enhanced traffic counts for a much clearer picture.
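To make the cookies-versus-people distinction concrete, here is a toy month of visits with hypothetical visitors. A log file or site tag only sees the cookie column; cookie deletion and use of several machines push the cookie count up, while a shared PC pushes it down. The class and function names are illustrative and do not describe Quantcast's model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    person: str   # who actually visited (invisible to a server log)
    machine: str  # which computer they used
    cookie: str   # the ID a log file or tag actually records


def unique_counts(visits: list[Visit]) -> dict[str, int]:
    """Count distinct cookies, machines and people across a set of visits."""
    return {
        "cookies": len({v.cookie for v in visits}),
        "machines": len({v.machine for v in visits}),
        "people": len({v.person for v in visits}),
    }


visits = [
    Visit("alice", "home-pc", "c1"),
    Visit("alice", "home-pc", "c2"),  # Alice deleted her cookies mid-month
    Visit("alice", "work-pc", "c3"),  # ...and also visits from work
    Visit("bob", "home-pc", "c2"),    # Bob shares the home PC and browser
]
print(unique_counts(visits))  # {'cookies': 3, 'machines': 2, 'people': 2}
```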
We welcome questions, debate, and input into our approach to audience measurement. It’s the only way to effectively innovate in a constantly changing environment and address the marketplace’s desire for accountability. At the end of the day, that is our goal: to innovate and deliver more accuracy and transparency to the market. It is not about “secret sauce” – it is about delivering better tools, services, and results to all sides of the marketplace. And we will live or die based on our ability to do just that.
- "panel of several million self selected respondents" - The mathematics of probability applies when the sample is drawn randomly (everyone in the universe has an equal chance of being in the sample). Self-selection does not yield a random sample; the sample must be recruited in a random way (random digit dialing, for example). Why doesn't Quantcast recruit their panel with a random method?
- Combining a non-random sample's profile data with clickstream data from a small number of sites that have tagged for Quantcast does not appear to be a valid way to project to the online population, even given the "secret sauce". If Quantcast recruited their panel in a random way, they could do away with the secret sauce.
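Scott's statistical point can be illustrated with a small simulation. Everything in it is hypothetical: a population in which 30% have some trait, and an assumed opt-in rate that is ten times higher for people with the trait. It is not a model of Quantcast's actual panel, only of what self-selection does to an estimate compared with random selection.

```python
import random


def simulate(seed: int = 42, population_size: int = 100_000, sample_size: int = 5_000):
    """Estimate a trait's share from a random sample and from a self-selected one."""
    rng = random.Random(seed)
    # Hypothetical population: about 30% have the trait we want to measure.
    population = [rng.random() < 0.30 for _ in range(population_size)]
    true_share = sum(population) / population_size

    # Random sample: every member has an equal chance of selection.
    random_sample = rng.sample(population, sample_size)
    random_share = sum(random_sample) / sample_size

    # Self-selected sample: assumed opt-in rate of 20% with the trait, 2% without.
    self_selected = [p for p in population if rng.random() < (0.20 if p else 0.02)]
    self_share = sum(self_selected) / len(self_selected)

    return true_share, random_share, self_share


true_share, random_share, self_share = simulate()
print(f"true {true_share:.3f}  random {random_share:.3f}  self-selected {self_share:.3f}")
```

The random sample lands close to the true share, while the self-selected one overstates it badly (roughly 0.8 instead of 0.3 with these assumed opt-in rates), however large it is.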
Those of us who were engaged in web analytics during the 2000-2001 "dotcom bubble burst" remember that everyone standing around was dripping with the secret sauce of the bubble.
What better reason to go with Quantcast than "most find that their Quantcast audience to be higher than that reported by either comScore or NetRatings and closer to their log file reports"! All media owners clamour for the highest numbers ... it's only natural.
But may I ask a question? Does Quantcast take into account cookie deletion over time? If they don't, then that factor alone will account for this discrepancy. It then becomes a question of "higher number" versus (in my opinion) "more accurate number".
Let me explain why I believe adjusting for cookie deletion provides a more accurate number. Down here in Australia we have a population of 21 million. Most research puts the 'online in the past month' audience at around 75%-80% of the population, i.e. a ceiling for the total audience of 16 to 17 million in round numbers.
Now if you aggregate the log files for the largest publishers (the top 6 account for the bulk of the traffic) and de-duplicate the tags (yes, others use site-tagging too) on the aggregated file, guess how many Australians are online in a month. Around 35 to 37 million. Now call me silly, but that is just plain wrong. When you take into account cookie deletion (and throw in a few other factors like dual-site access, multiple-person access per computer and so on), the estimate reduces to around 16.25 million. I wonder which is more reflective of the real world? Interestingly, sample-only methods produce estimates of around 11 to 12 million, demonstrating their inability to penetrate the long tail.
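John's total-market figures can be checked with a few lines of arithmetic. The population, the 75-80% reach range, the 35 to 37 million log-file totals and the 16.25 million adjusted estimate all come from the comment; the function names are illustrative only.

```python
def online_ceiling(population: float, reach_low: float, reach_high: float) -> tuple[float, float]:
    """Upper bound on the monthly online audience for a range of reach rates."""
    return population * reach_low, population * reach_high


def overstatement_multiple(log_file_uniques: float, adjusted_estimate: float) -> float:
    """How many times larger the raw de-duplicated cookie count is than the adjusted one."""
    return log_file_uniques / adjusted_estimate


POPULATION = 21_000_000
low, high = online_ceiling(POPULATION, 0.75, 0.80)
print(f"ceiling: {low / 1e6:.2f}M to {high / 1e6:.2f}M")  # ceiling: 15.75M to 16.80M

ADJUSTED = 16_250_000  # estimate after cookie-deletion and other adjustments
for raw in (35_000_000, 37_000_000):
    multiple = overstatement_multiple(raw, ADJUSTED)
    print(f"{raw / 1e6:.0f}M cookies = {multiple:.2f}x the adjusted estimate")
# 35M cookies = 2.15x the adjusted estimate
# 37M cookies = 2.28x the adjusted estimate
```

The adjusted 16.25 million sits inside the 15.75 to 16.8 million ceiling, while the raw log-file totals are more than double it.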
Please note that this overstatement effect is greater the further up the food chain you go. That is, it is 200-250% at the total market level. It reduces to 50-100% at the major portal level, while at the small site level it is typically 10-20%.
Full disclosure: I consult to the Media Federation of Australia, the peak industry body for media agencies in Australia. In this capacity I have been permitted to "look under the hood" of numerous research methodologies. My conclusion is that sample-based estimates are wrong, but server-log based estimates that don't take into account cookie deletion are much wronger (sic)!!! The best way forward, in my opinion, is a hybrid approach that uses a panel to establish cookie deletion rates, multiple-site access, multiple-users-per-PC factors and demographic breakdowns, and then applies that data to tagged log files.
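A minimal sketch of the hybrid approach described above, assuming each panel-derived factor can be treated as a simple average and applied to the tagged cookie count. The factor names, the way they are combined, and every number are hypothetical; a real system would estimate them per site and per period, with error bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelFactors:
    """Adjustment factors a panel would be used to estimate (values here are made up)."""
    cookies_per_machine: float  # >1 when cookies are deleted during the period
    people_per_machine: float   # >1 when several people share one PC
    machines_per_person: float  # >1 when people browse from home and work


def estimate_people(tagged_cookies: int, f: PanelFactors) -> float:
    """Convert a de-duplicated cookie count from tagged logs into a people estimate."""
    machines = tagged_cookies / f.cookies_per_machine
    # Person-machine pairs counted two ways:
    # machines * people_per_machine == people * machines_per_person
    return machines * f.people_per_machine / f.machines_per_person


def demographic_breakdown(people: float, shares: dict[str, float]) -> dict[str, float]:
    """Split the people estimate using panel-derived demographic shares."""
    return {group: people * share for group, share in shares.items()}


factors = PanelFactors(cookies_per_machine=2.0, people_per_machine=1.2, machines_per_person=1.5)
people = estimate_people(1_000_000, factors)
shares = {"18-34": 0.40, "35-54": 0.35, "55+": 0.25}
breakdown = demographic_breakdown(people, shares)
print(round(people))  # 400000
print({group: round(n) for group, n in breakdown.items()})
# {'18-34': 160000, '35-54': 140000, '55+': 100000}
```

Counting person-machine pairs from both sides is what lets the shared-PC and multi-machine factors be combined in a single step.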
John Grono GAP Research Sydney Australia