Commentary

What You Missed Me Saying If You Didn't Get To OMMA Metrics...

by Josh Chasin, Op-Ed Contributor, June 23, 2009

Since I began contributing to the Online Metrics Insider in September of 2007, I've written extensively about the fundamental differences between panel-based data, which provide behavioral tracking of the whole of Web usage from a sample of persons over time, and which may be projected to a particular user population; and site-centric server data, which provides a "census" of the behavior exhibited by machines, and specifically servers, at one Web entity. Site-centric server data is often thought to provide empirical, immutable measures of the visitation to a Web site, but in fact server data is simply a tally and classification of tagged or "beaconed" events, using the imperfect cookie technology to parse the behavior out into something called "Unique Visitors." But the notion of Unique Visitors in server data is illusory, and indeed the term is a misnomer; server data actually provides Unique Unduplicated Cookies, which do not correspond to persons, and as we well know, because of cookie deletion, typically far exceed the number of different persons actually visiting the Web entity.
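To make the gap between cookies and persons concrete, here is a minimal Python sketch. The visit log, the person names and the cookie IDs are all invented for illustration; real server data never carries a person label, which is exactly the problem.

```python
# Hypothetical visit log of (person, cookie_id) pairs. The "person" column
# is an assumption for illustration only: server data sees just the cookie.
visits = [
    ("alice", "c1"),
    ("alice", "c1"),
    ("alice", "c2"),  # alice deleted her cookies, so she received a new ID
    ("bob", "c3"),
    ("bob", "c4"),    # bob deleted his cookies once
    ("bob", "c5"),    # ...and then a second time
    ("carol", "c6"),
]


def unique_cookies(log):
    """What server data can count: distinct cookie IDs."""
    return len({cookie for _, cookie in log})


def unique_persons(log):
    """What an audience measure wants: distinct people."""
    return len({person for person, _ in log})


cookie_count = unique_cookies(visits)
person_count = unique_persons(visits)
print(f"unique cookies: {cookie_count}, persons: {person_count}")
```

In this toy log, three people generate six cookies, so a "Unique Visitors" figure built on cookies would be double the number of persons.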

But even the "page view" metric is far from straightforward in site-centric data. Those less technical among us (myself included) like to think of a page as something that exists fully formed on a server somewhere, and when I navigate to that page, my computer lets me get a look at it, like tuning a station on a TV. The page is "out there," I just have to point my machine at it.

But that isn't the case at all; a page view is a construct, rendered anew each time a server call is made. The server sends code to my machine that instructs the machine on what to render on my screen; and indeed sometimes multiple events on that "page" can be beaconed, triggering multiple server calls behind what I experience as a single page. At comScore, we've learned from data reconciliations with clients' server data that there typically isn't a one-to-one correspondence between the Internet user viewing a page, and the server data logging a page. Sometimes multiple beaconed events occur on a single page load; and sometimes beacons fire for events that do not actually constitute valid page views at all according to the comScore Media Metrix definitions, but still get tallied in server data.

One leading Web analytics provider even told us that, strictly speaking, there really isn't any such thing as a "page view" in Web analytics; they just don't have anything better to call it.

But an intelligent application of server data and panel data together -- that may well be the holy grail of online metrics.

If you read about comScore's announcement introducing Media Metrix 360, you probably figured that this is where I was going today. But I know that you, dear reader, don't want to hear a commercial from me in this space. Rather, I want to talk a little about the application of server data to a panel-based audience measurement service, from the perspective of a media researcher.

All data are dumb on their own; we need organization, analysis and presentation in order to turn data into knowledge. So while I may have seemed critical of site-centric data above, such is not the case at all. WebAnalytics providers like Omniture, Webtrends, and Coremetrics take this data and use it to populate valuable reporting systems that allow you to track site performance and develop (and monitor) KPIs. I don't think these companies compete very much with one another on the basis of the tagged or beaconed data they collect, or on the performance of their tags; rather, I suspect most of the play is in the tools, the analytics, the interface.

Similarly, panel data in and of itself doesn't constitute audience measurement. These data must be collected, filtered, edited, and processed, with the end result populated into reporting tools for querying and analysis. The things that go on between data capture and data delivery are essential components of audience measurement and reporting. For example, at comScore (I can't speak for our competitors but would speculate that some have similar philosophies) we are, by definition, very particular in the crediting of traffic, because we are providing buyers and sellers with a consensus view of Webentity traffic that might potentially carry ad messages to Internet users. So we edit out things like back end calls, redirects, etc. that often count as traffic in site-centric data, but which are ineligible in our audience measurement reporting. This is part of the reason that ad agencies rely on services like Media Metrix; because we do, by design, endeavor to exclude server calls that do not constitute legitimate user-initiated traffic. (I wrote more about this here.) For example, if I navigate to a URL that redirects me to the site's home page, the advertiser wants that piece of navigation to register as one page view; yet often in site-centric data we see it registering as two page views (one for the redirecting URL and one for the home page.) Advertisers rely on audience measurement services to filter out those extraneous server calls.
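The redirect example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how comScore or any Web analytics vendor actually processes traffic: the request records, the `BACKEND_PREFIXES` paths and the `is_creditable` rule are all hypothetical.

```python
# HTTP status codes for redirects that should not be credited as page views.
REDIRECT_STATUSES = {301, 302}

# Hypothetical path prefixes standing in for back-end calls.
BACKEND_PREFIXES = ("/api/", "/ajax/")

# Invented request log: a user follows a promo URL that redirects to the
# home page, the page makes one back-end call, then the user reads an article.
requests = [
    {"path": "/promo", "status": 302},
    {"path": "/", "status": 200},
    {"path": "/api/recommendations", "status": 200},
    {"path": "/article/42", "status": 200},
]


def is_creditable(req):
    """Return True if the request should count as a page view."""
    if req["status"] in REDIRECT_STATUSES:
        return False
    if req["path"].startswith(BACKEND_PREFIXES):
        return False
    return True


raw_count = len(requests)
credited = [r for r in requests if is_creditable(r)]
print(f"raw server calls: {raw_count}, credited page views: {len(credited)}")
```

Here four server calls reduce to two credited page views: the redirecting URL and the back-end call drop out, leaving the home page and the article.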

This is important to specify because, when Media Metrix 360 rolls out, the panel-centric hybrid data will not suddenly mirror a Webentity's internal data. We will be using panel data combined with server data to create enhanced estimates of site visitation. You'll see and hear more on this soon, but basically, once we've filtered out international and non-human traffic, we'll be able to take what we know about how people in-country interact with Webentities (from the panel) and what we know about how machines in-country interact with the Webentity (from server data) and create measures of site visitation that are better than either a panel-based or a server-based projection alone.

One final point. Often, I hear a publisher announce with wonder, "We put up third-party tags, and in days, their numbers looked just like my server numbers!" This ought to surprise no one. While the implication may seem to be, "thus proving that that's what my audience is," in fact all this does is validate that both sets of tags have been similarly implemented. It indicates nothing about Web audience -- certainly not in the sense that advertising buyers and sellers are concerned with audience (which is to say, a person-based reach against content that might conceivably carry ads). In our panel-centric hybrid technique, we too find that our beaconed data matches the site's internal data; this is the first phase of validation we undergo. But that isn't where our processes end; it is where they begin. Once we've validated the implementation of our beacons, only then do we begin the process of turning the data into ad agency-accepted audience measurement.

What we're hearing from many of our publisher clients is not that they want or expect audience measurement data to mirror their internal data; rather, that they want to understand, to reconcile, the causes for deviation and provide the metrics that advertisers and their agencies need to plan and design successful ad campaigns. They understand the pros and cons of both panel data and server data, and they appear to be quite energized right now about the prospect of truly integrating the two.


6 comments about "What You Missed Me Saying If You Didn't Get To OMMA Metrics...".


Roopa Saggar from Roche, June 23, 2009 at 4:18 p.m.
Excellent synopsis. Thank you!

Will Larson from Ticketmaster / Live Nation Entertainment, June 23, 2009 at 4:32 p.m.

Amen! While we certainly will not have 100% reliable site-centric data anytime soon, I do like that your publisher clients seek "to understand, to reconcile" the calculation strategy to actionably face their business objectives.

Jodi Mcdermott from comScore, June 23, 2009 at 4:33 p.m.

Josh,

Great article. I would be remiss as a Web Analyst if I did not comment on some of the statements that cross over into the Web Analytics arena in your article. Specifically this sentence:

"For example, if I navigate to a URL that redirects me to the site's home page, the advertiser wants that piece of navigation to register as one page view; yet often in site-centric data we see it registering as two page views (one for the redirecting URL and one for the home page.)"

If a web analytics tool is registering a redirect as a page view, then shame on the web analyst. A well trained WA (and the vendor) will have their dataset configured to filter out server calls by status code (no 301's and 302's) and have an updated IAB robot filter list (and algorithm) to remove known and suspected robotic traffic.

The second statement I want to comment on is:

"The things that go on between data capture and data delivery are essential components of audience measurement and reporting. "

Not that you implied that it didn't, but I think it needs to be explicitly stated that data capture, delivery and filtering are also critical in providing quality web analytics. In the configuration of a good web analytics implementation there is a strict process for going through and determining what is a valid HTTP request to include in the data set and then how to classify that specific request --> page view or some other event that should be available for analysis.

There are a lot of similarities between audience measurement for the masses (your customers) and individual web analytics implementations that have been done properly. What I would love to hear in your next article (hint, hint) is how audience measurement firms go to the next step of taking the census level data they collect (or acquire) and then projecting out the final number. Perhaps that is the secret in the sauce, but I know that some of us would like to understand that aspect of the process better.

I am glad I had the front row seat at OMMA M&M. We'll miss you in San Francisco.

Jodi McDermott
Clearspring Technologies

John Grono from GAP Research, June 23, 2009 at 5:32 p.m.

As always, Josh, a concise and erudite posting. I just wish I could have been in the US for OMMA Metrics.

There is, however, one teensy-weensy thing I would like to raise - and it has nothing to do with numbers, which is unlike me!

I refer to the use of the term 'WebAnalytics' for software like Omniture, Webtrends, Coremetrics etc. I'm sorry to get all semantical, but these software do not perform analytics on the web - they perform analytics on websites. We have simply truncated the word into the portmanteau word 'WebAnalytics', which unfortunately conveys a different meaning. Therefore, I'm suggesting that we begin referring to them by the more correct portmanteau term 'SiteAnalytics'.

We're starting to get traction with that term down here in Australia. Josh ... hop on board!

Neil Haynes from News International, June 24, 2009 at 6:25 a.m.

Interesting, but I'd like to re-iterate Jodi's excellent points as you unfairly (IMHO) raise some old myths about Web Analytics technology (such as counting 'hits' via log file analysis - who does that anymore?) while skipping past the limitations of panel data (you mention that Web Analytics can't get to a "one-to-one correspondence" between the user and the server, an unfortunate choice of words perhaps when panel-based systems have to extrapolate / estimate 'Visitors' themselves.)
I guess the debate between Web Analytics and panel-based measurement will rumble on, but let's make this debate frank, fair, and standards based (i.e. vendor agnostic) if we can. I'm sure at heart we all want to improve online measurement standards, whatever our favourite technology happens to be...
Thanks.

Joshua Chasin from KnotSimpler, June 24, 2009 at 11:13 a.m.

On my iPhone, so I must limit commentary for now... Jodi is, as per usual, correct. I did not mean to suggest that WA in general double-counts re-directs; rather, that on occasion in working with specific clients, we've come across this. The bigger point I'm trying to make is that WA and audience measurement have different objectives and different masters, and they can both conform to best discipline practices yet support different dispositions.

John, re: "site analytics": I'm game if Jodi and the WAA are!