Since I began contributing to the Online Metrics Insider in September of 2007, I've written extensively about the fundamental differences between two kinds of data. Panel-based data provides behavioral tracking
of the whole of Web usage from a sample of persons over time, and may be projected to a particular user population. Site-centric server data provides a "census" of the behavior
exhibited by machines, and specifically servers, at one Web entity. Site-centric server data is often thought to provide empirical, immutable measures of the visitation to a Web site, but in
fact server data is simply a tally and classification of tagged or "beaconed" events, using the imperfect cookie technology to parse the behavior out into something called "Unique Visitors." But
the notion of Unique Visitors in server data is illusory, and indeed the term is a misnomer; server data actually provides Unique Unduplicated Cookies, which do not correspond to persons, and as we
well know, because of cookie deletion, typically far exceed the number of different persons actually visiting the Web entity.
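To make the gap between cookies and persons concrete, here is a minimal sketch in Python. The log, the names, and the cookie IDs are all invented for illustration; in particular, the "person" field is ground truth that a real server never sees, which is exactly the problem.

```python
from collections import namedtuple

# Hypothetical beacon log. The "person" field is ground truth that a real
# server cannot observe; it appears here only to expose the gap.
Hit = namedtuple("Hit", ["person", "cookie_id"])

hits = [
    Hit("alice", "c1"),
    Hit("alice", "c1"),  # same cookie, so not a new "Unique Visitor"
    Hit("alice", "c2"),  # Alice deleted her cookies and was issued a new one
    Hit("bob", "c3"),
    Hit("bob", "c4"),    # Bob deleted cookies once...
    Hit("bob", "c5"),    # ...and then again
]


def unique_cookies(log):
    """What server data can actually count: unduplicated cookie IDs."""
    return len({hit.cookie_id for hit in log})


def unique_persons(log):
    """What the term 'Unique Visitors' suggests: different people."""
    return len({hit.person for hit in log})


print(unique_cookies(hits), unique_persons(hits))  # prints "5 2"
```

In this toy log, two people produce five "Unique Visitors," simply because each cookie deletion mints a new identity.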
But even the "page view" metric is far from straightforward
in site-centric data. Those less technical among us (myself included) like to think of a page as something that exists fully formed on a server somewhere, and when I navigate to that page, my
computer lets me get a look at it, like tuning a station on a TV. The page is "out there," I just have to point my machine at it.
But that isn't the case at all; a page view
is a construct, rendered anew each time a server call is made. The server sends code to my machine that instructs the machine on what to render on my screen; and indeed sometimes multiple events
on that "page" can be beaconed, triggering multiple server calls behind what I experience as a single page. At comScore, we've learned from data reconciliations with clients' server data that
there typically isn't a one-to-one correspondence between the Internet user viewing a page, and the server data logging a page. Sometimes multiple beaconed events occur on a single page load;
and sometimes beacons fire for events that do not actually constitute valid page views at all according to the comScore Media Metrix definitions, but still get tallied in server data.
One leading Web analytics provider even told us that, strictly speaking, there really isn't any such thing as a "page view" in Web analytics; they just don't have anything better to call
it.
But an intelligent application of server data and panel data together -- that may well be the holy grail of online metrics.
If you read about comScore's announcement introducing Media Metrix 360, you probably figured that this is
where I was going today. But I know that you, dear reader, don't want to hear a commercial from me in this space. Rather, I want to talk a little about the application of server data to a
panel-based audience measurement service, from the perspective of a media researcher.
All data are dumb on their own; we need organization, analysis and presentation in order to turn data
into knowledge. So while I may have seemed critical of site-centric data above, such is not the case at all. Web analytics providers like Omniture, Webtrends, and Coremetrics take this data
and use it to populate valuable reporting systems that allow you to track site performance and develop (and monitor) KPIs. I don't think these companies compete very much with one another on the
basis of the tagged or beaconed data they collect, or on the performance of their tags; rather, I suspect most of the play is in the tools, the analytics, the interface.
Similarly,
panel data in and of itself doesn't constitute audience measurement. These data must be collected, filtered, edited, and processed, with the end result populated into reporting tools for querying and
analysis. The things that go on between data capture and data delivery are essential components of audience measurement and reporting. For example, at comScore (I can't speak for our
competitors but would speculate that some have similar philosophies) we are, by definition, very particular in the crediting of traffic, because we are providing buyers and sellers with a consensus
view of Web entity traffic that might carry ad messages to Internet users. So we edit out things like back-end calls, redirects, etc. that often count as traffic in site-centric data,
but which are ineligible in our audience measurement reporting. This is part of the reason that ad agencies rely on services like Media Metrix: we do, by design, endeavor to exclude
server calls that do not constitute legitimate user-initiated traffic. (I wrote more about this here.) For example, if I navigate to a URL that redirects me to the site's home page, the
advertiser wants that piece of navigation to register as one page view; yet often in site-centric data we see it registering as two page views (one for the redirecting URL and one for the home
page). Advertisers rely on audience measurement services to filter out those extraneous server calls.
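The redirect example can be sketched in a few lines of Python. This is only an illustration of the idea of crediting rules, not comScore's actual editing logic; the record fields, the URLs, and the two flags are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ServerCall:
    url: str
    is_redirect: bool  # hypothetical flag: the call only forwarded the user
    is_backend: bool   # hypothetical flag: the call was not initiated by the user


# One piece of navigation as it might appear in a site's own logs.
calls = [
    ServerCall("/promo", is_redirect=True, is_backend=False),      # redirecting URL
    ServerCall("/home", is_redirect=False, is_backend=False),      # page the user sees
    ServerCall("/api/track", is_redirect=False, is_backend=True),  # back-end call
]


def raw_tally(log):
    """Site-centric style count: every logged call is a 'page view'."""
    return len(log)


def credited_page_views(log):
    """Audience-measurement style count: drop redirects and back-end calls."""
    return [c.url for c in log if not c.is_redirect and not c.is_backend]


print(raw_tally(calls))            # prints 3
print(credited_page_views(calls))  # prints ['/home']
```

The redirect alone turns one page view into two in the raw tally, and the back-end call adds a third; after filtering, the user's single piece of navigation is credited as the one page view the advertiser expects.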
This is important to specify because, when Media Metrix 360 rolls out, the
panel-centric hybrid data will not suddenly mirror a Web entity's internal data. We will be using panel data combined with server data to create enhanced estimates of site visitation.
You'll see and hear more on this soon, but basically, once we've filtered out international and non-human traffic, we'll be able to take what we know about how people in-country interact with
Web entities (from the panel) and what we know about how machines in-country interact with the Web entity (from server data) and create measures of site visitation that are better than either a
panel-based or a server-based projection alone.
One final point. Often, I hear a publisher announce with wonder, "We put up third-party tags, and in days, their numbers
looked just like my server numbers!" This ought to surprise no one. While the implication may seem to be, "thus proving that that's what my audience is," in fact all this does is validate
that both sets of tags have been similarly implemented. It indicates nothing about Web audience -- certainly not in the sense that advertising buyers and sellers are concerned with audience
(which is to say, a person-based reach against content that might conceivably carry ads). In our panel-centric hybrid technique, we too find that our beaconed data matches the site's internal
data; this is the first phase of validation we undergo. But that isn't where our processes end; it is where they begin. Once we've validated the implementation of our beacons, only then do we
begin the process of turning the data into ad agency-accepted audience measurement.
What we're hearing from many of our publisher clients is not that they want or expect audience
measurement data to mirror their internal data; rather, that they want to understand, to reconcile, the causes for deviation and provide the metrics that advertisers and their agencies need to plan
and design successful ad campaigns. They understand the pros and cons of both panel data and server data, and they appear to be quite energized right now about the prospect of truly integrating the
two.