Commentary

What's A Panel?

From the very dawn of the field of online metrics, as the science developed in the latter part of the twentieth century, users and practitioners have debated the relative merits of site-centric versus user-centric measurement. Typically, when we talk about site-centric, we mean server data, the internal data tracked and compiled for publishers by Omniture, Webtrends, Coremetrics, Google Analytics, etc. User-centric data generally refers to panel data, in which some aggregate of individual Internet users is identified, and their behavior tracked, from the user or person-centric side. Site-centric data provides a census of machine activity occurring at a single website; panel-centric data provides a 360-degree look at activity exhibited by a sample of persons across websites.

In many of my contributions to this column, I've written about the efficacy of panel-centric measurement. Two weeks ago, I talked about the promise of panel-centric hybrids, writing specifically about comScore's Video Metrix service, which incorporates publisher beacon data into our Media Metrix panel measurement.

Today I want to take a step back and ponder a question that has been on my mind lately, as conversations about panel-centric, site-centric, and hybrid models intensify: what, exactly, IS a panel?

I started out in the audience measurement business way before we had to worry about the Internet. Heck, when I started out we were just beginning to worry about cable TV. I worked at Arbitron, which at the time was the leader in local TV audience measurement, and I worked in the statistical services department on the local TV meter panels. So forgive me if I have a strict constructionist view of what constitutes a panel.

To me, in the context of audience measurement, a panel is a sample of persons from whom behavioral data may be observed over time, and then projected to the population or universe at large. The comScore Media Metrix service is panel data. The Nielsen NTI (Nielsen Television Index) service is panel data.

I do not consider ISP data to be panel data. And I confess that sometimes I get a little hot under the collar when I see other companies claiming to do panel measurement, when in fact their data comes from an ISP. As researchers, I believe we have to be clear about the differences between person tracking via representative panels, and machine tracking via the acquisition and parsing of machine-level ISP data. They are not the same thing.

There are several key differences between ISP data and panel data. One is that, quite simply, ISP data is a look at machine-level data, not person-level data. This is fine if the data is being used to report out on metrics at the machine level (and some companies using ISP data do indeed report at this level). But for person-based projections, there is no way to disentangle this machine-level data into the individual persons using the machines (indeed, there is no way to identify who these persons are at all, let alone credit their Internet usage accordingly). Based on observations within our own panel, for example, we've found that over 40% of home machines have at least two users, and that over a quarter have three or more users. In fact, fully two-thirds of all Internet users use multiple-user computers. And indeed we've found that the demographic composition of a machine's users is quite heterogeneous; typically a two-person machine will have users from two different age cells, and both genders, represented. And, according to U.S. Census data, it is rare to find multiple people within the same household who are of the same age and gender.

Of perhaps greater concern, though, is the fact that data from an ISP cannot be used to generalize about the universe at large, beyond the footprint of that ISP. Suppose, for example, I told you four years ago that I had complete census data for every voter in the state of New York, with respect to how they were going to vote in the 2004 presidential election. I'd know that precisely 2,962,567 voters would opt for George W. Bush, and that 4,314,280 voters would cast their ballots for John Kerry. From this census count of over 7 million voters, surely I could have made a projection about the U.S. voter market at large. And it would have been wrong: I'd have forecast that Kerry was going to win the 2004 election.
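
For readers who like to see the arithmetic, here is a minimal Python sketch of that thought experiment. It uses only the two New York counts quoted above; the function names are my own illustrative choices, and the "naive projection" step simply assumes the observed share holds everywhere, which is exactly the flawed assumption under discussion.

```python
# Two-party vote counts for New York in 2004, as quoted in the text above.
NY_BUSH = 2_962_567
NY_KERRY = 4_314_280


def two_party_share(votes_for: int, votes_against: int) -> float:
    """Return a candidate's share of the two-party vote."""
    return votes_for / (votes_for + votes_against)


def naive_projection(kerry_share: float) -> str:
    """Name a national winner by assuming the observed share holds everywhere.

    This is the flawed step: a complete census of one footprint says
    nothing about the voters outside that footprint.
    """
    return "Kerry" if kerry_share > 0.5 else "Bush"


ny_total = NY_BUSH + NY_KERRY
kerry_share = two_party_share(NY_KERRY, NY_BUSH)

print(f"Voters observed: {ny_total:,}")
print(f"Kerry two-party share in New York: {kerry_share:.1%}")
print(f"Naive national projection: {naive_projection(kerry_share)}")
```

More than 7 million observations, a comfortable margin, and a forecast that names the wrong winner. The size of the count does nothing to correct for the footprint it was drawn from.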

The same phenomenon occurs with ISP data. Gian Fulgoni, in his post on the comScore blog, demonstrates that users across different ISPs vary widely in behavior; for example, something as basic as Google's share of search can vary widely by ISP. It's also apparent that the likelihood of visiting a certain portal, or the number of pages downloaded, or the number of ads received, varies widely across ISPs (see this post from the Numbers Guy at the Wall Street Journal).

Like the census voter data from my home state of New York, an ISP might provide data on all the behavior from, say, 7 million computers -- and yet reflect only that ISP's footprint, which can deviate dramatically from the online universe at large.

Most of the ISPs in the U.S. do not sell their data to third-party information companies. As a result, large pockets of Internet usage are invariably excluded from an ISP pool even in the best of circumstances. In less than the best of circumstances, it is possible for ISP data (e.g., from the United Online / NetZero ISP) to exhibit a distinct skew, by virtue of overrepresenting dial-up users, whose Internet usage we've demonstrated time and again is profoundly different from that of broadband users.

Now, I don't mean to suggest that there is no place for ISP data in online metrics. Quite the contrary; I think there is a place in a representative, panel-centric system for ISP data, provided that this data is used in tandem with panel measurement, and only to represent that portion of the universe represented by the ISP's footprint.