From a research perspective, digital set top box (STB) data are the most exciting development in the television research business since people meters rolled out in the late eighties. They have the potential to make a giant impact on the industry: second-by-second data, new ad retention metrics, precise commercial and brand ratings, enormous sample sizes, and the ability to link tuning data directly with other information at the household or device level.
However, like any great new invention, STB data come with some very large challenges that must be overcome before they are usable as an analysis tool and as a potential currency for ad sales transactions. In the trade press and at industry meetings and conferences, you usually hear companies talk only about the benefits of STB research. There is little discussion of the real issues that must be resolved before its full potential can be realized. Here are some points about STB data that you may or may not know:
STB Data is Not a Census
A census is similar to a survey. The difference is that a census collects data from all members of a population, while a survey is limited to a sample. STB data are collected only from sets that meet all of the following criteria:
1. The set has to be connected to a digital STB.
2. The STB must have viewer measurement software installed.
3. The STB has to be connected to a return path, e.g., a cable or phone line.
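The three criteria act as a filter. A set contributes data only if all of them hold, so the measured base is a subset of all TV sets rather than a census. Below is a minimal Python sketch. It assumes a hypothetical `TVSet` record with one boolean per criterion. The names are illustrative and do not come from any distributor's schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TVSet:
    # Hypothetical record for one TV set in a distributor's footprint.
    set_id: str
    has_digital_stb: bool           # criterion 1: connected to a digital STB
    has_measurement_software: bool  # criterion 2: viewer measurement software installed
    has_return_path: bool           # criterion 3: cable or phone line back to the plant

def is_measurable(tv: TVSet) -> bool:
    """A set can report tuning data only if all three criteria hold."""
    return tv.has_digital_stb and tv.has_measurement_software and tv.has_return_path

def measurable_share(sets: List[TVSet]) -> float:
    """Fraction of all sets that can contribute STB data at all."""
    if not sets:
        return 0.0
    return sum(is_measurable(tv) for tv in sets) / len(sets)

sets = [
    TVSet("a", True, True, True),
    TVSet("b", True, True, False),    # no return path
    TVSet("c", True, False, True),    # no measurement software
    TVSet("d", False, False, False),  # set not attached to a digital STB
]
print(measurable_share(sets))  # 0.25
```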
In addition to meeting these requirements, the data must be successfully transmitted back to the distributor's plant. Many people seem to believe that STB data are perfect because they are collected and sent by technology rather than by people. The truth is that the data have many imperfections that need to be edited. With metered TV viewing data from a sample, there are generally two sets of counts: "installed" and "intab." Installed counts include all homes with a meter installed. Intab counts include only those installed homes that have sent usable data. Intab counts are generally 10 to 15 percent lower than installed counts, and they are the numbers used to calculate ratings. Many people in the industry do not realize that STB data require a similar process. Once the data are edited for gaps, discrepancies and impossible information, the number of usable homes can drop by a third or even by half. The data that are left are neither a census nor a sample. In many cases, research companies draw probability samples from the remaining homes. However, this defeats one of the primary reasons for using STB data in the first place: they were supposed to be a census.
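The editing step that turns installed homes into intab homes can be sketched the same way. The two rules below are hypothetical examples of "gap" and "impossible information" edits. The first requires data in every hour of the day. The second requires that logged tuning not exceed the minutes in a day. They are not any research company's actual rules, and `HomeDay` is an illustrative record type. With this toy data, half of the installed homes fail editing, which is the high end of the range described above.

```python
from dataclasses import dataclass, field
from typing import List, Set

MINUTES_PER_DAY = 24 * 60

@dataclass
class HomeDay:
    # Hypothetical one-day record for an installed STB home.
    home_id: str
    reported_hours: Set[int] = field(default_factory=set)  # hours 0-23 with data received
    tuned_minutes: int = 0                                  # total tuning minutes logged

def passes_edit_rules(day: HomeDay) -> bool:
    """Illustrative edit rules (assumptions, not an industry standard)."""
    no_gaps = day.reported_hours == set(range(24))        # gap rule
    possible = 0 <= day.tuned_minutes <= MINUTES_PER_DAY  # impossible-information rule
    return no_gaps and possible

def intab(installed: List[HomeDay]) -> List[HomeDay]:
    """Keep only the installed homes whose data survive editing."""
    return [day for day in installed if passes_edit_rules(day)]

full_day = set(range(24))
installed = [
    HomeDay("h1", full_day, 300),
    HomeDay("h2", full_day - {3, 4}, 120),  # gap: no data for two hours
    HomeDay("h3", full_day, 2_000),         # impossible: more than 1,440 minutes
    HomeDay("h4", full_day, 0),             # valid day with no tuning
]
usable = intab(installed)
drop = 1 - len(usable) / len(installed)
print(len(installed), len(usable), f"{drop:.0%}")  # 4 2 50%
```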
Accurate STB Impressions are Hard to Calculate
You would think that counting viewer impressions for a network or program would be the easiest thing in the world with STB data. If all homes are included, you simply sum the homes that viewed. With a sample, you calculate the percentage of the sample that viewed and multiply it by a universe estimate for a market or coverage area. This process is called projecting, and the resulting numbers are called projections. What media companies call audience projections, advertisers call impressions.
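Projection itself is simple arithmetic: the share of intab homes that viewed, multiplied by the universe estimate. Below is a short sketch with hypothetical market numbers. The function name `project` is illustrative.

```python
def project(viewing_homes: int, intab_homes: int, universe_estimate: int) -> float:
    """Projected homes (impressions) = share of intab homes that viewed x universe estimate."""
    if intab_homes <= 0:
        raise ValueError("need at least one intab home")
    share_viewing = viewing_homes / intab_homes
    return share_viewing * universe_estimate

# Hypothetical market: 50 of 1,000 intab homes viewed; universe estimate of 2,000,000 TV homes.
print(project(50, 1_000, 2_000_000))  # 100000.0
```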
For the reasons discussed above, edited STB data are not a census and do not include all homes. If you count the homes that viewed, the sum will be deflated because of the large number of homes that were edited out. You could instead calculate the percentage of homes that viewed and multiply it by the total unedited number of homes. However, this method falls apart once you take distributor downtime into account.
When a cable or satellite distributor has an outage, the affected homes have missing data. These gaps cause the homes to fail the editing rules, and they are removed. As a result, the percentage of homes that tuned is calculated only from homes that were not subject to the outage. A projection based on that percentage will be inflated, because the affected homes, which were unable to view, are excluded from the measurement. Impressions are the currency for virtually all national television ad sales transactions, so this is a significant obstacle. Outages also happen fairly frequently.
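A toy calculation shows how the two shortcuts err in opposite directions. All counts below are hypothetical. The "true" figure rests on two assumptions made only for illustration: homes that failed non-outage edits viewed at the same rate as intab homes, and outage homes could not view at all.

```python
from dataclasses import dataclass

@dataclass
class Footprint:
    # Hypothetical counts for one distributor footprint during one program.
    total_homes: int         # all measurable STB homes before editing
    outage_homes: int        # homes hit by a distributor outage (could not view)
    other_failed_homes: int  # unaffected homes that failed other edit rules
    intab_viewers: int       # intab homes that tuned to the program

    @property
    def intab_homes(self) -> int:
        return self.total_homes - self.outage_homes - self.other_failed_homes

def summed_count(fp: Footprint) -> int:
    """Treat the edited data as a census and simply add up the viewers."""
    return fp.intab_viewers

def rate_times_total(fp: Footprint) -> float:
    """Apply the intab tuning rate to all unedited homes, outage homes included."""
    return fp.intab_viewers * fp.total_homes / fp.intab_homes

def true_count(fp: Footprint) -> float:
    """Assumption: failed non-outage homes viewed at the intab rate; outage homes did not view."""
    return fp.intab_viewers + fp.intab_viewers * fp.other_failed_homes / fp.intab_homes

fp = Footprint(total_homes=1_000, outage_homes=200, other_failed_homes=300, intab_viewers=50)
print(summed_count(fp), rate_times_total(fp), true_count(fp))  # 50 100.0 80.0
```

Summing undercounts because editing removed homes that could have viewed. Rate-times-total overcounts because it credits outage homes with viewing they could not do.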
Estimating Error in STB Research is Difficult
One of the great things about census data is that there is no sampling error. Unfortunately, as I pointed out earlier, STB data are not a census. One advantage of a probability sample is that proven statistical techniques exist for estimating the amount of sampling error. So if a research company draws a probability sample from STB data, standard procedures can be used to calculate sampling error. One thing to keep in mind with STB data, however, is the question of whether the homes that failed the editing rules are biased toward a certain type of household. More research is needed to determine whether this is the case.
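For a probability sample drawn from the edited STB homes, the textbook measure of sampling error for a rating is the standard error of a proportion. A finite population correction applies because the sample is drawn without replacement from a finite pool of homes. The sketch below assumes simple random sampling and a 95 percent normal-approximation interval (z = 1.96). The function names are illustrative. The calculation says nothing about the editing bias just described, which sampling error does not capture.

```python
import math
from typing import Optional, Tuple

def rating_standard_error(p: float, n: int, population: Optional[int] = None) -> float:
    """Standard error of a rating p (a proportion) from a simple random sample of n homes.

    If the sample is drawn without replacement from a finite pool of `population`
    homes (for example, the STB homes that passed editing), the finite population
    correction sqrt((N - n) / (N - 1)) is applied.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("p must be in [0, 1] and n must be at least 1")
    se = math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        if population < 2 or n > population:
            raise ValueError("population must be at least 2 and no smaller than n")
        se *= math.sqrt((population - n) / (population - 1))
    return se

def confidence_interval(p: float, n: int, population: Optional[int] = None,
                        z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval for the rating, clipped to [0, 1]."""
    se = rating_standard_error(p, n, population)
    return max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical: a 5.0 rating measured in a sample of 10,000 homes
# drawn from 500,000 edited STB homes.
low, high = confidence_interval(0.05, 10_000, 500_000)
print(f"{low:.4f} to {high:.4f}")
```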
It is important to note that sampling error is not the only type of error inherent in research. With a census, there is administrative error related to working with such a large number of units of analysis. There is also response and non-response error, as well as error that may be introduced by modeling. This last type deserves attention because research companies are developing ways to model the STB data to compensate for a variety of defects and holes.
These include situations where the set top box is left "on" while the TV set is turned "off." In these cases, the STB logs activity even though the TV is off and the home is not tuning.
Research companies are also looking at ways to model information that the STB data do not include: demographics, household characteristics and TV sets not attached to a digital STB. Some companies are looking to fuse the STB homes or devices to homes or devices in another research sample. This process lets the characteristics missing from the STB data be taken from the sample. In doing so, there is error associated with the STB data, plus error introduced by the modeling, plus the sampling error inherent in the non-STB sample.
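Fusion methods vary, and the article does not say which ones companies use. One simple technique is nearest-neighbour matching. Each STB home receives the missing characteristic from the most similar panel home, measured on variables that both sources share. In the sketch below, the linking variables (household size, weekly TV hours), the donated field (`head_of_house_age`) and all values are hypothetical. A production approach would at least standardize or weight the variables, which this unweighted distance does not do.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StbHome:
    home_id: str
    hh_size: int            # hypothetical linking variable present in both sources
    weekly_tv_hours: float  # hypothetical linking variable present in both sources

@dataclass
class PanelHome:
    home_id: str
    hh_size: int
    weekly_tv_hours: float
    head_of_house_age: int  # characteristic the STB data lack

def distance(stb: StbHome, donor: PanelHome) -> float:
    """Unweighted Euclidean distance on the linking variables (no standardization)."""
    return math.hypot(stb.hh_size - donor.hh_size, stb.weekly_tv_hours - donor.weekly_tv_hours)

def fuse(stb_homes: List[StbHome], panel: List[PanelHome]) -> Dict[str, int]:
    """Give each STB home the head_of_house_age of its nearest panel home."""
    if not panel:
        raise ValueError("panel must contain at least one donor home")
    fused = {}
    for home in stb_homes:
        donor = min(panel, key=lambda d: distance(home, d))
        fused[home.home_id] = donor.head_of_house_age
    return fused

stb_homes = [StbHome("s1", 2, 20.0), StbHome("s2", 5, 45.0)]
panel = [
    PanelHome("p1", 2, 22.0, 34),
    PanelHome("p2", 4, 40.0, 52),
    PanelHome("p3", 1, 10.0, 71),
]
print(fuse(stb_homes, panel))  # {'s1': 34, 's2': 52}
```

Every donated value inherits the donor panel's own sampling error, plus whatever error the matching introduces. This is the stacking of error described above.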
To my knowledge, there are no proven statistical procedures for estimating ranges of error in research with so many levels of complexity. Statisticians will certainly come up with creative solutions. However, it will take time for these new procedures to earn the trust and industry acceptance of the simple calculations that social researchers have tested over and over again for estimating sampling error from a plain old random probability sample.
While none of these issues are insurmountable, they are worth discussing and should be part of the dialogue. The industry needs to be careful about the hyperbole and recognize that it is in a learning phase. These data require thoughtful and open consideration before they are used for business purposes. Decisions need to be made about which procedures and methods will produce the most accurate and reliable research. Simply talking about the benefits of STB data is not going to breed the success that the industry craves.