Set Top Box Research: A Call for More Open Dialogue
However, as with any great new invention, some very large challenges must be overcome before STB data can be usable as an analysis tool and a potential currency for ad sales transactions. In the trade press and at industry meetings and conferences, you mostly read or hear companies talking about the benefits of STB research. There is not much discussion of the real issues that must be resolved for the full potential of STB research to be realized. Following are some items about STB data that you may or may not know:
STB Data is Not a Census
A census is similar to a survey; the difference is that a census collects data from all members of a population, while a survey is limited to a sample. In the case of STB data, data are collected only from sets that meet all of the following criteria:
1. The set has to be connected to a digital STB.
2. The STB must have viewer measurement software installed.
3. The STB has to be connected to a return path; i.e., cable or phone line.
In addition to these requirements, the data need to be successfully transmitted back to the distributor's plant. Many people seem to believe that STB data are perfect because they are collected and sent by technology rather than through human intervention. The truth is that the data have a lot of imperfections that need to be edited out.
With metered TV viewing data from a sample, there are generally two sets of counts: "installed" and "intab". Installed counts include all the homes installed with a meter. Intab counts are those installed homes that have sent usable data. Intab numbers are generally 10 to 15 percent lower than installed counts, and they are the numbers used to calculate ratings.
Many people in the industry do not realize that a similar process needs to occur with STB data; once the data are edited for gaps, discrepancies and impossible information, the number of usable homes can drop by a third or by as much as half. The data that are left are neither census nor sample. In many cases, research companies are drawing probability samples from the remaining homes. However, this defeats one of the primary reasons for using STB data in the first place: it was supposed to be a census.
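To make the installed versus intab comparison concrete, here is a minimal sketch in Python. The function name and every count in it are hypothetical and chosen only to fall inside the ranges described above; they are not figures from any actual panel or distributor.

```python
def usable_share(installed: int, usable: int) -> float:
    """Return the share of installed homes that delivered usable data."""
    if installed <= 0:
        raise ValueError("installed must be positive")
    if not 0 <= usable <= installed:
        raise ValueError("usable must be between 0 and installed")
    return usable / installed


# Hypothetical metered panel: intab is 13 percent lower than installed.
panel_share = usable_share(installed=10_000, usable=8_700)

# Hypothetical STB footprint: editing removes 40 percent of homes,
# which sits between "a third" and "as much as half".
stb_share = usable_share(installed=1_000_000, usable=600_000)

print(f"Panel usable share: {panel_share:.0%}")
print(f"STB usable share:   {stb_share:.0%}")
```

Even under these assumed numbers, the STB footprint keeps far more homes in absolute terms, but it loses a much larger share of them to editing than the metered panel does.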
Accurate STB Impressions are Hard to Calculate
You would think that counting the number of viewer impressions to a network or program would be the easiest thing in the world when using STB data. If you have all the homes included, you simply need to sum the homes that viewed. With a sample, you have to calculate the percentage of the sample that viewed and multiply that by a universe estimate for a market or for a coverage area. This process is called projecting and the resulting numbers are called projections. What media companies refer to as audience projections, advertisers call impressions.
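The projection step described above can be sketched in a few lines of Python. The function name and the sample and universe figures are hypothetical, used only to show the arithmetic.

```python
def project_audience(sample_viewing: int, sample_intab: int,
                     universe_estimate: int) -> float:
    """Project sample viewing to a universe.

    The rating is the share of intab sample homes that viewed;
    the projection is that share multiplied by the universe estimate.
    """
    if sample_intab <= 0:
        raise ValueError("sample_intab must be positive")
    rating = sample_viewing / sample_intab
    return rating * universe_estimate


# Hypothetical example: 150 of 5,000 intab homes viewed a program
# in a market with a universe estimate of 2,000,000 homes.
projection = project_audience(150, 5_000, 2_000_000)
print(f"Projected homes: {projection:,.0f}")
```

Here 3 percent of the sample viewed, so the projection comes to about 60,000 homes.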
For reasons discussed above, edited STB data are not a census and do not include all of the homes. If you count up the homes that viewed, the resulting sum will be deflated because of the large number of homes that were edited out. You could instead calculate the percentage of homes that viewed and multiply it by the total unedited number of homes; however, this method falls apart when you take distributor downtime into account.
When a cable or satellite distributor has an outage, the affected homes have missing data. The gaps cause these homes to fail the editing rules and be removed. Therefore, the percentage of homes that tuned is calculated only from homes that were not subject to the outage. When you project that percentage to the full universe, the result is inflated, because the homes that were affected by the outage, and that were consequently unable to view, are counted as if they had viewed at the same rate. Outages happen fairly frequently. When you consider that impressions are the currency for virtually all television ad sales transactions at the national level, this is a significant obstacle.
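A small worked example may help show why one approach deflates and the other inflates. Everything here is hypothetical: the function name, the home counts, and the assumption that homes edited out for reasons other than an outage viewed at the same rate as the usable homes. The "benchmark" is simply what viewing would be under that assumption, not a method any research company is known to use.

```python
def compare_estimates(total_homes: int, outage_homes: int,
                      other_edited_homes: int, usable_viewing: int):
    """Compare a simple sum, a naive projection, and a benchmark.

    outage_homes could not view at all; other_edited_homes were removed
    for other data problems and are ASSUMED to view at the usable rate.
    """
    usable_homes = total_homes - outage_homes - other_edited_homes
    if usable_homes <= 0:
        raise ValueError("no usable homes remain after editing")
    rate = usable_viewing / usable_homes

    summed = usable_viewing                 # counts only usable homes
    projected = rate * total_homes          # applies rate to outage homes too
    benchmark = rate * (total_homes - outage_homes)  # homes able to view
    return summed, projected, benchmark


# Hypothetical footprint: 1,000 homes, 200 hit by an outage, 200 edited
# out for other reasons, and 60 of the 600 usable homes viewed.
summed, projected, benchmark = compare_estimates(1_000, 200, 200, 60)
print(f"Simple sum: {summed}")
print(f"Naive projection: {projected:.0f}")
print(f"Benchmark under the stated assumption: {benchmark:.0f}")
```

Under these assumptions the simple sum (60) falls short of the benchmark (80), while the naive projection (100) overshoots it, because the 200 outage homes are treated as if they could have viewed.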
Estimating Error in STB Research is Difficult
One of the great things about using census data is that there is no sampling error. Unfortunately, as I pointed out earlier, the STB data are not a census. One of the advantages of using a probability sample is that there are proven statistical techniques that enable researchers to estimate the amount of sampling error. As a result, if a research company creates a probability sample using STB data, there are statistical procedures that can be used to calculate sampling error. One thing to keep in mind with STB data, however, is the question of whether the homes that failed the editing rules are biased toward a certain type. More research needs to be done in this area to determine whether this is the case.
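As an illustration of the kind of proven technique referred to above, the following sketch computes the standard error of a rating under the textbook assumption of a simple random sample drawn from a large population. The function names and the example figures are hypothetical, and the calculation says nothing about any bias among the homes removed by editing.

```python
import math


def rating_standard_error(rating: float, sample_size: int) -> float:
    """Standard error of a proportion under simple random sampling."""
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must be a proportion between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return math.sqrt(rating * (1.0 - rating) / sample_size)


def margin_of_error_95(rating: float, sample_size: int) -> float:
    """Approximate 95 percent margin of error (normal approximation)."""
    return 1.96 * rating_standard_error(rating, sample_size)


# Hypothetical example: a 3 percent rating from 5,000 sampled homes.
se = rating_standard_error(0.03, 5_000)
moe = margin_of_error_95(0.03, 5_000)
print(f"Standard error: {se:.4f}")
print(f"95% margin of error: +/- {moe:.4f}")
```

With these inputs the standard error is roughly a quarter of a rating point. This covers sampling error only; the other sources of error discussed in this article are not captured by it.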
It is important to note that sampling error is not the only type of error inherent in research. With a census there is administrative error related to working with such a large number of units of analysis. There is response and non-response error as well as error that may be introduced as a result of modeling. This latter type of error is important to note as research companies are developing ways to model the STB data for a variety of defects and holes.
These include situations where the set top box is left "on" while the TV set is turned "off." In these instances, the STBs log activity when the TV is off and the home is not tuning. Research companies are additionally looking at ways to model the STB data for information that is not included; specifically, for demographics, household characteristics and for TV sets not attached to a digital STB. Some companies are looking to fuse the STB homes or devices to homes or devices in another research sample.
This process enables the qualities that are missing from the STB data to be taken from the sample. In doing this, not only is there error associated with the STB data, but there is also error introduced by the modeling, not to mention the sampling error inherent in the non-STB sample.
To my knowledge, there are no proven statistical procedures to estimate ranges of error for research with so many levels of complexity. Certainly, statisticians will be able to come up with a creative solution. However, it will take some time for these new procedures to gain the trust and industry acceptance attained by the simple calculations that have been tested over and over again by social researchers for estimating sampling error using a plain old random probability sample.
While none of these issues is insurmountable, they are worthy of discussion and should be part of the dialogue. The industry needs to be careful about the hyperbole and recognize that it is in a learning phase. These data require thoughtful and open consideration before they are used for business purposes. Decisions need to be made about what procedures and methods will render the most accurate and reliable research. Simply talking about the benefits of STB data is not going to breed the success that the industry craves.