When someone says, “We have data,” your BS detector should go on high alert. Data is only as good as the truth it contains. In marketing, the consequences of inaccurate data can range from a bad campaign to killing the company.
Why go there? Because the advertising industry is trying to figure out how to measure data quality. For example, the Interactive Advertising Bureau Tech Lab is devoting considerable resources to the quality question in conjunction with its data transparency standard.
Advertisers should pay attention. One key aspect of this discussion is the role of accuracy.
In practice, data accuracy affects all phases of media, from planning to attribution.
To illustrate, here are three examples of data in use. For comparison, all examples use the same segment: toy buyers. That is, a list of people who buy toys. The segment was measured by panel overlap, revealing that 40% of the people in it actually buy toys. So, it’s 40% accurate.
Case 1: Targeting. When used to control programmatic buying, 40% accurate means, nominally, 40% of the impressions will go to actual toy buyers, but 60% won’t. Is that good?
One way to gauge that is to compare the concentration of toy buyers in the segment to their concentration in the U.S. market overall. Our segment measures 35% better than the incidence of toy buyers in the U.S. market. Whether that’s a good deal depends on what you paid for the data, and on the incidence of toy buyers in competing media choices.
Calibrated against other media, however, a 35% lift over the general population is excellent. On TV, for example, a target of females 18-34 typically results in 23% delivery on target, a lift of just a few points over general-market incidence.
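The lift figure above is just a ratio. Here is a minimal sketch of the calculation; the baseline incidence is a hypothetical value chosen so the numbers line up with the article’s 35%:

```python
# Lift of a data segment over the general-population baseline.
segment_accuracy = 0.40      # share of the segment who actually buy toys
baseline_incidence = 0.296   # assumed U.S. incidence of toy buyers (hypothetical)

lift = segment_accuracy / baseline_incidence - 1
print(f"Lift over baseline: {lift:.0%}")  # prints "Lift over baseline: 35%"
```

The same formula works for the TV comparison: 23% on-target delivery against a slightly lower general-market incidence yields a lift of only a few points.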
Case 2: Planning. Media planners often try to find prospects' adjacent interests in order to place ads at sites prospects are likely to visit. The conventional wisdom is, “Fish where the fish are.”
But just where are the fish? To find out, a planner might compare the toy buyer segment to other segments. Do those toy buyers belong to other segments? The planner might notice that many toy buyers' IDs also occur in a segment called “urban dwellers.” The urban dwellers segment, for the sake of argument, is 30% dense with actual urban dwellers. You might conclude that urban dwellers buy a lot of toys. Knowing this, the planner might use all sorts of media in and around cities.
But wait. Only 40% of the people in the toy segment were actually toy buyers in the first place. And only 30% of the people in the “urban” segment actually live in a big city. So, even though the overlap might have been high, the probability that an overlapping ID is actually both a toy buyer and an urban dweller is 40% of 30%, or 12%! Yikes.
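That 12% comes from multiplying the two accuracies, which treats the errors in the two segments as independent, an assumption worth stating out loud:

```python
# Probability that an overlapping ID is truly both a toy buyer and an
# urban dweller, assuming the two segments' errors are independent.
p_toy_buyer = 0.40   # accuracy of the toy-buyer segment
p_urban = 0.30       # accuracy of the urban-dwellers segment

p_both = p_toy_buyer * p_urban
print(f"P(actually both): {p_both:.0%}")  # prints "P(actually both): 12%"
```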
Here, the conclusion that toy buyers skew urban is incorrect because the planner did not know the accuracy of the data being used. The result could be huge waste, attribution failure, or misleading insights. Basically, a disaster.
Case 3: Reach extension. In a nutshell, reach extension is commonplace because first-party data has the IDs of people you already know, but its reach is low. The fix? Extend the segment with people who look like they belong in that audience based on shared characteristics. Welcome to “probabilistic” segments.
Let’s say our toy buyers skew toward wealth and education. We can cleverly fluff out the toy-buyers segment with the IDs of people who are educated and wealthy. Not bad, right?
Wrong. First, a skew toward wealth and education for a category does not mean that wealth and education predict intention in that category. Not even close. Scuba divers are wealthy and educated, but only a tiny fraction of wealthy educated people scuba dive.
This is a textbook case of the ecological fallacy: drawing conclusions about individuals from statistics about a group those individuals belong to.
Second, the data from which the skew was determined might have been wrong. Third, the original segment was only 40% toy buyers in the first place. The inaccuracies compound, resulting in a big fat mess.
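One crude way to picture the compounding is to multiply the accuracy at each stage. All three stage values below are hypothetical; the point is only that each imperfect step shrinks the effective accuracy of the extended IDs:

```python
# Toy sketch of compounding inaccuracy in reach extension.
# Every value here is a hypothetical stage accuracy, not measured data.
seed_accuracy = 0.40     # original segment: 40% real toy buyers
skew_accuracy = 0.70     # assumed: chance the wealth/education skew is real
model_precision = 0.25   # assumed: chance an extended ID actually fits the profile

effective_accuracy = seed_accuracy * skew_accuracy * model_precision
print(f"Effective accuracy of extended IDs: {effective_accuracy:.0%}")
# prints "Effective accuracy of extended IDs: 7%"
```

Real extension models are more sophisticated than a simple product, but the direction of the effect is the same: errors multiply unless each stage is measured.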
That doesn’t mean all reach extension is bad, and certainly there are more sophisticated methods of modeling reach. It does suggest that measuring accuracy at every phase of model development can save the day.
There you have it. One case where 40% accuracy is an ROI bonanza for advertisers, one where 40% results in a planning disaster, and another where 40% results in a “meh” for reach extension.
The use cases are different — but in all cases, knowing the accuracy is the difference between success and failure.