When it comes to measuring the brand impact of a digital display campaign, there is probably no better tool than a survey-based ad effectiveness study. Dynamic Logic, Insight Express, comScore, and
Nielsen are a few names with extensive experience carrying out such studies. For those who are new to the field, the general setup starts with recruiting a sample of
respondents in a rigorously designed control/test environment.
The sample is split into two groups: control and exposed. By design, the exposed and control groups are identical except for
their exposure to the campaign creative: the former is exposed and the latter is not (a PSA usually runs in place of the real creative for the control). As a result, any quantitative difference in
responses to the survey questions, usually covering brand awareness, message recall, and purchase intent, can be directly attributed to ad exposure.
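To make the arithmetic concrete, here is a minimal sketch of how lift is typically read from the two groups. All tallies are made up for illustration; real studies would also test the difference for statistical significance.

```python
# Hypothetical survey tallies (illustrative numbers only)
control_aware, control_n = 180, 1000   # control group: 180 of 1,000 say "aware"
exposed_aware, exposed_n = 230, 1000   # exposed group: 230 of 1,000 say "aware"

control_rate = control_aware / control_n
exposed_rate = exposed_aware / exposed_n
lift = exposed_rate - control_rate     # point lift attributable to ad exposure

print(f"Control: {control_rate:.1%}, Exposed: {exposed_rate:.1%}, Lift: {lift:+.1%}")
```

Because the two groups are identical by design except for exposure, the difference in rates is read as the campaign's effect on the metric.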
This seemingly foolproof design has one
often-neglected flaw that can leave the reported overall ad effectiveness lift numbers inaccurate. The problem stems from a combination of two things: the way the subjects are
recruited/surveyed and the way the brand metric lift numbers are subsequently tabulated. The issue with recruiting is that it happens throughout the course of the campaign. To
gauge the full impact of the entire campaign, the survey needs to be served at the end of the campaign, not in the middle. This is particularly important for our purpose because frequency of
exposure matters to ad effectiveness.
Consider a hypothetical campaign: a respondent is surveyed at the point when he/she has been exposed to two impressions. On the aided awareness question, he/she
checks "not aware." After the survey is done -- but before the end of the campaign -- this respondent receives an additional six exposures. Suppose, for the sake of argument, that it would take seven
exposures for this particular respondent to become aware of the brand. Had he/she been surveyed after the eighth exposure (which would have happened after the campaign ran its
full course), he/she would have belonged to the "aware" category. So the OVERALL brand lifts reported from the survey may understate the true performance of the campaign. In other words, the
campaign may have converted more people than the report says, and we may have shortchanged ourselves.
There are two ways to correct this bias, one more realistic than the other.
Ideally, in order to get to the bottom of the problem, the study should be designed and implemented so that questionnaires are served to respondents only after ALL impressions from the
campaign have been delivered to them. This, of course, is easy to say but hard to do.
The challenge is that before the end of the campaign there is no way for the researchers to know whether the impression
just delivered to a particular respondent is the last one. Without such knowledge, the only choice left is to conduct the survey immediately after the campaign is over. However, by that time it
would be extremely hard (if not outright impossible) to recruit a large enough sample for the study, particularly for the control group.
Without an audience segmentation scheme implemented in the ad-serving
system, most of the control group will surely have been wiped out by that time. Even with audience segmentation implemented properly, the probability of getting enough respondents to read out
the difference statistically is still quite slim.
That leaves us with the second option, which is more doable and manageable in the sense that it does not require any modification to the
research design. I am proposing to use data we are collecting anyway to correct the bias. The trick is to marry the frequency portion of the ad effectiveness results with the adserver's frequency
distribution report for balancing.
As you may have noticed, most ad effectiveness vendors include a media analysis in their studies. One important piece of that section of the
report is how brand metrics react to frequency of exposure. Lifts are calculated by frequency bracket as long as sample size allows. Unlike the overall lift numbers, the lift numbers at each frequency bracket
do ACCURATELY reflect whether a respondent was moved by ad exposure at that frequency level. The impressions yet to be delivered to the respondent are simply not relevant to the frequency exercise.
This allows us to use those numbers to minimize the potential bias at the overall level. What is needed in addition to the frequency-level lift numbers is an adserver frequency distribution
report for the entire campaign.
The frequency report from the adserver maps out the reach distribution at each frequency level (up to a point), and you do want to pull the report after the
entire campaign is over. The way to marry the two reports is first to calculate the effective reach for each frequency bracket: multiply the reach in each
bracket by the lift number for the same bracket. What this measures is the number of "converters" at each frequency level. Aggregating the "converters" across all brackets
gives you the effective reach of the campaign for the exposed group. The conversion rate for a brand metric is then calculated by dividing the total number of "brand converters"
by total reach. Using this brand conversion rate, instead of the original one, to calculate lift balances out the potential bias called out earlier.
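The steps above can be sketched in a few lines of Python. The brackets, reach counts, and lift numbers below are all hypothetical; in practice the reach column comes from the post-campaign adserver report and the lift column from the vendor's frequency analysis.

```python
# Hypothetical inputs (illustrative numbers only):
#   reach = unique users who received that many exposures (adserver report)
#   lift  = point lift in the brand metric observed at that frequency (survey)
brackets = [
    {"freq": "1",   "reach": 400_000, "lift": 0.02},
    {"freq": "2-3", "reach": 250_000, "lift": 0.05},
    {"freq": "4-6", "reach": 150_000, "lift": 0.09},
    {"freq": "7+",  "reach": 100_000, "lift": 0.14},
]

total_reach = sum(b["reach"] for b in brackets)
# "Converters" per bracket: reach weighted by the lift seen at that frequency;
# summing across brackets gives the campaign's effective reach for the exposed group
converters = sum(b["reach"] * b["lift"] for b in brackets)
corrected_lift = converters / total_reach   # frequency-weighted conversion rate

print(f"Total reach: {total_reach:,}")
print(f"Estimated converters: {converters:,.0f}")
print(f"Frequency-weighted (corrected) lift: {corrected_lift:.1%}")
```

Because each bracket's lift is weighted by the full end-of-campaign reach at that frequency, respondents who went on to receive more exposures after being surveyed no longer drag down the overall number.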
It is not the intention of this
article to lay blame on study providers, some of which are cognizant of this very issue and have been offering to balance the results in more customized studies. If blame is to be assigned, we, the
marketers/advertisers who commission such studies, should certainly not come out of it scot-free. Ultimately the studies are not the vendors' studies alone, even though the vendors
design and implement them. The studies are as much ours as theirs. As good practice, the success of a study always requires full cooperation between the vendors and us throughout the entire process.
So where does this leave us on the issue at hand? First, study providers need to be more upfront about such potential biases. More important, they need to be explicit about
what they need in order to correct the bias. As I explained earlier, what they need may very well be in the hands of the marketers/advertisers. By not talking about the bias and not pushing for adserver
reports, they are doing the study a disservice. In addition, if a normative database is used to benchmark the findings, it should be pointed out that the normative data collected so far have not been
adjusted for the bias, so there may be inaccuracy in the benchmarking process.
For marketers/advertisers, it is imperative that we not just stand on the sidelines after the
study contract is signed. Understanding the nuances of the study -- including the tech/stats side -- is extremely important in getting the numbers right. If the media/creative team does not
have the expertise, involve the analytical team in the process. When adserver data is requested (as in this case) to make the adjustment, we should provide the data in a timely manner, unless we
feel confident making such adjustments ourselves. After all, we are the final users/beneficiaries of the findings, and not getting the numbers right shortchanges our own effort.
An
ad effectiveness study is one of the most important measurement staples for the digital advertising community. It is, in fact, the only rigorous way to quantify how a digital campaign impacts brand.
There is no reason not to conduct more such studies, but that makes it all the more imperative that we get them right.
P.S.: Frequency bias is not the only problem this recruiting
method can lead to. Similar biases may also exist in the distribution of sites and creatives. The good news is that the same balancing technique may be able to correct those biases as well.