Commentary

Are We Shortchanging Ourselves In Ad Effectiveness Studies?

When it comes to measuring brand impact from a digital display campaign, there probably is no better tool than a survey-based ad effectiveness study. Dynamic Logic, Insight Express, comScore, and Nielsen are a few of the vendors with extensive experience carrying out such studies. For those who are new to the field, the general approach starts with recruiting a sample of respondents under a rigorously designed control/test environment.

The sample is split into two groups: control and exposed. By design, the exposed and control groups are identical except for exposure to the campaign creative (a PSA is usually run in place of the real creative for the control group). As a result, any quantitative difference in responses to the survey questions, which usually speak to brand awareness, message recall, and purchase intent, can be attributed directly to ad exposure.

This seemingly foolproof design has one often-neglected flaw that may render the reported overall ad effectiveness lift numbers inaccurate. The problem originates from a combination of two things: the way subjects are recruited/surveyed and the way the brand metric lift numbers are subsequently tabulated. The issue with recruiting is that it happens throughout the course of the campaign. As we all know, in order to gauge the full impact of the entire campaign, the survey needs to be served at the end of the campaign, not in the middle. This is particularly important for our purpose because frequency of exposure matters to ad effectiveness.

Consider a hypothetical campaign in which a respondent is surveyed at the point when he/she has been exposed to two impressions. On the aided awareness question, he/she checks "not aware." After the survey is completed -- but before the end of the campaign -- this respondent receives an additional six exposures. For the sake of argument, suppose it would take seven exposures for this particular respondent to become aware of the brand. Had he/she been surveyed after the eighth exposure (which would have happened once the campaign ran its full course), he/she would have fallen into the "aware" category. So the OVERALL brand lifts reported from the survey may understate the true performance of the campaign. In other words, the campaign may have converted more people than the report says. And we may have shortchanged ourselves.
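To make the intuition concrete, here is a minimal simulation sketch in Python. Every number in it (the awareness thresholds, the final frequency of eight, the uniform survey timing) is a hypothetical assumption for illustration, not data from any study:

```python
# Toy illustration of the mid-flight survey bias described above.
# Each simulated respondent becomes "aware" only after reaching a personal
# exposure threshold, and the campaign ultimately delivers 8 impressions.
import random

random.seed(42)

TOTAL_IMPRESSIONS = 8      # frequency every respondent ends up with by campaign end
N_RESPONDENTS = 10_000

def simulate(survey_mid_flight):
    """Return the share of exposed respondents answering 'aware' at survey time."""
    aware = 0
    for _ in range(N_RESPONDENTS):
        threshold = random.randint(1, 10)  # exposures this person needs to become aware
        if survey_mid_flight:
            surveyed_at = random.randint(1, TOTAL_IMPRESSIONS)  # survey fires mid-campaign
        else:
            surveyed_at = TOTAL_IMPRESSIONS                     # survey fires after the last impression
        if surveyed_at >= threshold:
            aware += 1
    return aware / N_RESPONDENTS

print(f"Surveyed mid-flight:      {simulate(True):.1%} aware")
print(f"Surveyed at campaign end: {simulate(False):.1%} aware")
```

In this toy setup the end-of-campaign survey reports a much larger share of "aware" respondents than the mid-flight survey, which is exactly the understatement described above.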

There are two ways to correct this bias, one more realistic than the other. Ideally, in order to get to the bottom of the problem, the study should be designed and implemented in such a way that questionnaires are served to respondents only after ALL impressions from the campaign have been delivered to them. This of course is easy to say but hard to do.

The challenge is that before the end of the campaign there is no way for the researchers to know whether the impression just delivered to a particular respondent is the last one he/she will receive. Without such knowledge, the only choice left is to conduct the survey immediately after the campaign is over. However, by that time it would be extremely hard (if not outright impossible) to get enough sample for the study, particularly for the control group.

Without an audience segmentation schema implemented in the adserving system, most of the control group will almost certainly have been wiped out by that time. Even with audience segmentation implemented properly, the probability of getting enough respondents to read out the difference statistically is still quite slim.

That leaves us with the second option, which is more doable and manageable in the sense that it does not require any modification to the research design. I am proposing that we use data we are collecting anyway to correct the bias. The trick is to marry the frequency portion of the ad effectiveness results with the adserver's frequency distribution report for balancing.

As you may have noticed, most ad effectiveness vendors include a media analysis in their studies. One important piece in that section of the report is how the brand metrics react to frequency of exposure. Lifts are calculated by frequency bracket as long as sample size allows. Unlike the overall lift numbers, the lift numbers at each frequency bracket do ACCURATELY reflect whether a respondent was moved by ad exposure at that frequency level; impressions yet to be delivered to the respondent are simply not relevant to the frequency breakout. This allows us to use those numbers to minimize the potential bias at the overall level. What is needed in addition to the lift numbers by frequency is an adserver frequency distribution report for the entire campaign.

The frequency report from the adserver maps out the reach distribution at each frequency level (up to a point), and you do want to pull the report only after the entire campaign is over. The way to marry the two reports is first to calculate the effective reach for each frequency bracket: multiply the reach in each bracket by the lift number for the same bracket. What this measures is the total number of "converters" at each frequency level. Aggregating the number of "converters" across brackets gives you the effective reach of the campaign for the exposed group. Subsequently, the conversion rate for the brand metric is calculated by dividing the total number of "brand converters" by total reach. Using this brand conversion rate instead of the original one to calculate lift balances out the potential bias called out earlier.
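Here is a minimal sketch of that balancing arithmetic in Python. The frequency brackets, reach counts, and per-bracket lift figures are illustrative assumptions only, and "lift" is used the way this article uses it: the share of reached respondents in a bracket who were moved by exposure at that frequency level:

```python
# Balancing survey lift numbers against the adserver frequency distribution.
# All figures below are hypothetical, for illustration only.

# Adserver frequency distribution for the full campaign: bracket -> unique reach
reach_by_frequency = {
    "1": 400_000,
    "2-3": 300_000,
    "4-6": 200_000,
    "7+": 100_000,
}

# Per-bracket lift from the ad effectiveness study (illustrative values)
lift_by_frequency = {
    "1": 0.01,
    "2-3": 0.03,
    "4-6": 0.05,
    "7+": 0.08,
}

# Effective reach ("converters") per bracket = reach x lift for that bracket
converters_by_frequency = {
    bracket: reach_by_frequency[bracket] * lift_by_frequency[bracket]
    for bracket in reach_by_frequency
}

total_reach = sum(reach_by_frequency.values())
total_converters = sum(converters_by_frequency.values())

# Frequency-balanced conversion (lift) rate for the campaign as a whole
balanced_lift = total_converters / total_reach

print(f"Total reach:             {total_reach:,}")
print(f"Estimated converters:    {total_converters:,.0f}")
print(f"Frequency-balanced lift: {balanced_lift:.2%}")
```

The balanced figure is effectively a reach-weighted average of the per-bracket lifts, so the overall number reflects the frequency mix the campaign actually delivered rather than the mix at which respondents happened to be surveyed.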

It is not the intention of this article to lay blame on study providers, some of whom are cognizant of this very issue and have been offering to balance the results for more customized studies. If there is blame to assign, we, the marketers/advertisers who commission such studies, should certainly not come out of it scot-free. Ultimately the studies are not the vendors' studies only, even though the vendors design and implement them. The studies are as much ours as theirs. As good practice, the success of a study always requires full cooperation between the vendors and us throughout the entire process.

So where does this leave us on the issue I have been addressing? First, study providers need to be more upfront about such potential biases. More important, they need to be explicit about what they need in order to correct the bias. As I explained earlier, what they need may very well be in the hands of the marketers/advertisers. By not talking about it and not pushing for the adserver reports, we are doing the study a disservice. In addition, if a normative database is used to benchmark the findings, it should be pointed out that the normative data collected so far have not been adjusted for this bias, so there may be inaccuracy in the benchmarking process.

For marketers/advertisers, it is imperative that we not just stand on the sidelines after the study contract is signed. Understanding the nuances of the study -- including the technical/statistical side -- is extremely important to getting the numbers right. If the media/creative team does not have the expertise, involve the analytics team in the process. When adserver data is requested (as in this case) to make the adjustment, we should provide the data in a timely manner, unless we feel confident making such adjustments ourselves. After all, we are the final users/beneficiaries of the findings, and not getting the numbers right shortchanges our own efforts.

An ad effectiveness study is one of the most important measurement staples for the digital advertising community. It is, in fact, the only rigorous way to quantify how a digital campaign impacts a brand. There is no reason we should not conduct more such studies, but that makes it all the more imperative that we get them right.

P.S.: Frequency bias is not the only problem the recruiting method can lead to. Similar biases may also exist in site and creative distribution. The good news is that we may be able to use the same balancing technique to correct such biases.

9 comments about "Are We Shortchanging Ourselves In Ad Effectiveness Studies?".
  1. Jarvis Mak from Rocket Fuel, May 8, 2009 at 5:54 p.m.

    Chen,

    You've hit the nail on the head on the problem with frequency.

    Another issue that I've seen in these studies is how the control group is recruited. They don't send survey invitations to every person, so the later you get into the campaign, the harder it is to find someone who would qualify for your control group, especially if you have a sizable campaign. As a result, to collect enough control sample, publishers recruit the bulk of the control group at the start of the campaign.

    One of the basic premises of ad effectiveness studies is that the test and control groups have equal opportunity to be exposed to (and be impacted by) all other media. The problem is the following scenario. Imagine that you have a holiday media blitz where TV is heavied up during the first two weeks of the online campaign. Then imagine that the bulk of your control group was recruited during those first two weeks, while a lot of TV was running, whereas the test group sees the online campaign later, when very little TV is running. Then your test and control groups are NOT the same. The test group will have had far less TV exposure when they take the survey, and the brand is far less likely to be top-of-mind for them. Overall, your results will look terrible.

    Or imagine the reverse. The TV (or other media) is heavied up in the middle to end of the online campaign. So the control group has been exposed to almost no outside media and the test group (recruited throughout the campaign) gets a lot. Then your results are exaggerated.

    Jarvis Mak

  2. John Grono from GAP Research, May 8, 2009 at 6:57 p.m.

    A great thought-provoking post, Chen.

    However, doesn't this issue affect ALL effectiveness studies - offline and online? At the risk of sounding like a heretic, shouldn't they ALL be on the same basis to allow comparability, since that is surely what the marketer is after - which worked better? Of course all the studies should produce the most accurate result, but if that is not feasible then it is probably advantageous that they are all equally wrong!

    Eric, I do like the Direct Traffic measure as a key indicator.

    Jake, agree 100% on cookie deletion destroying the relationship between reach and frequency, but I sure hope you are also taking into account the over-inflation of gross impressions due to multiple browsers, multiple tabs, etc.

    Jarvis, again 100% correct. The ideal way of course is to run the study during "quiet time," but with established brands that is often simply not possible. Geo-targeting the campaign and the testing is needed. Failing that, multivariate modelling tends to tease out the relationship, with one caveat: awareness is a fairly blunt metric, poorly reported by the respondent. A few years ago I had a test where the client's TV ad awareness rose pleasingly rapidly. The only issue was that they hadn't been on TV for four months and the spike was caused by a burst of outdoor advertising on bus shelters!

  3. Peter Rosenwald from Consult Partners, May 9, 2009 at 11:48 a.m.

    The metrics of brand recognition are, as the author argues, extremely complicated and subject to bias, unintended or otherwise.

    It could be argued, although it seldom is, that the key metric must be what direct marketers would label the "Allowable Cost Per Order" (ACPO): how much the marketer can afford to spend to accomplish a defined task. Whether it takes five or six or however many impressions to gain the desired brand or product recognition is only important if the sum of the costs of these impressions is less than the ACPO.

    One key driver of the ACPO is the lifetime value of the customer. Imagine the lifetime value of a Pampers buyer, at best two years of total loyalty starting with birth.

    How much (and it is a large number for this product category, much less for single-unit sales) can the marketer afford to spend to accomplish this objective?

    It might be worthwhile to ponder and perhaps use this metric. My book, Accountable Marketing, might help.

    Peter Rosenwald

  4. Gian Fulgoni from 4490 Ventures, May 11, 2009 at 7:40 a.m.

    Very interesting post, Chen. I would point out (as has Jake from TNS) that, because of cookie deletion, ad server data overstate reach and understate frequency by substantial amounts. So, I don't think that's the solution. What does work is to track households over the course of the campaign using continuously-tracked panels and then at the conclusion separate the panelists into "exposed" vs "non-exposed" HHs. That's one approach we use at comScore. With it, one can then compare the behavioral response of the exposed vs non-exposed HHs in terms of site visitation, trademark search and both online and offline sales. This reveals the lift caused by the campaign. We've written a white paper summarizing the results of our research. You can obtain a copy here:
    http://www.comscore.com/Press_Events/Press_Releases/2008/11/Value_of_Online_Advertising

    I presented the results at Prof. Jerry Wind's "Empirical Generalizations in Advertising" conference at Wharton late last year and they are being published in the upcoming issue of The Journal of Advertising Research.

    Gian Fulgoni

  5. Chen Wang from Ninah Consulting, May 11, 2009 at 10:09 a.m.

    Since a few of you have mentioned cookie deletion in your responses, I want to elaborate a little on this issue. Technically, cookie deletion should NOT be a problem in measurement as long as adservers are doing their job correctly. In order to correct cookie deletion bias, adservers need to tabulate numbers based ONLY upon cookies with birthdates that are reasonably early (e.g., at least a year back). Subsequently they can project the numbers to the population using certain techniques. I believe there are adservers out there doing this for certain reports (frequency distribution is usually one of them). It is also true that not all adservers are using this technique, which I believe should be the best practice in the industry. As long as such a correction is used, the frequency distribution report should be immune to cookie deletion bias.
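    To illustrate the idea, here is a rough sketch of that filter-and-project step. The log layout, the one-year cutoff, and the simple proportional projection are all assumptions made for illustration; a real adserver would use its own schema and a more sophisticated projection technique:

    ```python
    # Sketch: tabulate frequency only from "aged" cookies, then project to the population.
    from datetime import date, timedelta
    from collections import Counter

    # Hypothetical campaign end date and the "aged cookie" cutoff (assumed: one year).
    CAMPAIGN_END = date(2009, 5, 1)
    CUTOFF = CAMPAIGN_END - timedelta(days=365)

    # Toy impression log: one record per cookie (schema is illustrative only).
    cookie_log = [
        {"cookie_id": "a", "cookie_birthdate": date(2007, 3, 1), "impressions": 5},
        {"cookie_id": "b", "cookie_birthdate": date(2009, 4, 20), "impressions": 2},  # too new, likely post-deletion
        {"cookie_id": "c", "cookie_birthdate": date(2006, 11, 15), "impressions": 9},
    ]

    # Tabulate the frequency distribution only from cookies old enough to be trusted...
    aged = [r for r in cookie_log if r["cookie_birthdate"] <= CUTOFF]
    frequency_distribution = Counter(r["impressions"] for r in aged)

    # ...then project the aged-cookie counts back up to the full cookie population
    # (a naive proportional projection, standing in for whatever technique is actually used).
    projection_factor = len(cookie_log) / len(aged) if aged else 0
    projected = {freq: count * projection_factor for freq, count in frequency_distribution.items()}
    print(projected)  # {5: 1.5, 9: 1.5}
    ```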

    Chen

  6. Gian Fulgoni from 4490 Ventures, May 11, 2009 at 11:26 a.m.

    Chen: I think the problem with ad servers and cookie deletion is that they use persistent cookies to compute R/F for the campaign but NOT to deliver the ads. This means that they are delivering more frequency than the plan calls for (because they interpret a cookie-free machine as a "new" user).

    Gian

  7. Michael Saxon from Symphony Advanced Media, May 11, 2009 at 9:49 p.m.

    Chen,

    I think what we're all getting at is the desire to have response curves - understanding not just what the average exposure frequency was for a particular campaign, but how brand response changes as exposure grows. To answer this question, you must know, for every survey respondent, how many times he/she has seen the campaign.

    Here is where cookie deletion plays in again. For pop-up-based survey providers, cookie deletion can make someone who has seen a campaign multiple times "look" like a control respondent, or at the very least undercount the frequency of exposure. Panel-based providers, such as comScore and my company, TNS, can control for cookie deletion and provide true response curves.

    We're finding, for example, that the "right" number of exposures for a campaign depends on the campaign goal - brand awareness growth may top out after 3-4 exposures, but imagery measures may continue to grow even after 8-10 exposures.

    The next hot topic, of course, is that online advertising doesn't occur in a vacuum...so now we need to know whether a survey respondent has also been exposed to offline advertising, and whether the two are synergistic. To answer those questions, you're clearly back in the realm of panels...

    Mike Saxon

  8. Lee Smith from Persuasive Brands, May 26, 2009 at 7:40 p.m.

    Chen,

    You raise valid points about exposure, site, and creative distributions (and there are other dimensions such as demographic, message, etc.) that often necessitate "adjustments" to captured data to accurately reflect the overall performance of an online campaign.

    However, you may have overlooked two of the most significant known factors impacting the measurement accuracy of a campaign's branding performance: non-cooperation and non-completion bias.

    As everyone knows (but quickly forgets), only a tiny fraction of a percent of those who are invited to share their opinions in ad effectiveness studies actually participate -- and even fewer complete lengthy surveys. And with such low rates driving self-selection, it's obvious that the captured data cannot reflect the universe of those who were exposed to the campaign (and this is difficult to correct, particularly as it's hard to know who did not participate).

    As in nearly all forms of market research, the impact of non-cooperation and non-completion bias is significant and simply dwarfs those highlighted in the article. While I agree with the general focus of the article, encouraging marketers to concentrate on exposure distributions is a bit akin to rearranging deck chairs on a large luxury liner in the North Atlantic; larger issues need to be addressed to more accurately measure online branding campaigns.

  9. Brian Mcginty from Razorfish, May 28, 2009 at 5:03 p.m.

    Chen,

    Excellent post! We have always taken into account frequency distribution when analyzing lift numbers. We then use that to set frequency caps in order to maximize the efficiency and effectiveness of our campaigns. Your addition of an effective reach calculation is a great metric and is definitely a more accurate way of calculating true lift.

    I also agree with you that the general marketer must become more proficient in the quantitative aspects of advertising effectiveness measurement. While I have seen many marketers shift their perspective on measurement from a nice-to-have to a necessity, merely requisitioning measurement is not enough. These days, if you can't accurately measure AND articulate the value of your efforts, you're out! Marketers would certainly do well to gain a greater comprehension of the underpinnings of the studies they are investing in. And we as their agents should be mindful to educate where we can.
