Commentary

Validating Attribution Models

by Peter Norwood, May 31, 2012

Attribution models have emerged as a powerful tool for helping advertisers understand which parts of their marketing efforts are driving sales. An attribution model works by assigning partial credit to each advertising event that influenced a user to convert. Attribution models can generally be separated into simple models and advanced models. Simple attribution models use predetermined weights to assign credit to each ad, while advanced attribution models use a more scientific approach.

All simple attribution models have rules for assigning credit to each touch point on the path to conversion. Last-event attribution assigns all the credit to the last ad, while even attribution spreads the credit out evenly. Other attribution models assign each event an arbitrary credit depending on its position in the sequence of events leading to a conversion. Typically, the first touchpoint to reach a user is called the “introducer,” the last touchpoint is called the “closer,” and every touchpoint in between is called a “promoter.” Introducers and closers will get outsized credit, while promoters divide up the rest of the credit.
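The rules described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 40%/40% shares given to the introducer and closer are an assumed example of "outsized credit," not figures from this article.

```python
def _check_path(path):
    """Reject empty paths, since there is nothing to credit."""
    if not path:
        raise ValueError("path must contain at least one touchpoint")


def last_event_credit(path):
    """Last-event attribution: all credit goes to the final touchpoint."""
    _check_path(path)
    return [0.0] * (len(path) - 1) + [1.0]


def even_credit(path):
    """Even attribution: credit is split equally across touchpoints."""
    _check_path(path)
    return [1.0 / len(path)] * len(path)


def position_based_credit(path, introducer=0.4, closer=0.4):
    """Position-based attribution with assumed introducer/closer shares.

    The first touchpoint (introducer) and last touchpoint (closer) get
    fixed shares; promoters in between divide whatever is left.
    """
    _check_path(path)
    n = len(path)
    if n == 1:
        return [1.0]
    if n == 2:
        # No promoters: rescale the two shares so they sum to 1.
        total = introducer + closer
        return [introducer / total, closer / total]
    promoter_share = (1.0 - introducer - closer) / (n - 2)
    return [introducer] + [promoter_share] * (n - 2) + [closer]


# Hypothetical four-touch path ending in a paid search click.
path = ["display", "email", "display", "paid_search"]
for rule in (last_event_credit, even_credit, position_based_credit):
    print(rule.__name__, rule(path))
```

Each function returns one weight per touchpoint, in path order, and the weights sum to 1. The weights are fixed in advance and never consult conversion data, which is what makes these models "simple."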

Advanced attribution is substantially different from these simple attribution methods. Advanced attribution models are “data-driven.” A data-driven model lets the bottom-up data determine the importance of each ad in a sequence. Data-driven models typically look at the entire data set of converting and non-converting users, examining every sequence that leads to a conversion to determine how much credit to give to each ad.

Simple attribution models will give some quick back-of-the-envelope answers that provide some general insights, but serious marketers rely on the more scientific advanced attribution models for an objective understanding of the actual performance of each part of their advertising campaigns.

But how does one know that one attribution model is superior to another one? How can we know if an advanced attribution model is superior to a simple attribution model? And how can we know how accurate a model is? One way to figure it out is with model validation.

There are several techniques to validate an attribution model. One of the most effective is lift analysis. A lift analysis compares the conversion rates of two groups of users that have been carefully chosen such that they are similar in all respects except one.

As an example, let the first group be users who clicked on paid search ads without having seen a display ad beforehand. Next, let the second group be users who saw display ads before clicking on paid search ads. Now we can compare the conversion rates of the two groups.

The difference in the conversion rates between group one and group two will tell us quantitatively at an aggregate level how much the top-of-funnel display activity has lifted the conversion rate above the baseline of the paid search users’ conversion rate. The lift in conversion rate experienced by the second group should be credited to the aggregate display events seen by those users.

For example, if the conversion rate is doubled when users see assisting display impressions, then 50% of the credit for the conversions should go to the display impressions and the other 50% should go to the paid search ads clicked on by the display-assisted paid search user group.
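The arithmetic behind this example can be sketched as follows. The function names and the user counts are hypothetical, chosen only so that the exposed group converts at twice the baseline rate. The sketch also assumes the two groups are otherwise comparable, which a simple lift analysis cannot guarantee.

```python
def conversion_rate(conversions, users):
    """Share of users in a group who converted."""
    if users <= 0:
        raise ValueError("users must be positive")
    return conversions / users


def display_credit_share(baseline_rate, exposed_rate):
    """Fraction of the exposed group's conversions credited to display.

    The lift is the exposed rate minus the baseline rate. Dividing the
    lift by the exposed rate gives the share of credit owed to display;
    the remainder stays with paid search. A negative lift is treated
    as zero credit.
    """
    if exposed_rate <= 0:
        raise ValueError("exposed_rate must be positive")
    lift = max(exposed_rate - baseline_rate, 0.0)
    return lift / exposed_rate


# Hypothetical counts: group one never saw display, group two did.
baseline = conversion_rate(conversions=200, users=10_000)  # 2% baseline
exposed = conversion_rate(conversions=400, users=10_000)   # 4%, doubled

display_share = display_credit_share(baseline, exposed)
search_share = 1.0 - display_share
print(f"display: {display_share:.0%}, paid search: {search_share:.0%}")
```

With the rate doubled, the lift equals the baseline, so display and paid search each receive half the credit, matching the 50/50 split described above.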

While seemingly simple, this technique is extremely valuable because it solves a complex problem. It allows us to validate an attribution model by comparing it with a ground truth that is established without relying on the attribution model itself. Hence, we can measure the efficacy and accuracy of the attribution model by using a top-down approach to validate our bottom-up results.

9 comments about "Validating Attribution Models".

Michael Kaushansky from Havas Helia, May 31, 2012 at 4:56 p.m.
Great article... though the lift analysis on search doesn't always work. Your ads are likely targeted, so the users exposed to ads are your "target audience," while the non-exposed users are not.
Your "target audience" is MUCH more likely to convert using search regardless of being exposed to an ad. With that bias, it's difficult to tease out ad exposure lift.

Paula Lynn from Who Else Unlimited, May 31, 2012 at 5:25 p.m.

1 + 1 = 3. Not all attributions are separate and live alone.

John Grono from GAP Research, May 31, 2012 at 7:06 p.m.

... assuming that the ONLY marketing impacts are online, whether the conversion is offline or online.

Shi Zhong from Adometry Inc., June 1, 2012 at 11:13 a.m.

I should add that at Adometry we do recognize and correct for user selection bias in the lift analysis mentioned in this article.

Vincent Granville from Analyticbridge, June 1, 2012 at 12:53 p.m.

Attribution models are not necessarily complicated. They take into account print, TV, radio, online advertising, word of mouth, organic traffic. They use various weights attached to historical data and different channels. They are mostly based on multivariate time series, and optimized using good cross-validation / model fitting techniques, as well as sensitivity analysis to discard great but non robust models. More on this to be published in my free eBook "Data Science by Analyticbridge".

Peter Norwood from Adometry, Inc., June 1, 2012 at 2:16 p.m.

Michael has made a very good point. The article was too short to go into detail on the distinction between a simple lift analysis and a "bias-corrected" lift analysis. Whenever doing a lift analysis it is important to correct for bias between the selected groups. The proper methodology for doing a "bias-corrected" lift analysis is to ensure the two user groups are selected from the same target audience.

Huayin Wang from Accuen Media, June 1, 2012 at 3:59 p.m.

Peter, I disagree with quite a few things you said - it pains me to do this given how much I like Adometry! I think you confused attribution model with attribution modeling - calling attribution model "simple" and attribution modeling "advanced" does not do justice to them, as they are quite different things. Regarding your lift analysis, the more serious problems, aside from the bias issue, are these: it credits 100% of the (search and display) interaction effect to display alone; it also does not account for another impact of display: driving up more search traffic.

My last point goes to Vincent's comment: Marketing/Media Mix Modeling and Attribution Modeling are very different in so many areas that they are really not interchangeable - for one thing, MMM is built on aggregated data containing nothing about individual conversion paths, whereas attribution is all about the most granular user touch point event data (the sequencing patterns - the macro and micro distinction, as is usually said).

Peter Norwood from Adometry, Inc., June 4, 2012 at 5:36 p.m.

Thanks Huayin for your comments. You made three points in your comment. I don’t think I can answer the first one on the difference between an attribution model and attribution modeling. I’d appreciate it if you could elaborate so I can get a better understanding of your comment. With respect to the second point about bias in the lift analysis, you make a really good point. Even after controlling for bias by carefully selecting the two groups from the target audience, it is still possible to have other effects that deserve some credit beyond the display activity. Your example of display lifting search clicks is a good example of that. The point of the article is not to say that lift analysis is 100% accurate but to say that it is a good technique for validating the model. Attribution model validation is a difficult problem to solve because we need an independent method that does not use an explicit or implied attribution model to arrive at the credit deserved by events. The method we propose fits the bill but needs to be done cautiously with intelligent consideration for bias and should not be expected to be 100% accurate. Building a simulation model that treats validation as a toy problem is another approach that is valid but I didn’t discuss that in this article. Finally, I agree with your third point that MMM is different from event level attribution. MMM in many ways is complementary to event level attribution and the two can share insights, which might be the topic of a subsequent article.

Huayin Wang from Accuen Media, June 4, 2012 at 6:02 p.m.

Peter, thanks for your response. To me, Attribution Model refers to any rules or formula that we can use to distribute or partition the total conversion credit. It is not difficult to come up with attribution models - it is difficult to convince ourselves and others that we found the right one. Attribution Modeling refers to the data-driven statistical modeling process - a process we can use to derive the attribution rules, or formula, so that we do not have to justify them above and beyond justifying the modeling methodology. I have written on some related topics in my blog, in case you'd like to know more about my thinking on this: http://huayin.wordpress.com/2012/05/17/attribution-model-and-attribution-modeling/