When a recent New York Times article declared statisticians the "next sexy job," I felt vindicated, though I wondered whether misguided reporters had taken the concept of "modeling" too literally. But then I thought of how often statistics comes up in conversation, yet how little even we in the analytics-driven behavioral targeting (BT) world understand about its potential impact. Around election time, people often ask me, as a statistician, why campaign projections are always given with a sampling error of plus or minus a few percentage points -- particularly when the margin of error is greater than the margin between the two candidates.
Although we can predict election (or campaign) results from samples of voters (or users), we statisticians must follow certain rules when sampling the data in order to be confident in our predictions. And sometimes, as has happened in a number of elections, we uncover polling errors, like under-sampling younger voters who don't have landline telephones, or mis-sampling certain demographic or ethnic groups, which can lead to election prediction errors.
The same holds in online advertising, where I work as a statistician: we have rules that we must apply in order to predict campaign performance, so that our sales and account teams can most effectively target campaigns to client needs.
The statistician in me would love to test every campaign against every segment for months in order to predict with nearly 100% certainty the outcome of each campaign. But our account teams need to be able to target campaigns for clients quickly. And as I noted above, a small sample sometimes performs one way, yet when one runs a larger sample, the results change because of an erroneous sampling assumption. So, how can I satisfy the statistician in me while keeping my account team and our clients happy by predicting consistent campaign performance? The answer lies in statistical modeling and a probability model known as the Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution). Published by French mathematician Siméon-Denis Poisson in 1838, the Poisson distribution gives the probability of a given number of events occurring in a fixed period of time when those events occur at a known average rate and independently of the time since the last event. The work focused on certain random variables that count, among other things, the number of discrete occurrences (sometimes called "arrivals") that take place during a time interval of given length (source: Wikipedia).
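To make the definition concrete, here is a minimal sketch of the Poisson probability formula, P(X = k) = rate^k * e^(-rate) / k!. The function names and the example rate of 4 responses per hour are my own illustrations, not figures from any real campaign.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when events arrive at average `rate` per interval."""
    if k < 0:
        return 0.0
    # Work in log space so larger k and rate values do not overflow.
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def poisson_cdf(k: int, rate: float) -> float:
    """Probability of at most k events in the interval."""
    return sum(poisson_pmf(i, rate) for i in range(k + 1))

# Hypothetical example: an ad averages 4 responses per hour.
rate_per_hour = 4.0
print(f"P(exactly 6 responses) = {poisson_pmf(6, rate_per_hour):.4f}")
print(f"P(at most 2 responses) = {poisson_cdf(2, rate_per_hour):.4f}")
```

The only input is the average rate, which is what makes the model attractive when data is scarce: one estimated number determines the whole distribution.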
What does this mean in online advertising terms? Let's say I'm trying to figure out how a specific demographic group will respond to a certain advertising message based on the limited data I have. I can build a statistical model, based on the Poisson distribution assumption, that lets me simulate user responses again and again from the initial campaign performance data, using the expected number of arrivals and its variance (which, for a Poisson variable, are equal). That will enable me to predict with a high degree of certainty how this and other groups will respond, which in turn helps our account teams better target campaigns for their advertisers.
So if, in my test period, 23 users from a certain targeted group respond to my client's advertising message, my statistical model runs scenarios that enable me to predict with a high degree of certainty how that same group of users will perform. And as history has taught us, statistical modeling using the Poisson distribution is more reliable in predicting which targeted groups will provide the best campaign lift than merely relying on the initial campaign performance data.
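The scenario-running idea can be sketched as a simple simulation. This is an illustration under stated assumptions rather than our production model: it treats the 23 observed responses as the true Poisson rate, draws many hypothetical repeat test periods, and reports the range that covers most of them. The function names, the 10,000 simulations, and the 95% coverage level are all choices made for this example.

```python
import math
import random

def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson count using Knuth's multiplication method (fine for modest rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_response_interval(observed: int, n_sims: int = 10_000,
                               coverage: float = 0.95, seed: int = 0):
    """Simulate repeat test periods and return the (low, high) range of plausible counts.

    Assumption: the observed count is used as the Poisson rate.
    """
    rng = random.Random(seed)
    draws = sorted(sample_poisson(observed, rng) for _ in range(n_sims))
    tail = (1.0 - coverage) / 2.0
    low = draws[int(tail * n_sims)]
    high = draws[min(n_sims - 1, int((1.0 - tail) * n_sims))]
    return low, high

low, high = simulate_response_interval(23)
print(f"Observed 23 responses; a repeat test would plausibly yield {low} to {high}.")
```

Because a Poisson count of 23 has a standard deviation of about 4.8 (the square root of 23), the simulated range is fairly wide. That spread is exactly why comparing targeted groups on raw test counts alone can mislead: two groups whose ranges overlap heavily may not really differ.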
In behavioral targeting, the greatest analytical challenge we face is figuring out how to achieve campaign lift quickly yet consistently, and with satisfying volume, based on the initial campaign data. Statistical modeling and the Poisson distribution let us do more, and do it better, with less data. So, the real beauty of good targeting is not just in the data -- but in how you model it. Maybe those sexy statistician comments weren't too far off!
America's Next Top Model?