When a recent New York Times article declared that statistician was the "next sexy job," I felt vindicated, though I wondered whether misguided reporters had taken the concept of "modeling"
too literally. But then I thought of how often statistics comes up in conversation, yet how little even we in the analytics-driven behavioral targeting (BT) world understand about its potential impact.
Around election time, people often ask me, as a statistician, why campaign projections are always given with a sampling error of plus or minus a few percentage points, particularly when the margin of
error is greater than the margin between the two candidates.
Although we can predict election (or campaign) results based on samples of voters (or users), there are rules by which we statisticians must
sample the data in order to be confident in our predictions. And sometimes, as has happened in a number of elections, we uncover polling errors, like under-sampling younger voters who
don't have landline telephones, or mis-sampling certain demographic or ethnic groups, which can lead to election prediction errors.
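To make the margin-of-error question concrete, here is a minimal sketch using the usual normal approximation for a sample proportion. The poll numbers (1,000 respondents, 48% versus 46%) are hypothetical and chosen only for illustration, and the formula assumes a simple random sample, which is exactly the assumption that breaks when a group is under-sampled.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll, not real data: 1,000 respondents, A at 48%, B at 46%.
n = 1000
share_a, share_b = 0.48, 0.46

moe = margin_of_error(share_a, n)
lead = share_a - share_b

print(f"margin of error: +/-{moe:.3f}")
print(f"lead: {lead:.3f}")
print("lead is inside the margin of error:", lead < moe)
```

With these made-up numbers the two-point lead is smaller than the roughly three-point margin of error, so the sample alone cannot tell us who is ahead.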
As a statistician working in online advertising, I have rules here too,
rules that we must apply in order to predict campaign performance so that our sales and account teams can most effectively target campaigns to client needs.
The statistician in me would
love to test every campaign against every segment for months in order to predict with nearly 100% certainty the outcome of each campaign. But our account teams need to be able to target campaigns for
clients quickly. And as I noted above, sometimes small samples perform in a certain way, yet when one runs a larger sample, the results change because of an erroneous sampling assumption. So, how can
I satisfy the statistician in me while keeping my account team and our clients happy by predicting consistent campaign performance? The answer lies in statistical modeling and a probability distribution known as the
Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution). Published by French mathematician Siméon-Denis Poisson in 1837, the Poisson distribution gives the probability
of a given number of events occurring in a fixed period of time when these events occur at a known average rate and independently of the time since the last event. Poisson's work focused on certain random
variables that count, among other things, the number of discrete occurrences (sometimes called "arrivals") that take place during a time interval of given length (source: Wikipedia).
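For readers who prefer to see the definition in action, here is a small sketch of the Poisson probability formula, P(X = k) = lam^k * e^(-lam) / k!, where lam is the known average rate. The function name `poisson_pmf` and the rate of 3 arrivals per interval are my own illustrative choices, not figures from any campaign.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k arrivals when the average rate per interval is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # assumed average number of arrivals per interval (illustrative only)

# Probability of seeing 0, 1, ..., 6 arrivals in one interval.
for k in range(7):
    print(k, round(poisson_pmf(k, lam), 4))

# The probabilities over all possible counts add up to 1 (50 terms is plenty here).
total = sum(poisson_pmf(k, lam) for k in range(50))
print("total probability:", round(total, 6))
```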
What does
this mean in online advertising terms? Let's say I'm trying to figure out how a specific demographic group will respond to a certain advertising message based on the limited data I have. Well, I can
build a statistical model, based on the assumption that responses follow a Poisson distribution, that lets me resample again and again from the initial campaign performance data, using the expected
number of arrivals (which, for a Poisson distribution, is also the variance). That enables me to predict with a high degree of certainty how this and other groups will respond, which helps our account
teams better target campaigns for their advertisers. So if, in my test period, 23 users from a certain targeted group respond to my client's advertising message, my statistical model runs scenarios
around that count to predict the range in which that same group's response is likely to fall beyond the test period. And experience has taught us that statistical modeling using the Poisson
distribution is more reliable in predicting which targeted groups will provide the best campaign lift than merely relying on the initial campaign performance data.
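The "running scenarios" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the model we run in production: it treats the 23 observed responders as the Poisson rate, draws many simulated test periods, and reports the range that covers about 95% of them. The helper names (`sample_poisson`, `simulate_interval`), the number of draws, and the seed are all hypothetical.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_interval(observed: int, draws: int = 10_000, seed: int = 0):
    """Simulate many test periods at the observed rate; return (lower, upper, mean)."""
    rng = random.Random(seed)
    counts = sorted(sample_poisson(observed, rng) for _ in range(draws))
    lower = counts[int(0.025 * draws)]      # roughly the 2.5th percentile
    upper = counts[int(0.975 * draws) - 1]  # roughly the 97.5th percentile
    mean = sum(counts) / draws
    return lower, upper, mean

# 23 responders in the test period, as in the example above.
lower, upper, mean = simulate_interval(23)
print(f"simulated mean: {mean:.2f}")
print(f"about 95% of simulated periods fall between {lower} and {upper} responders")
```

The width of that range is the useful part: two groups whose ranges overlap heavily cannot be ranked with confidence from the test period alone.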
In behavioral targeting, the greatest analytical challenge we face is figuring out how to achieve a campaign
lift quickly yet consistently, and with satisfying volume, based on the initial campaign data. Statistical modeling with the Poisson distribution lets us do more, and do it better, with less
data. So, the real beauty of good targeting is not just in the data, but in how you model it. Maybe those sexy statistician comments weren't too far off!