Statistics, Lies, And Mathematical Literacy

We have all heard the maxim, "There are three kinds of lies: lies, damned lies, and statistics." Darrell Huff wrote "How to Lie with Statistics" in 1954. Since then, his book has sold more copies than any other text on statistics. I recently flipped through this book and remembered why it is so good. First, it's funny. Huff was a writer and editor for Better Homes and Gardens, among other publications. He used this vantage point to pick example after example of lies told in the media through the misuse of statistics. Huff uses these lies -- some of which were intentional, while others were unintentional -- to teach complex statistical concepts in a way that is both entertaining and practical. Second, it's still relevant!

In my recent skim of this work, I was confronted with several examples of things I run across regularly in email marketing. Here are three modern-day examples of the principles Huff articulated over 50 years ago:

1. The Well Chosen Average (Chapter 2)

"This is the essential beauty in doing your lying with statistics. [Multiple] figures are legitimate averages..."

Email marketers love averages. What's the average open rate? Average click-through rate? Conversion rate? Few topics are more frequently, or emotionally, addressed in our field. The reality is, email marketers consistently lie -- intentionally and unintentionally -- about these figures.

Loren McDonald did a good job of highlighting some common issues last week. But he merely scratched the surface -- lies in this arena are pervasive. Consider surveys that ask marketers seemingly benign questions like, "What is the average open rate of your email campaigns?"

Let's assume the marketer sends five campaigns a week: one general newsletter and four highly targeted mailings. The newsletter goes to 100,000 subscribers and gets a 10% open rate. The four targeted mailings each go to 100 subscribers and get a 50% open rate. What is the average open rate? It depends on how the marketer chooses to calculate that average. 42% [(10% + 4 x 50%) / 5] and 10.2% [(100,000 x 10% + 4 x 100 x 50%) / (100,000 + 4 x 100)] are both legitimate answers. In my experience, the first answer (42%) is easier to calculate and, thus, the way most marketers answer this question. It's not intentionally dishonest, but it's not the whole truth, either.
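The two averages above can be checked with a few lines of arithmetic. This is just a sketch of the example from the text, with the same made-up campaign figures:

```python
# Worked example from the text: one newsletter (100,000 sends, 10% open
# rate) plus four targeted mailings (100 sends each, 50% open rate).
campaigns = [(100_000, 0.10)] + [(100, 0.50)] * 4

# "Average of averages": every campaign counts equally, however small.
simple_avg = sum(rate for _, rate in campaigns) / len(campaigns)

# Weighted average: total opens divided by total sends.
total_opens = sum(sends * rate for sends, rate in campaigns)
total_sends = sum(sends for sends, _ in campaigns)
weighted_avg = total_opens / total_sends

print(f"simple average:   {simple_avg:.1%}")    # 42.0%
print(f"weighted average: {weighted_avg:.1%}")  # 10.2%
```

Both numbers are "the average open rate"; they just answer different questions. The simple average describes the typical campaign, while the weighted average describes the typical delivered email.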

2. Much Ado About Practically Nothing (Chapter 4)

"Sometimes a big ado is made about a difference that is mathematically real and demonstrable but so tiny as to have no importance. This is in defiance of the fine old saying that a difference is a difference only if it makes a difference."

This applies directly to common questions about sample sizes and statistical significance. "How large should the test groups be to get statistically significant results?" or "Is that really a large enough sample to yield significant results?"

I have come across several instances where statistically significant observations were discarded because the sample size was not large enough. The statistics were sound, but the fact that insights were drawn from seemingly small samples made executives uneasy: "We really need to test with more to make sure these results are significant." Similarly, I have seen sweeping changes made based on the "statistically significant" results of tests run on huge samples.

There is an inverse relationship at work: the larger the sample, the smaller the differences that can be detected in statistical tests. Large samples are more likely to detect real mathematical differences, but those differences may not be large enough to have a meaningful impact on the bottom line. The challenge is in finding the right balance.

Other chapters in Huff's book address sampling issues, and there are several to consider. However, even when all the appropriate rules are followed, size is not always a virtue in tests of statistical significance. Statistical significance is simply a mathematical test of whether or not there is a difference between two or more options. Common misconceptions about this seem to stem from the word "significance." In statistics, "significance" has nothing to do with importance.

3. The Gee Whiz Graph (Chapter 5)

"... suppose you want to win an argument, shock a reader, move him into action, sell him something. For that, this [honest] chart lacks schmaltz. Chop off the bottom."

Nothing drives me crazy like a distorted graph. Graphs are the lifeblood of email marketing. Whether the graph is of the "Hey, look, I made stuff better" variety or the "Whoa, here is something you really need to be scared of" variety makes no difference. If the graph chops off the bottom, you should be skeptical.

Suppose that over the past year a marketer has increased the revenue generated through his email program by 10%, from \$100K per year to \$110K per year. On a graph with the y-axis starting at zero, this looks boring. However, if the chart is altered so that the y-axis starts at \$90K, the same data start to look exciting. That 10% increase looks a lot bigger. It's an old trick, but it is still used on a regular basis. I've used it and I'll bet you have, too! Don't be fooled; ask questions!
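The exaggeration is easy to quantify. Using the figures from the example above, compare how tall the new bar looks relative to the old one on an honest chart versus one with the bottom chopped off at \$90K:

```python
old, new = 100, 110  # revenue in $K per year, from the example above

# Honest chart: bars start at zero, so heights are proportional to the data.
honest_ratio = new / old                              # 1.1x taller

# "Gee whiz" chart: the y-axis starts at $90K, so each bar's drawn height
# is (value - baseline), and only the tops of the bars survive.
baseline = 90
gee_whiz_ratio = (new - baseline) / (old - baseline)  # 2.0x taller

print(f"honest chart:   new bar is {honest_ratio:.1f}x as tall")
print(f"gee whiz chart: new bar is {gee_whiz_ratio:.1f}x as tall")
```

The underlying numbers never change; chopping the baseline turns a 10% improvement into what looks like a doubling.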

On any given day, email marketers are faced with a slew of statistics. Comparisons are made, studies are presented, and lies are told! "How to Lie with Statistics" provides an excellent overview of statistics as well as the insights required to appropriately express your statistical skepticism. Let us arm ourselves against these lies and become better email marketers as a result.

Disclaimer: I have no interest, financial or otherwise, in the sale of this book.