As marketers, we hear about Big Data all the time. We can make use of it by looking at four key data sources.
Internal data comes from inside the company and includes monetary transactions. Most companies have sales and marketing data warehouses that store their internal data. This data drives the bulk of marketing decisions.
Second-party data is paid for by the enterprise, but often managed by a second party. It includes Web site logs, shipping records, help-desk tickets, social media reviews, Facebook posts, and Twitter streams.
Marketers underuse second-party data, so it is important to consider the insights that can be gleaned from it.
Third-party or directly attributable data is packaged data, such as Dun & Bradstreet credit reports. This data can enrich impressions from second-party data. Using such data, we could refine our behavior analysis: “People making $55,000 to $90,000 in the Northeast with credit scores above 600 respond better to 20% discounts than those in the next-highest income bracket with credit scores above 650.”
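A segment comparison like the one above can be sketched in a few lines. The customer records, income bands, and credit-score thresholds below are invented placeholders for illustration, not real campaign data:

```python
# Hypothetical sketch: comparing discount response across segments enriched
# with credit-score data. All records below are made up for illustration.

customers = [
    # (income, region, credit_score, responded_to_20pct_discount)
    (60_000, "Northeast", 640, True),
    (85_000, "Northeast", 710, True),
    (72_000, "Northeast", 605, False),
    (120_000, "Northeast", 680, True),
    (150_000, "Northeast", 700, False),
    (110_000, "Northeast", 660, False),
]

def response_rate(rows, income_lo, income_hi, min_score):
    """Share of customers in the segment who responded to the discount."""
    segment = [r for r in rows
               if income_lo <= r[0] <= income_hi
               and r[1] == "Northeast"
               and r[2] >= min_score]
    if not segment:
        return 0.0
    return sum(r[3] for r in segment) / len(segment)

mid_income = response_rate(customers, 55_000, 90_000, 600)
high_income = response_rate(customers, 90_001, 200_000, 650)
print(f"$55k-$90k, score 600+: {mid_income:.0%}")
print(f"next bracket, score 650+: {high_income:.0%}")
```

With richer third-party attributes, the same query pattern simply gains more filter conditions.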
Public domain or aggregated data is publicly available information from institutions like the Bureau of Labor Statistics or the Census. This data can further improve the business context -- what conditions outside of the company’s direct control could affect business decisions? Is an area on an economic upswing?
With this data, we can make the statement even more precise: “People in ZIP code 01812 making $55,000 to $90,000 take out 30% more mortgages worth $500,000 or above than those in ZIP code 01811. Employment is higher and so are credit scores. We should open our new hardware store in 01812, running 20% discounts against the nearest competitor, and not spend heavily to target customers in 01811.”
Closer to data utopia
Big data can build a more accurate picture of our customers, but it doesn’t happen automatically. We could try to purchase all the data we can, but our funds (and our time) are limited. How do we know when to stop spending? How can we find the signals that point the way?
Start by working backwards from Utopia: the unattainable condition where we know with 100% certainty what, when, and where customers will buy. Even 90% is worth something. Picture the four sources of data, from internal to aggregated data, and begin to build a representative model.
Not all data has the same information content. Generally, the farther the data is from a transaction, the more “noise” (data you don’t need) it carries relative to “signal” (data that can lead to effective action). We shouldn’t ignore second-party, third-party, or aggregated data, but we should assess its value.
Finding value in our data
To find out which data provides the most value, we must examine each source critically: how could it help us, and what would that help be worth compared with the work required to extract it? This is why it’s important to have an idea of what it takes to get closer to Utopia, and what would be possible if we reached it.
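One simple way to frame this value-versus-effort question is to rank sources by estimated return per unit of work. Every dollar and effort figure below is an invented placeholder, not a benchmark:

```python
# Hypothetical prioritization sketch: rank data sources by estimated value
# per unit of integration effort. All figures are assumptions for illustration.

sources = {
    # name: (estimated annual value in $k, effort to integrate in person-weeks)
    "internal sales warehouse": (500, 2),
    "D&B credit reports": (200, 4),
    "web logs / social media": (120, 8),
    "census / BLS aggregates": (60, 3),
}

# Sort by value-to-effort ratio, highest first.
ranked = sorted(sources.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for name, (value, effort) in ranked:
    print(f"{name}: {value / effort:.0f} $k per person-week")
```

Even a rough ranking like this makes it clearer where to stop spending: once the next source's ratio falls below what the insight is worth, we are buying noise.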
One way to get meaning out of data is through machine-learning techniques. Machine learning takes the virtuoso instincts and pattern-recognition capabilities of humans and applies them to noisy, incomplete, and probabilistic datasets.
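To make this concrete, here is a minimal machine-learning sketch using only the standard library: a k-nearest-neighbors classifier that learns a purchase pattern from deliberately noisy, synthetic customer data. The income bands, credit scores, and noise rate are all assumptions made up for illustration:

```python
# Minimal machine-learning sketch on hypothetical data: a k-nearest-neighbors
# classifier predicting purchase from noisy customer features.
import math
import random

random.seed(0)

# Synthetic training set: (income in $k, credit_score) -> bought (1/0).
# The underlying pattern (mid income + decent score buys) is an assumption.
train = []
for _ in range(200):
    income = random.uniform(30, 150)
    score = random.uniform(500, 800)
    bought = 1 if (55 <= income <= 90 and score >= 600) else 0
    # Flip 10% of labels to mimic messy, probabilistic real-world data.
    if random.random() < 0.10:
        bought = 1 - bought
    train.append((income, score, bought))

def predict(income, score, k=7):
    """Classify by majority vote of the k nearest training points.
    Credit score is scaled down so both features carry similar weight."""
    nearest = sorted(train,
                     key=lambda r: math.hypot(r[0] - income, (r[1] - score) / 10))
    votes = sum(r[2] for r in nearest[:k])
    return 1 if votes * 2 > k else 0

print(predict(70, 680))   # a customer inside the assumed buying segment
print(predict(140, 550))  # a customer outside it
```

Despite the injected noise, the classifier recovers the pattern from the data rather than from hand-written rules, which is the point: the machine does the pattern recognition at a scale and messiness humans cannot.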
There’s no way to beat a path straight to Utopia, but we can improve the variety of data we collect -- without drowning in the volume. Although information density decreases as we get farther away from transactions, we should not hesitate to explore other kinds of data. With today’s machine-learning techniques, we can adapt our approach as we find the signals that are most valuable to us.