Commentary

Is Deterministic Data Better Than Probabilistic Data? Maybe Not

by Sree Nagarajan, Op-Ed Contributor, October 19, 2017

When you consider the primary appeal of digital marketing -- being able to measure, in real time, how your advertising is performing -- using deterministic data seems like the only reasonable choice.

Deterministic data consists of information proven to belong to a specific person or household, such as purchase data that comes from a store loyalty card. After all, it makes sense for a campaign manager to measure conversions against a person 100% certain to be in-market for your product or service, or to have purchased it in the past.

Perhaps this is the reason why probabilistic data -- which is modeled based on how likely it is that someone will be receptive to a product or service -- gets such short shrift. This lack of appreciation has been widespread, but is best summarized in this quote from an Ad Age column: “While predictive advertising and probabilistic data are arguably more effective than most retargeting, they are still synthetic approximations for real answers.”

The problem with this statement, though, is that it unfairly projects onto probabilistic data a weakness that deterministic data shares. The truth is that, despite being based on actual purchase or usage, deterministic data has its own difficulties:

Marketing is one-to-one, but data is often not. The best example of this is set-top-box data, no matter which provider. While it is helpful for networks to have an idea of how their content performs in households, the usefulness of this data for marketers is dulled at the individual level. With an average U.S. household size of 2.58 persons, how is any marketer supposed to say with certainty that they’ve reached the right person? This could work for products or services that serve a household, but not for anything that is more personal.

There are some places it just cannot go. One of the most buzzworthy topics in the industry is the presence of so-called “walled gardens,” defined by the Association of National Advertisers as “a platform where the carrier or service provider… restricts convenient access.”

Regardless of where you stand on the issue of whether these platforms should break down their walled gardens, until they do, the data inside and its potential insights remain a mystery. Simply put, because it’s not shared, walled data cannot be used, regardless of modeling methodology.

There’s no such thing as “un-modeled” data. The problems with set-top-box data often extend to purchase data. But if a marketer is fortunate enough to receive data based on actual transactions, all is well, right?

Not exactly. If you read the descriptions of companies selling purchase data, you’ll often see the words “actual purchases” alongside phrases like “tied to” or “representing” x million households. This is a subtle nod to the fact that modeling is being done behind the scenes. To be clear, it’s being done for legitimate reasons, whether it’s to protect consumer privacy or for legal compliance, but it’s still being done.
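To put a rough number on the household-size point made earlier about set-top-box data, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that each household contains exactly one intended recipient and that a household-level impression is equally likely to reach any member. The function name is hypothetical, and treating every household as average-sized is itself a simplification.

```python
# Illustrative arithmetic only. Assumptions (not from the article):
# one intended recipient per household, and an impression equally
# likely to reach any member of that household.
AVG_HOUSEHOLD_SIZE = 2.58  # average U.S. household size quoted in the article


def chance_of_right_person(household_size: float) -> float:
    """Chance that a household-level impression reached the one intended member."""
    if household_size < 1:
        raise ValueError("household size must be at least 1")
    return 1.0 / household_size


if __name__ == "__main__":
    p = chance_of_right_person(AVG_HOUSEHOLD_SIZE)
    print(f"Chance of reaching the intended person: {p:.1%}")
```

Under those assumptions the figure comes out well below one half, which is the sense in which even "deterministic" household data carries a probabilistic element at the individual level.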

Deterministic data’s flaws aside, marketers dismiss probabilistic data solutions at their own peril. After all, there are parts of the consumer journey for which little to no data exists. Most stages between awareness and intent are not measured. The consumer journey is rarely linear, but it's still better to have an approximation of that journey than to ignore it altogether.

Both probabilistic and deterministic data are needed to achieve the results that each is uniquely designed to deliver. It's crucial to remember that no data set is perfect. Just as in a relationship, it’s more important to find which data set is right for you and your campaign objectives.


About the Author

SREE NAGARAJAN, CEO, Affinity Answers