How To Hold Your Data Provider Accountable

Raising the industry standard on data quality and accuracy requires asking and answering the right questions. I spoke recently with Manish Ahuja, chief product officer at Qualia, about the state of data and how marketers and their agencies can hold data providers accountable.

How prevalent is the delivery of inaccurate data?
We’ve seen up to a 60% error rate in the cross-device space.

What can a buyer do about that?
The best way for buyers to trust their data is to look at a graph and understand how many devices are tied to a consumer and/or a household. Is that believable based on their understanding and market research? Second, they can use their reliable deterministic data — either online or offline — to validate the graph.

For example, if a provider returns a graph of site visitors for an auto advertiser with 150 million consumers represented, the advertiser should know, even without much validation, that 150 million people (or 300 million devices) in the U.S. have not visited its site in the last 90 days. It’s just not plausible!
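The sanity check described above can be sketched in code. This is a minimal illustration, not any provider's API; the population figure and both thresholds are assumptions chosen for the example.

```python
# Hypothetical sanity check for a cross-device graph summary.
# The population figure and thresholds are illustrative assumptions.

US_POPULATION = 330_000_000  # rough U.S. population, for scale


def sanity_check_graph(consumers, devices,
                       population=US_POPULATION,
                       max_reach=0.25,  # assume no single site reaches >25% of the population
                       max_devices_per_consumer=10):
    """Return a list of red flags for an implausible graph summary."""
    flags = []
    if consumers / population > max_reach:
        flags.append("claimed consumer count is implausibly large for one site")
    if consumers and devices / consumers > max_devices_per_consumer:
        flags.append("implausibly many devices per consumer")
    return flags


# The 150M-consumer / 300M-device graph from the example above gets flagged.
flags = sanity_check_graph(150_000_000, 300_000_000)
```

A more modest graph, say 5 million consumers across 12 million devices, passes the same check.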

How else do you get at accuracy?
All the critical questions you pose to your solution provider will relate to occurrence, concurrence and persistence.

What is a red flag?
Accuracy cannot come down to one number fixed in time.

This is where “occurrence” enters the picture. When you are presented with a single number (“We are 95% accurate!”), it’s important to realize, or have someone honestly tell you, that this figure represents only a subset of your graph (the match rate: the rate at which the provider can map your IDs into its graph) and applies only at a single moment in time.

Algorithms are at play; time is a factor. Therefore, you must define and analyze timeframes. The question to ask is: What is the timeline? 

What is occurrence?
Frequent occurrence is a critical part of any cross-screen platform that claims to analyze inputs, or group data, across screens. Without frequent occurrence, the numbers that attempt to capture associations are unreliable; “rare” signals are problematic. This is a key principle on which to engage your solution provider.
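One way to operationalize this principle is to simply discard associations backed by too few sightings. A minimal sketch, where the threshold of 3 is an arbitrary assumption:

```python
def filter_rare_associations(pair_counts, min_occurrences=3):
    """Keep only device pairs observed together at least min_occurrences times.

    pair_counts maps a (device_a, device_b) pair to how often the two
    devices were sighted together; rare pairs are dropped as unreliable.
    """
    return {pair: n for pair, n in pair_counts.items() if n >= min_occurrences}
```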

What is concurrence?
This is defined at the platform level. Simply put, associated devices must show up together within a small window.

Concurrent sighting of devices within a narrowly defined window is the only valid way to associate devices with confidence. In a badly run model, devices may be inappropriately added to a group even though they are never seen alongside those same devices elsewhere.

As an example, I was at a party at my neighbor’s two months ago and hopped on their WiFi for a few minutes. That single, brief sighting should result in only a low-confidence association between my device and their household.
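Concurrence can be approximated by bucketing device sightings into short time windows per network and counting how often pairs of devices land in the same bucket. This is a simplified sketch; the event format and the five-minute window are assumptions, not any platform's actual implementation.

```python
from collections import defaultdict
from itertools import combinations


def co_sightings(events, window_seconds=300):
    """Count how often device pairs appear on the same network within
    the same short time window.

    events: iterable of (timestamp, network_id, device_id) tuples.
    """
    # Bucket sightings by (network, coarse time window).
    buckets = defaultdict(set)
    for ts, network, device in events:
        buckets[(network, ts // window_seconds)].add(device)

    # Count every pair of devices seen together in a bucket.
    pair_counts = defaultdict(int)
    for devices in buckets.values():
        for a, b in combinations(sorted(devices), 2):
            pair_counts[(a, b)] += 1
    return dict(pair_counts)
```

A guest device that joins a household's WiFi once ends up with a pair count of 1, which a well-run model treats as low confidence rather than as a household association.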

What is ID persistence?
That is the ability to steadily track, and never lose sight of, a given identifier representing a device across the data ecosystem. All measures of accuracy must take relative ID persistence into consideration to be valid. As we know in the marketing ecosystem, not all IDs are created equal.

Cookies behave differently than mobile identifiers, and identifiers behave differently across device types and browsers. Reliable and persistent identifiers are critical to maintaining a confident association between devices over a long period of time.

What else do you need to ask about?
As a quick tip, when a potential solution provider talks about error rates, listen closely, for not all errors are created equal. For example, a “false positive” (associating a device with the incorrect entity, whether consumer or household) is far worse than a “false negative” (creating a new entity for a device that should have been associated with an existing one).

If you assign my device to a 65-year-old woman living across the country, it pollutes any data or model tied to that group far more than creating a new “consumer” for my device would.
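The asymmetry between error types can be made explicit by weighting them differently when scoring a provider. The 5:1 weighting below is an arbitrary assumption for illustration, not an industry standard.

```python
def weighted_error_rate(false_positives, false_negatives, total,
                        fp_weight=5.0, fn_weight=1.0):
    """Weighted error rate in which false positives (wrong associations)
    are penalized more heavily than false negatives (missed ones)."""
    return (fp_weight * false_positives + fn_weight * false_negatives) / total
```

Two providers with the same raw error count can score very differently once false positives carry a heavier penalty.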
