The question of data quality is becoming critical. Which datasets are useful and accurate, and which might lead us to false or misleading conclusions?
Scott McKinley, CEO of Truthset, has set out to clarify and define the term “quality data” and set the standard for its future usage.
“The world increasingly runs on person-level data,” he explains. “Marketing, advertising, personalization, dynamic pricing, and customer analytics all rely heavily on person-level data to power every interaction between a consumer-facing business and its customers and prospects. But all that data has substantial error, which hurts performance and costs businesses money.”
This interview has been edited for length and clarity.
Charlene Weisler: What is your definition of quality data?
Scott McKinley: There are many aspects to data quality. We focus exclusively on the accuracy of record-level data. We know that accuracy is not binary, so we have developed a method to measure the likelihood that a given key-value pair is true, on a spectrum of 0.00 to 0.99.
As an example, if a data provider makes an assertion as to the gender of a specific ID, we can measure the likelihood that their assertion is true. A higher likelihood of truth equates to higher quality.
Our mantra is that there is no perfect data set out there, and that even a good data set can have portions of it that are not good. Data sets can be measured at both the aggregate and record level to ensure that buyers can compare entire data sets, and users can pick and choose their own level of accuracy, because scale and price are still highly relevant in decision making about data purchases.
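To make that scale-versus-accuracy trade-off concrete, here is a minimal Python sketch. The field names, scores, and thresholds are invented for illustration; this is not Truthset's API.

```python
# Hypothetical illustration of picking an accuracy threshold.
# Field names and scores are invented; this is not Truthset's API.

records = [
    {"hashed_email": "a1b2...", "assertion": "gender=female", "truthscore": 0.92},
    {"hashed_email": "c3d4...", "assertion": "gender=female", "truthscore": 0.61},
    {"hashed_email": "e5f6...", "assertion": "gender=female", "truthscore": 0.35},
]

def filter_by_threshold(records, threshold):
    """Keep only records whose likelihood-of-truth meets the buyer's bar."""
    return [r for r in records if r["truthscore"] >= threshold]

for threshold in (0.3, 0.6, 0.9):
    kept = filter_by_threshold(records, threshold)
    print(f"threshold {threshold:.1f}: {len(kept)} of {len(records)} records retained")
```

Raising the threshold improves the expected accuracy of what remains but shrinks the usable audience, which is exactly the balance McKinley describes buyers weighing against scale and price.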
For data sellers/providers, we measure the accuracy of their data at the record level and provide both an absolute metric that the provider can use internally, and a relative index that can be used externally to help them differentiate in a sales environment.
Weisler: Does quality data vary by company, use, timing, etc.? How do you manage the changeable nature of what actually counts as quality data?
McKinley: We have created our Truthscore Index to give the market a relative look at how data providers are performing against one another, and an easy way to say a provider is X% better than the average score. It is much the same on the data buyer/marketer side. Everyone has their own threshold for what level of data quality they will accept. Throughout the year, campaign by campaign, marketers choose different segments at varying granularity or scale, and with each they have to make an individual decision, balancing scale, their budget, and now the quality of the data.
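As a back-of-the-envelope illustration of how a relative index like this can work (the numbers and the exact method are assumptions, not Truthset's actual methodology), dividing each provider's score by the market average yields the "X% better than average" framing:

```python
# Hypothetical illustration of a relative index: each provider's accuracy
# expressed against the average of all providers. All numbers are invented.

provider_scores = {"provider_a": 0.88, "provider_b": 0.72, "provider_c": 0.80}

average = sum(provider_scores.values()) / len(provider_scores)  # 0.80 here

for name, score in provider_scores.items():
    index = score / average  # 1.00 = average; 1.10 = 10% better than average
    print(f"{name}: score={score:.2f}, index={index:.2f} "
          f"({(index - 1) * 100:+.0f}% vs. average)")
```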
Weisler: What data sets do you process and vet?
McKinley: We have built what may be the largest cooperative of consumer data with leading data providers, and we use that data to compile the most accurate view of demographic assignments for most of the US population.
Today, Truthset keys off hashed emails and the demographic attributes that describe those records. For example, a record would have my email address (hashed, of course) with “female, age 35-44, Hispanic, etc.” as the descriptors. We evaluate whether each of those attribute values is correctly assigned to that hashed email.
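For readers unfamiliar with hashed emails: a one-way hash turns a raw address into a stable identifier that can be matched across datasets without exposing the address itself. A minimal Python sketch, assuming SHA-256 with lowercase normalization (a common industry convention; the article does not specify Truthset's exact hashing):

```python
import hashlib

def hash_email(email: str) -> str:
    """Normalize and hash an email address (SHA-256 assumed for illustration)."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# A record keyed off the hashed email, with demographic attribute assignments.
record = {
    "hashed_email": hash_email("jane.doe@example.com"),
    "attributes": {"gender": "female", "age_range": "35-44", "ethnicity": "Hispanic"},
}
print(record["hashed_email"][:16], record["attributes"])
```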
To do this, we work with multiple data providers and have them all contribute a weighted vote on whether that association is correct. The weighting comes from comparing each data provider against validation sets (these are our “truth sets”), which gives each provider its own weighted vote at each attribute value.
All of the data providers then come together to vote, giving each record a Truthscore value (0.0-1.0). We then offer the market an index of these Truthscores, data provider by data provider, attribute value by attribute value, to enable the relative comparisons we talked about earlier.
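Conceptually, the scoring McKinley describes reduces to a weighted vote. The sketch below is a simplified illustration with invented providers, weights, and votes; in Truthset's process the weights come from performance against the validation sets.

```python
# Simplified weighted-vote sketch. Each provider votes on whether an
# attribute assertion (e.g., "this hashed email is female") is correct;
# its weight reflects how well that provider matched the validation
# ("truth") sets for this attribute value. All numbers are invented.

votes = [
    # (provider, agrees_with_assertion, weight_from_validation_sets)
    ("provider_a", True, 0.90),
    ("provider_b", True, 0.75),
    ("provider_c", False, 0.60),
]

def truthscore(votes):
    """Weighted share of agreeing votes, yielding a 0.0-1.0 likelihood."""
    total_weight = sum(w for _, _, w in votes)
    yes_weight = sum(w for _, agrees, w in votes if agrees)
    return yes_weight / total_weight

print(f"Truthscore: {truthscore(votes):.2f}")  # -> 0.73
```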
Weisler: How can quality data best be used by a company?
McKinley: In so many ways. The desire to use sets of consumer IDs combined with attributes (“Audiences”) is expanding as more companies rely on data to inform processes such as marketing and advertising, offering financial services, attracting tenants for real estate, or recruiting talent.
Another example is data enrichment. Many large enterprises acquire third-party demographic data to append to their CRM records. We have seen error rates of up to 40% in commodity demographic data, even for the most common demographics. That error causes the enterprise to draw incorrect conclusions about its customers and leads to waste in advertising, as target ID pools are built on incorrect data.
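One defensive pattern this implies: score third-party attributes before appending them, and only enrich a CRM record with values that clear a quality bar. A hypothetical sketch (the threshold, fields, and scores are invented):

```python
# Hypothetical enrichment guard: append a third-party demographic attribute
# to a CRM record only when its accuracy score clears a chosen bar.
# With up to 40% error in commodity demo data, unscored appends pollute the CRM.

MIN_SCORE = 0.7  # buyer-chosen quality bar (assumption for illustration)

crm_record = {"customer_id": 123, "email_hash": "a1b2..."}
third_party = [
    {"attribute": "age_range", "value": "35-44", "score": 0.85},
    {"attribute": "income_band", "value": "$50-75k", "score": 0.40},
]

for item in third_party:
    if item["score"] >= MIN_SCORE:
        crm_record[item["attribute"]] = item["value"]

print(crm_record)  # income_band is dropped as too unreliable to append
```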
Weisler: How can you track the use of data through a process to see if at any point the data becomes compromised?
McKinley: That is another great reason why Truthset was created. A number of us have been at companies that specialize in identity, or have used data science to transform data, or have bought data and inventory based on one assumption, only to learn after the fact that measurement told us something else had actually happened.
It’s not just a story about how data can be good or not-so-good. In fact, a number of data providers we work with have great data at scale, but when it goes through “the hops,” the accuracy can be degraded (or improved, in some cases!). Truthset believes we should be inserted at every hop to ensure transparency into whether each step improved or maintained the quality of the records handled.
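In code terms, the idea is to re-measure the same records at every hand-off and compare, so any degradation (or improvement) is attributable to a specific hop. A toy sketch with invented stage names and scores:

```python
# Toy sketch: re-measure average accuracy after each hand-off ("hop")
# in a data pipeline, so degradation or improvement is attributable.
# Stage names and scores are invented for illustration.

hops = [
    ("provider export", 0.86),
    ("identity graph match", 0.79),
    ("onboarder activation", 0.74),
]

previous = None
for stage, avg_score in hops:
    if previous is None:
        print(f"{stage}: {avg_score:.2f} (baseline)")
    else:
        print(f"{stage}: {avg_score:.2f} ({avg_score - previous:+.2f} vs. previous hop)")
    previous = avg_score
```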
Weisler: Should the industry have a standard for data quality, and if so, how should it be implemented and monitored? Or is that not possible, given all of the walled gardens and silos?
McKinley: First, yes. In order to make things better, you have to measure and understand, and you have to have ways to keep improving. In our estimation, to be successful as a data quality measurement solution, you need to hit on six key points.
Second, as we drive toward a consumer-privacy-first ID space, even one with the potential to be fragmented, Truthset's focus on record-level data means we can score these IDs whenever and wherever they exist. It's one of the reasons we chose to start with hashed emails.
Lastly, markets run better with measurement. When there is opacity and uncertainty in any market, there is friction. Standards bring transparency between buyers and sellers and remove friction, so the market can grow faster.