The Data Quality Imperative

Earlier this month, OMD’s Julie Fleischer and Neustar’s Steven Wolfe Pereira poked at data integrity issues during a Chicago conference, creating some timely swirl by highlighting a simple truth: We lack data integrity standards. Advertisers are becoming skeptical of claims about data. We owe them some accountability, but they owe themselves some diligence as well.

The enemy: oversimplification

People glom on to the difference between “deterministic” and “probabilistic” as some sort of magical line in the sand, but the false sense of comfort is dangerous. Yes, facts exist (deterministic data), but riddle me this: If my credit card shows I shop at Whole Foods, am I rich? Any inference made from deterministic data sends us right back down a statistical rat hole.

Another schism divides planning and activation. I might know for sure that certain cookies clicked on a hotel ad. Planners may decide they want to contact that group. For television, there is no choice but to reduce that audience to a demographic.  For online, the exact people can be activated. This is one scenario that taps the value of tight integration between a data-management platform (DMP) and a demand-side platform (DSP).



Grim reality

The CEO of a large agency recently said to me, “Everyone tells me their data is great. How would I know?”  Indeed, how would he? Even high-quality data can be ruined by inappropriate use. Online contact usually depends on data, and the quality of the audience (as data) is just as important as the quality of the context. Data quality might be half of effectiveness.

It’s time for advertisers to hold themselves and their suppliers accountable for the quality of data and the conclusions derived from it. The risk of not getting this right will be the commoditization of data (and ergo, consumers). That stands, in my opinion, high on the list of strategic risks for the online ad industry.

So, by inferred popular demand, I present here a listicle of quality attributes for advertising data.


If you are buying, say, “beauty category buyers,” it’s pretty safe to say that they are still interested in beauty after six months. But, if you are buying six-month-old “Auto Intenders,” there might be a pretty good chance they no longer need a car.

Veracity of the inference

How is the purported meaning related to the data? For example, if I went to a page that mentions the word “skin,” am I interested in skin cream?

Observation vs. declaration

Some data is derived by observing what people did (for example, cookie, panel). Other data is derived from what people said. Third-party data sites, which collect observations, are pretty good at nailing interests from Web site behaviors; demographics, not so much.

Conformance with actual intent

Say you bought a segment of people interested in adhesives (glue intenders!) for a $2.50 CPM, and lo and behold, it’s 100,000,000 browsers. To validate, ask some of them. What will they say? I like glue? I studied principles of adhesion in engineering school?

Proximity of use to source

This is segment “telephone” in the data supply chain: From data collector, to data aggregator, to DMP, to DSP, and maybe a Boolean “and,” on-ramping, de-duping, and domain-space resolution. Organically grown soybeans can end up as Cheez Whiz™. 

Likelihood of actual reach

There are several reasons a cookie may never create reach. The user may never show up in the footprint, or may simply delete the cookie. You could buy 50 million users and find only half of them, or a tenth. Time helps, but this is a serious impairment.

Fit with actual prospect density

You might buy a segment of 20 million new pet owners, but are there really that many out there? Inflated estimates of prospect density are the first symptom of naïve hope.

Census vs. sample

Sampling is the lovable, wonky heart of statistics. It’s all a gamble unless you are counting cards, in which case you have a census (i.e., not a sample). If your uncertainty lasts for over four hours, please call your data scientist.

Noise vs. signal

Data is dirty. We call it noise. It runs from 0 to 99%. It’s best to know.

So, there’s a thought starter: Data is not magic. There is good data and horrible data. Much depends on how it is applied. It’s not all that esoteric. A little common sense can go a long way.

4 comments about "The Data Quality Imperative".
  1. Ken Mallon from Ken Mallon Advisory Services, October 13, 2016 at 3:40 p.m.

    Agree. People need to use common sense. And, I encourage people to test things live and do A/B evaluations. Measure based on things that matter -- like brand impact or sales impact. Buy a data segment and see if it generates sales lift. That's the ultimate way to compare two data sources.

  2. Charlie Tarzian from The Big Willow, October 13, 2016 at 3:50 p.m.

    The reason for this is simple:


    3rd party data providers have never been transparent about defining what their data means. So as it relates to intent-based data - vendors use a black box approach where they welcome players into their coop and have no quality statements or definitional mandates.

    Until we standardize definitions of classification and are willing to expose the provenance of data - we will continue to have this discussion.

  3. Ted Mcconnell from Independent Consultant replied, October 13, 2016 at 5:28 p.m.

    Amen. + "Provenance" ... beautiful word choice. :). t.

  4. Ali Shah from ADfits, October 14, 2016 at 8:04 a.m.

    Completely agree. Data can be spun to accommodate any argument. Rather than inferring this data from cookies, it's high time we empower consumers with control over their intents and interests. This is our priority and mission at
