Data Janitors: Sanitization Engineers Of Digital Marketing

When Samuel Johnson released his Dictionary of the English Language in 1755, it became the first reliable point of reference for a language that had already been spoken widely for hundreds of years. There had been dozens of dictionaries published in the centuries prior, but it took that long to develop a trusted, comprehensive standard.

In a way, today we’re on the hunt for the Samuel Johnson of digital marketing data. In a dynamic industry like ours, organizations speak a babel of data dialects. There are varying formats for timestamps, locations, devices, and other principal attributes in the digital ecosystem. To boot, the standards that govern media buying, creative delivery and its measurement -- such as OpenRTB, MRAID, and various viewability guidelines -- are rapidly evolving.

The result is that every party in the media supply chain -- publishers, exchanges, networks, agencies, and marketers -- struggles to combine the data exhaust generated from different channels and partners into a clean, comprehensive view.



As Marc Pritchard of P&G recently quipped at the Interactive Advertising Bureau’s Annual Leadership Meeting: “It would be like each NFL football team having a different standard of yards needed for a first down… How could we possibly have different first down standards for each NFL team? How would we know who is playing better during each game? How could we possibly compare statistics across teams?”

We need two things: 1) data standards, and 2) data janitors who can help implement the right software tools to clean, structure, and organize data to meet those standards. With this foundation in place, the industry will be well-positioned to apply more intelligent algorithms to media-buying activities.

A lack of these elements makes the supply chain vulnerable to fraud. Many issues we face are not crimes of commission, but of omission and negligence. And one of the reasons firms are wary of adopting standards is that doing so requires a major investment to even understand what is flowing through their pipes. So many companies hoping to make sense of the data exhaust coming from their programmatic marketing activities are struggling with uncertainty about whether the data they take in will be clean enough for proper analysis.

Programmatic markets are extremely chaotic, and the gains to be had from clean, reliable data still far outweigh the value of advanced predictive algorithms at this stage. Despite all the focus put on AI, machine learning and advanced algorithms to help with final analysis and visualization, there is very little investment devoted to the basic tenets of data analysis: putting people and technology in place to process, structure and clean the data.

You’ll often find companies that hire dozens of analysts and implement generic software to compensate for bad data. Analysts often arrive expecting to implement advanced analytics, only to find themselves doing nothing more than simple math on top of disparate data sets haphazardly collected by previous regimes.

Case in point: for the vast majority of companies, inefficiencies are driven by tag errors. Configuration errors between programmatic buyers and sellers can account for double-digit percentage inefficiencies. These can be attributed directly to a lack of clean data being emitted or processed.

This is my advice to companies looking to grow their programmatic business: Unclean data does not scale. Companies that haven’t taken control of their data at small scale will be unable to grapple with it at large scale.

The best way to do this is by giving more love to the “data janitors” in your pipeline. These are the folks involved in ensuring all labels coming in can be mapped to internal dictionaries. They make sure publisher names and other key data points map consistently to IDs to create robust identifiers in the system. They are the sanitation engineers who throw out junk data to ensure the high integrity of what remains.
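To make that job concrete, here is a minimal sketch in Python of what a data janitor's cleaning step might look like. Everything in it is hypothetical and for illustration only: the publisher names, the IDs, the alias table, the record fields, and the assumption that timestamps arrive either as epoch seconds or as ISO 8601 strings (with untagged times treated as UTC). A real pipeline would draw its dictionaries from its own systems.

```python
from datetime import datetime, timezone

# Hypothetical internal dictionary: canonical publisher name -> internal ID.
PUBLISHER_IDS = {
    "example news": "pub-001",
    "sample sports": "pub-002",
}

# Hypothetical alias table: partner-specific labels -> canonical name.
PUBLISHER_ALIASES = {
    "examplenews.com": "example news",
    "example news inc.": "example news",
    "sample-sports": "sample sports",
}


def normalize_label(raw):
    """Lowercase the label and collapse runs of whitespace."""
    return " ".join(str(raw or "").lower().split())


def publisher_id(raw):
    """Map an incoming publisher label to an internal ID, or None if unknown."""
    label = normalize_label(raw)
    canonical = PUBLISHER_ALIASES.get(label, label)
    return PUBLISHER_IDS.get(canonical)


def parse_timestamp(value):
    """Turn epoch seconds or an ISO 8601 string into a UTC datetime, else None."""
    try:
        if isinstance(value, (int, float)):
            return datetime.fromtimestamp(value, tz=timezone.utc)
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError, OverflowError, OSError):
        return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without a zone are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def clean(records):
    """Split records into cleaned rows and rejected junk."""
    kept, rejected = [], []
    for record in records:
        pub = publisher_id(record.get("publisher"))
        ts = parse_timestamp(record.get("timestamp"))
        if pub is None or ts is None:
            rejected.append(record)  # set aside rather than silently drop
            continue
        kept.append({"publisher_id": pub, "timestamp": ts.isoformat()})
    return kept, rejected
```

Note that rejected records are returned rather than discarded outright. Keeping the junk pile visible is one way to learn which partner labels or timestamp formats still need an entry in the dictionary.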

We may not get a universally trusted data “dictionary,” but with dedicated investments in people and software that ensure we keep data clean, marketers can do a much better job of properly laying the groundwork for an ever-expanding infrastructure of advanced technologies being built on top of it.
