Fishing In The Data Lake: Can You Catch Email Identifiers?

Let’s say you’re working with a legacy data warehouse. Can you reach customers across all their devices, based on identifiers such as email addresses, cookies, IP addresses, home addresses, device identifiers and geolocation data?

Maybe not. And even if you can, it is difficult to connect those identifiers persistently and accurately. But it can be done by switching to a data lake, according to "Demystifying Data Lakes," a white paper released Wednesday by Viant.

Just to be clear, you can view this paper as a pitch for data lakes, particularly the one offered by Viant. But it does offer tips to prevent you from drowning in a lake.  

For instance, does your data lake platform provide easy access to email data?

“One of the most important customer data points is email, which enables brands to bridge the offline world with digital,” the paper notes.

But beware: “If you do not have access to emails or other registration data such as mailing addresses, it is an important consideration when selecting your data lake platform, as these links unlock insights into how campaigns drove sales (both online and offline).”

In addition, it's important to determine whether the platform provides integrated device graphs with first-party data on consumers.

“If you are a brand that does not have access to data like email addresses from your customers, it is imperative that the platform you select has an integrated device graph – or even better, an identity graph – as well as first-party and third-party data that you can tap into,” it contends.

Meanwhile, your customer database, or CRM, can act as an anchor that allows data points like “email, IP address, device ID and purchase behaviors to be deterministically tied to real people,” the report continues. “In a data lake, this is the crucial link that enables marketers to tie disparate data sources together without the need for probabilistic algorithms.”
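
To make the deterministic link concrete, here is a minimal Python sketch, assuming CRM rows and event rows arrive as dictionaries with hypothetical field names (`customer_id`, `email`, `device_id`, `ip`). It ties an event to a customer only when the normalized email matches exactly, which is what separates this approach from probabilistic matching: nothing is scored or guessed. It is an illustration, not Viant's implementation.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    # Trim whitespace and lowercase so trivially different spellings match exactly
    return email.strip().lower()


def link_to_crm(crm_rows, event_rows):
    """Attach each event to a CRM customer only on an exact email match.

    Field names are hypothetical: crm_rows carry "customer_id" and "email";
    event_rows carry "email" plus other identifiers such as "device_id" or "ip".
    Returns (linked, unmatched), where linked maps customer_id -> list of events.
    """
    # The CRM acts as the anchor: one lookup from email to a real customer
    email_to_customer = {
        normalize_email(row["email"]): row["customer_id"] for row in crm_rows
    }
    linked = defaultdict(list)
    unmatched = []
    for event in event_rows:
        customer_id = email_to_customer.get(normalize_email(event.get("email", "")))
        if customer_id is None:
            # No exact match: a deterministic join leaves the event unlinked
            unmatched.append(event)
        else:
            linked[customer_id].append(event)
    return dict(linked), unmatched
```

For example, with `crm = [{"customer_id": "C1", "email": "Ana@Example.com"}]`, an event carrying `" ana@example.com"` and a device ID is linked to `C1`, while an event for an email that is not in the CRM lands in `unmatched` instead of being assigned by likelihood.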

We don’t take any position on data lakes; marketers have achieved dramatic results with data warehouses. But Viant argues that by using data warehouses or data management platforms (DMPs), “brands have had to structure their data, or organize it into columns and rows.”

Here are some of the differences between the three types of platform.

For one, data lakes can hold both structured and unstructured data, which provides more flexibility, whereas DMPs and data warehouses hold only structured data.

For another, DMPs offer limited flexibility, and warehouses have a fixed configuration that requires data engineering for large changes. Data lakes, in contrast, are flexible and structure data as it is needed.
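
Structuring data "as it is needed" is often described as schema-on-read: raw records are stored as they arrive, and structure is applied when they are queried rather than fixed up front in a warehouse table. The minimal Python sketch below assumes that model, with hypothetical sample records and field names; it is not how any particular platform, Viant's included, implements it.

```python
import json

# Hypothetical raw records landing in the lake as-is; sources need not share a schema
RAW_LINES = [
    '{"email": "ana@example.com", "device_id": "D9"}',
    '{"email": "bob@example.com", "ip": "203.0.113.7", "store": "Austin"}',
    '{"device_id": "D4", "geo": [30.27, -97.74]}',
]


def read_with_schema(lines, fields):
    """Parse raw JSON lines and project them onto the requested fields at read time.

    Missing fields come back as None instead of raising an error, so adding a
    source with extra or fewer attributes needs no upfront table change.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({field: record.get(field) for field in fields})
    return rows
```

Calling `read_with_schema(RAW_LINES, ["email", "device_id"])` yields three rows, with `None` wherever a source lacked that field; asking for `["store"]` instead reuses the same raw data with a different structure, which is the flexibility the paper attributes to data lakes.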

Cost is another difference. Data lakes are designed to be low cost. So are DMPs, but fees can drive up the cost of using one. Data warehouses, meanwhile, can be expensive.

Who uses data lakes? Data professionals and data scientists. Warehouses are used by data professionals and advertising professionals.

In making its final case, the paper notes that data lakes can also help with targeting, attribution and achieving operational efficiencies. 

That concludes the paper's case. It’s up to you to decide whether a data lake pays off, and whether it can be integrated into your stack.
