Commentary

For Cross-Device ID-ing At Scale, Probabilistic Only Game in Town

By Op-Ed Contributor, December 17, 2015
As the market has come to realize the importance of communicating with, personalizing experiences for, and analyzing the behavior of actual consumers — as opposed to disparate devices — the question of whether to use device maps based on deterministic or probabilistic matching often arises.

The very conversation implies that there is a choice involved, while in reality there is none. If a company wants to understand which particular devices are being used by consumers on a large scale, probabilistic matching is really the only option.

Deterministic matching determines the relationship between several connected devices (like smartphones, tablets and laptops), based on a person’s unique login credentials for a particular service.  
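As a rough illustration of the idea, the following Python sketch groups devices by the account that logged in on them. The event format, the account and device identifiers, and the `build_deterministic_map` helper are all hypothetical, invented here for illustration. This is not a description of how any particular company builds its device map.

```python
from collections import defaultdict


def build_deterministic_map(login_events):
    """Group device IDs by the account that logged in on them.

    login_events is an iterable of (account_id, device_id) pairs.
    Both fields are made-up placeholders for this sketch.
    """
    devices_by_account = defaultdict(set)
    for account_id, device_id in login_events:
        devices_by_account[account_id].add(device_id)
    # Only accounts seen on two or more devices yield a cross-device link.
    return {
        account: devices
        for account, devices in devices_by_account.items()
        if len(devices) > 1
    }


# Hypothetical login events: alice uses two devices, bob only one.
events = [
    ("alice", "phone-1"),
    ("alice", "laptop-7"),
    ("bob", "tablet-3"),
    ("alice", "phone-1"),
]
device_map = build_deterministic_map(events)
```

In this toy data, only the account seen on more than one device produces a link. A user who always logs in from a single device contributes nothing to the map.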

Probabilistic matching accomplishes the same goal without relying on explicit login information. Instead, it applies data science and machine learning to vast amounts of device activity data. By observing the activity patterns of billions of devices, probabilistic solutions can accurately determine which devices belong to the same person.
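To make the contrast concrete, here is a deliberately simplified Python sketch of the probabilistic idea. The article does not say which signals or models real solutions use, so everything here is an assumption: the (network, hour) observations, the Jaccard similarity score, the 0.5 threshold, and the `match_devices` helper are stand-ins for the far richer features and trained models a production system would rely on.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of observations two devices have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_devices(observations, threshold=0.5):
    """Return (device, device, score) for pairs scoring at or above threshold.

    observations maps a device ID to a set of (network, hour) tuples.
    The feature choice and the threshold are assumptions for illustration.
    """
    pairs = []
    for (d1, s1), (d2, s2) in combinations(sorted(observations.items()), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((d1, d2, score))
    return pairs


# Hypothetical activity: the phone and laptop share a network at the same hours.
observations = {
    "phone-1": {("net-A", 8), ("net-A", 22), ("net-B", 13)},
    "laptop-7": {("net-A", 8), ("net-A", 22), ("net-C", 10)},
    "tablet-3": {("net-D", 9), ("net-D", 21)},
}
matches = match_devices(observations)
```

No login is involved in this sketch. The link between the phone and the laptop is inferred from overlapping activity alone, and raising or lowering the threshold trades precision against recall.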

Here’s why, for cross-device mapping at scale, probabilistic solutions are the only practical option:

The total market reach of deterministic matching data pales in comparison with that of probabilistic data. To create deterministic device maps, companies need to track users who log in with the same account credentials on different devices, an approach that inherently excludes large swaths of the market.

This is true of even the largest companies, such as Google and Facebook, both because not everyone uses these services and because many users always log in using a single device. The end result is that deterministic matching solutions only cover a small percentage of the overall market, while probabilistic solutions can deliver incredibly high recall by matching devices across different verticals. For these reasons, even the Internet giants resort to probabilistic matching to close the gap.
    
No company has come close to delivering scale by aggregating login data from multiple sources. One way to increase the scale of deterministic matching is to aggregate the first-party login data of multiple platforms, combining the login-based matching of several companies into a single repository. However, even the largest of these aggregators has not been able to achieve significant scale. For most applications, the ability to match 3 million to 5 million users across devices is not very useful compared with probabilistic solutions that can do an excellent job matching hundreds of millions or billions of users.

The owners of large deterministic data “walled gardens” are not sharing their data. Even if the recall of the deterministic cross-device identification solutions owned by Google, Facebook or Twitter is sufficient, these companies do not license this data to third parties today and will not do so in the future. The proprietary advantage it gives them is too great, and the privacy concerns that surround deterministic device maps are too grave.

An advertiser wishing to target a particular customer segment or engage in retargeting across the devices known to these companies’ ecosystems can do so. However, the vast majority of would-be users want the data itself, not just access to third-party advertising segments. Advertisers and publishers who want to use this cross-device data in their own systems or on external platforms for cross-device analytics, conversion attribution, customer journey tracking or personalization are simply out of luck.
    
The scale of deterministic device matching solutions is very limited, and will remain so. There is a perception that deterministic matching is more accurate than probabilistic matching because it is based on explicit indications that two devices are being used by the same user. But even this basic belief isn’t true: these “explicit indicators” are often simply wrong, so a method that already lacks scale also delivers inaccurate results.

For companies wishing to leverage device-matching data within their own systems for a variety of potential applications, as opposed to simply targeting audience segments via Google or Facebook, deterministic solutions are no match for probabilistic ones. There really is no debate for companies looking to leverage cross-device identification data to improve business performance. Probabilistic data is the only viable approach for matching consumer devices at scale.
