Location Fraud: Highly Accurate, But Non-Human

Data descriptors like “accuracy” and “quality” have held the mobile ad spotlight for quite some time, while their corrupt counterpart -- fraudulent location data -- consistently escapes public attention. This creates a real problem, because 100% accuracy or “high quality” becomes entirely irrelevant when an ad request is inherently fraudulent.

While assessing the accuracy of location data is necessary across the mobile ad ecosystem, ad requests aren’t worth analyzing unless the possibility of fraud has been eliminated.

Fraudulent location signals tend to fall into two main categories: accidental short-distance jitter and intentional location spoofing. The first seems to occur accidentally and requires a scientific strategy for detection. The second is shamelessly enabled by bad-guy publishers who corrupt mobile inventory to make money without an audience (or sell an audience that doesn’t exist).

Time to Face the Fraud



Short-distance jitter -- the more complex, yet authentic, version of fraud -- occurs when the distance between two location readings gathered from a device is impossible to travel in the time between signals. For example, one latitude and longitude data point might register at the top of the Empire State Building one moment, near New York Penn Station 30 seconds later, and back at the Empire State Building moments after that.
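The check described above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's actual method. The function names, the speed threshold, the approximate landmark coordinates and the 10-second gap standing in for "moments after" are all assumptions made for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def implausible_hops(pings, max_speed_mps):
    """Return each index i where the hop from pings[i-1] to pings[i]
    implies a speed above max_speed_mps.

    pings: list of (timestamp_seconds, lat, lon), sorted by timestamp.
    max_speed_mps: hypothetical threshold chosen by the caller.
    """
    flagged = []
    for i in range(1, len(pings)):
        t0, lat0, lon0 = pings[i - 1]
        t1, lat1, lon1 = pings[i]
        dist = haversine_m(lat0, lon0, lat1, lon1)
        dt = t1 - t0
        if dt <= 0:
            # Two different places at the same instant cannot both be right.
            if dist > 0:
                flagged.append(i)
            continue
        if dist / dt > max_speed_mps:
            flagged.append(i)
    return flagged


# Approximate coordinates, for illustration only.
pings = [
    (0, 40.7484, -73.9857),   # Empire State Building
    (30, 40.7506, -73.9935),  # near Penn Station, 30 seconds later
    (40, 40.7484, -73.9857),  # back again (10-second gap is assumed)
]
print(implausible_hops(pings, max_speed_mps=15.0))
```

The two landmarks are roughly 700 meters apart, so both hops exceed the 15 meters-per-second threshold assumed here. The right threshold depends on context (pedestrian, vehicle, air travel), which is one reason this kind of detection calls for more care than a single cutoff.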

Location spoofing is a different animal. Mobile ad publishers are well aware that ad requests that include location data perform better in the real-time bidding process. In order to respond to requests at the speed the industry requires, publishers provide fake lat/long data points to increase their chances of winning bids to show a targeted ad, which in reality is not targeted. This can be achieved by centroid geocoding, such as providing a lat/long data point within an arbitrary ZIP code, or through what’s called IP geocoding, where free services provide a lat/long data point for an IP address, a method that is notoriously inaccurate.
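One symptom of centroid or IP geocoding is that many unrelated devices report exactly the same coordinate, since each ZIP code or IP block maps to a single point. The sketch below looks for that symptom. It is a simple heuristic offered for illustration, not a description of any production system. The function name, the request tuple layout and the thresholds are assumptions.

```python
from collections import defaultdict


def shared_coordinates(requests, min_devices=50, decimals=5):
    """Find coordinates reported by suspiciously many distinct devices.

    requests: iterable of (device_id, lat, lon) tuples (hypothetical layout).
    min_devices: assumed cutoff for "too many devices on one point".
    decimals: rounding applied before comparing coordinates.
    Returns {(lat, lon): distinct_device_count} for coordinates at or
    above the cutoff.
    """
    devices_at = defaultdict(set)
    for device_id, lat, lon in requests:
        key = (round(lat, decimals), round(lon, decimals))
        devices_at[key].add(device_id)
    return {
        coord: len(devices)
        for coord, devices in devices_at.items()
        if len(devices) >= min_devices
    }


# 60 devices on one identical point, plus a few scattered real-looking ones.
requests = [(f"dev{i}", 40.75, -73.99) for i in range(60)]
requests += [("a", 40.7012, -73.9123), ("b", 40.8101, -73.9502)]
print(shared_coordinates(requests))
```

A busy venue can legitimately produce repeated coordinates, so a result from this check is a reason to look closer at a publisher rather than proof of spoofing.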

While it’s easy to write off fraud as a rare occurrence and simply focus on accuracy when selling to media buyers, mobile ad providers that do so are at a disadvantage from the get-go. When examining the majority of mobile ad inventory (which includes lat/long data), two questions need to be answered: Are the requests authentic (originating from an actual device)? And if so, is the location of the request accurate (near the actual device)?

It’s important to realize that the question of accuracy relies on the existence or absence of fraud. Yet accuracy of location data is often determined separately from the filtration of fraud. This is problematic when you consider that roughly 16% of requests are fraudulent. And that’s just the beginning.

It has been suggested that only 34% of location data is accurate to within 100 meters of a mobile user's true location. However, this number assumes that the location data was authentic to begin with. Accuracy is only one piece of the location data puzzle.

New methods of weeding out misleading and inaccurate data are readily available and are becoming increasingly efficient at recovering good data and isolating fraud -- thus leaving mobile ad providers with much more than a third of their location data reliable enough to offer clients.

So regardless of whether fraudulent location data is accidental or purposeful, on the rise or diminishing, it must be discussed alongside data accuracy, as well as actively prevented before it reaches the RTB stage.

Diagnosing Phony Location Data

The significant volume of fraudulent ad requests has led to the development of strategies that can reveal and remove bad data sources and rid the mobile ad ecosystem of the fraudulent data they supply. One approach revolves around two indicators of whether location data is true or fraudulent, based on how well it reflects human movement within the physical world.

Measuring “hyperlocality” is one way to distinguish a publisher that provides good data from one that provides fraudulent lat/long data. This is accomplished by generating a representative sample of location histories provided by that data source and confirming whether those data sets are realistic. Authentic, hyperlocal data sets should match human behavior in tandem with physical geography and popular attractions. For example, multiple points over water or thick clusters on bridges are unrealistic. In other words, the density of points should mimic population density.
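A very rough version of this test is to bin a publisher's sample into grid cells and ask what share of points land where nobody lives. The sketch below assumes a hypothetical `population_by_cell` lookup (for example, built from census data) and an arbitrary cell size. It illustrates the idea only and is not a full hyperlocality score.

```python
from math import floor


def cell_of(lat, lon, cell_deg=0.01):
    """Map a lat/long point to an integer grid-cell key."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def unpopulated_share(points, population_by_cell, cell_deg=0.01):
    """Fraction of points falling in cells with no recorded population.

    points: list of (lat, lon) sampled from one data source.
    population_by_cell: hypothetical dict {cell_key: population};
        cells missing from the dict are treated as unpopulated (water,
        empty fields and so on).
    """
    if not points:
        return 0.0
    misses = sum(
        1
        for lat, lon in points
        if population_by_cell.get(cell_of(lat, lon, cell_deg), 0) <= 0
    )
    return misses / len(points)


# Toy data: one populated midtown cell, and one point out over the harbor.
population_by_cell = {cell_of(40.755, -73.985): 50_000}
points = [
    (40.7551, -73.9852),
    (40.7558, -73.9859),
    (40.7550, -73.9850),
    (40.6550, -74.0450),  # not in the population grid
]
print(unpopulated_share(points, population_by_cell))
```

A source whose share stays high across a large sample is placing devices where people rarely are, which is the mismatch with population density described above.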

“Clusterability” is another tactic for discerning fraudulent lat/long data points, based on whether or not they’re truly associated with a location-enabled device. Authentic data from a single device should cluster in understandable places, or dwells (school, work, home, grocery store), rather than registering across large distances over a short period of time. While devices can reflect random movement throughout the world, the clustering of thousands of ad requests in the middle of a field, or signals emitted in a grid-like formation, would be a clear indicator of a fraudulent data source.
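To make the idea of dwells concrete, the sketch below groups one device's points with a simple greedy pass and reports what share of them fall into repeat-visit clusters. The greedy grouping is a stand-in chosen for brevity, not a named clustering algorithm and not the method any particular vendor uses. The radius and minimum-ping values are assumptions.

```python
from math import cos, hypot, radians

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation, adequate over a few kilometres."""
    x = radians(lon2 - lon1) * cos(radians((lat1 + lat2) / 2))
    y = radians(lat2 - lat1)
    return EARTH_RADIUS_M * hypot(x, y)


def dwell_share(points, radius_m=150.0, min_pings=3):
    """Fraction of a device's points that belong to a dwell.

    points: list of (lat, lon) for a single device.
    A point joins the first cluster whose seed lies within radius_m,
    otherwise it seeds a new cluster. Clusters holding at least
    min_pings points count as dwells.
    """
    if not points:
        return 0.0
    clusters = []  # each entry: [seed_lat, seed_lon, count]
    for lat, lon in points:
        for cluster in clusters:
            if approx_distance_m(lat, lon, cluster[0], cluster[1]) <= radius_m:
                cluster[2] += 1
                break
        else:
            clusters.append([lat, lon, 1])
    dwelling = sum(c[2] for c in clusters if c[2] >= min_pings)
    return dwelling / len(points)


home = [(40.7000 + i * 0.0001, -73.9900) for i in range(5)]
work = [(40.7600, -73.9800 + i * 0.0001) for i in range(5)]
stray = [(41.5, -72.0), (39.0, -75.0)]
print(dwell_share(home + work + stray))
```

A low share suggests points scattered across large distances. A high share is not proof of authenticity on its own, since where the clusters sit still matters: thousands of requests dwelling in an empty field would score well here and still be suspect.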

What This Means for Mobile Advertising

The mobile advertising industry relies more heavily on impression rates than clicks, which continually triggers the ROI debate. Recognizing and removing fraudulent ad requests from mobile exchanges drastically improves the entire ad ecosystem.

Essentially, the hard work is done. The complex task of developing a “fraud finder” is well underway -- so the ability to make the industry more efficient already exists. It’s time for “fraud” to be injected into the data accuracy conversation. If fraud is eliminated -- and accuracy verified -- then ad exchanges will ingest fewer fraudulent requests and spit out a higher percentage of relevant ads. Remember that some mobile bot fraud appears to be very accurate, so while ensuring accuracy is necessary, it is not sufficient on its own to remove all bad ad requests.

With a higher percentage of accurate and authentic ad requests, more mobile ads will be correctly targeted and will produce a higher ROI.
