Commentary

Boo! 'Ghost Sites' Demystified Using A Predictive Approach

Boo! It’s Halloween, so RTBlog was looking for some suitable material to spook readers. Maybe this will do the trick!

Ever wonder what “ghost towns” are? Having traveled recently in some of the most desolate areas of the West, RTBlog became intrigued with ghost towns.

But the ghost towns we’re talking about here are not the kind with sagebrush and tumbleweeds. Ghost sites are designed to look real to both consumers and advertisers, and from the perspective of a demand-side platform (DSP), a ghost site appears to drive a lot of traffic and have a large number of users. In reality, these sites are ghost towns: the traffic is fake, and the “towns” have been created by fraudsters as vehicles to monetize traffic that advertisers don’t want or need.

Here’s the hook: YuMe’s Director of Brand Safety and Traffic Quality Eric Bozinny has proposed using a more predictive approach to identify ghost towns and rid them of goblins, ghouls -- and most importantly, fraudsters!

RTBlog asked Bozinny a few questions:

RTBlog: What exactly are ghost sites?

Bozinny: Ghost sites are designed to look real to the eyes of advertisers. And from the DSP’s perspective, a ghost site looks like it is driving a substantial amount of traffic and has a large number of users. But in reality, these sites are ghost towns.

To the trained expert, the traffic is suspicious -- and with good reason, because these ghost towns are vehicles created to monetize traffic that advertisers don’t want or need. Ghost sites are created and managed by the same fraudsters who have developed the bots that create non-human impressions. Bot network operators are signaling to thousands of computers under their control to visit their ghost sites. Some individual sites are raking in millions from unsuspecting advertisers.

RTBlog: How do you identify “ghost town” sites?

Bozinny: There are several steps I suggest that ad operations teams and advertisers take when assessing inventory and publisher quality. First, it’s important to look for bland and generic content. There are content farms that churn out articles for as little as $1 each, so it’s no surprise that when I search to see if ghost site content is found elsewhere on the Internet, I typically draw a blank. Most ghost sites offer ridiculously useless content on "how-to" topics such as cooking, gardening, finance, and home improvement.

You can check the "About Us" and "Contact Us" pages. They’re good indicators as to whether the sites are ghost sites. Ghost sites will never have fully verifiable contact information or traceable provenance. An "About Us" page will have lots of flowery language, often without coherent sentences, which is an indicator that the source of the site is unknown. Plus, it gives no further information about the site operators. The "Contact Us" page will more than likely have no more than a generic Web contact form.

RTBlog: What are some other indicators?

Bozinny: Another easy-to-identify red flag is whether the site has a real physical address. Many trading desks and networks require a physical address on sites that they transact with. This is a requirement driven by the idea that a fraudster won’t have a legitimate address to use. Ghost sites will often list some type of address that doesn’t correspond to an actual location. It’s simple enough to use tools such as the Whitepages to determine the person or business behind an address. Keep in mind that fraudsters might use shared office space addresses because these companies don’t share the information of those using the space.

You should also always look up domain names. By looking up the domain registration information using tools such as WhoIs, Alexa, and SimilarWeb, you can find out how long the domain has been active. Be on the lookout for young domains. Fraudsters move quickly once they’re up and running. After all, there’s no need to build an audience of real humans!
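To make the "young domain" check concrete, here is a minimal Python sketch. It assumes you have already read the domain's creation date from a registration lookup. The 180-day cutoff and the function names are hypothetical, since the interview does not say how young is too young.

```python
from datetime import date

# Hypothetical cutoff: the interview does not define how young is "young".
YOUNG_DOMAIN_DAYS = 180


def domain_age_days(created: date, today: date) -> int:
    """Return the number of days between registration and today."""
    return (today - created).days


def is_young_domain(created: date, today: date,
                    threshold_days: int = YOUNG_DOMAIN_DAYS) -> bool:
    """Flag a domain whose registration is more recent than the cutoff."""
    return domain_age_days(created, today) < threshold_days
```

A young domain is not proof of fraud on its own, so a flag like this is probably best treated as one signal to weigh alongside the content and contact-page checks described above.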

RTBlog: What are the benefits of a predictive approach to fraud?

Bozinny: In the ad-tech world, when it comes to combating fraud, being proactive and predictive is crucial in developing trust between advertisers, technology partners, and consumers. The best solution is one that eliminates serving ads to potentially fraudulent sites.

The simplest way to do this is to leverage data science to identify predictive signals, such as known fraudulent IP addresses, specific user agents, and browser types, and to build algorithms that let human traffic through and red-flag traffic that doesn’t appear to have human characteristics. For example, a spike in traffic at 2 a.m. from a large, diversified set of residential IP addresses would likely be flagged as bot traffic.

The value of this approach is that it minimizes the need for post-campaign reconciliation. You simply don’t pay for non-human impressions in the first place.
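As a rough illustration of the signals Bozinny mentions, the Python sketch below checks an impression against a list of known bad IP addresses and suspect user-agent strings, and separately tests for an overnight spike in unique IP addresses. This is not YuMe's method. The blocklist entries, user-agent tokens, overnight window, and spike ratio are all placeholder assumptions chosen for the example.

```python
from dataclasses import dataclass

# All values below are illustrative placeholders, not real blocklists
# or tuned thresholds.
KNOWN_BAD_IPS = {"203.0.113.7"}        # 203.0.113.0/24 is a documentation-only range
SUSPECT_UA_TOKENS = ("bot", "headless", "curl")
OVERNIGHT_HOURS = range(1, 5)          # 1 a.m. through 4 a.m. local time
SPIKE_RATIO = 5.0                      # hypothetical multiple of the usual volume


@dataclass
class Impression:
    ip: str
    user_agent: str


def impression_flags(imp: Impression) -> list[str]:
    """Return the red flags raised by a single impression (empty if none)."""
    flags = []
    if imp.ip in KNOWN_BAD_IPS:
        flags.append("known_bad_ip")
    ua = imp.user_agent.lower()
    if any(token in ua for token in SUSPECT_UA_TOKENS):
        flags.append("suspect_user_agent")
    return flags


def is_overnight_spike(hour: int, unique_ips: int,
                       usual_unique_ips: float) -> bool:
    """Flag an overnight hour whose unique-IP count far exceeds the usual level."""
    if hour not in OVERNIGHT_HOURS:
        return False
    return unique_ips > SPIKE_RATIO * max(usual_unique_ips, 1.0)
```

In a pre-bid setting, checks along these lines would run before an ad is served, which is what lets flagged traffic be declined rather than reconciled after the campaign.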

RTBlog: What methods of identification aren't as effective?

Bozinny: Other approaches to fraud detection are often heuristic, which is good, but not great. It means that a system relies on trial and error to learn and improve, and the sophisticated fraudsters are often able to stay a step ahead of this detection. Heuristic detection is largely built on a probabilistic approach where data scientists can tweak their algorithms if need be to eliminate more sites, or fewer sites, depending on the campaign’s requirements. So you might have a situation that’s built on an educated guess. The result can be a system that is too subjective, and thus less accurate.

So this Halloween, and beyond, don’t be spooked by ghost towns on the Web! But it’s okay to be spooked by ghost towns offline, in real life!
