What are bots and what can they do?
To understand how ad fraud can mess up analytics, we first need to understand the key ingredient common to all ad fraud: bots.
Bots are simply automated browsers that bad guys can create and control by the millions to do exactly what they are programmed to do. For example, bots can load web pages and thus cause fake ad impressions to be created. Bots can also fake mouse movements, page scrolling, and clicks, and thus defeat most anti-fraud measurement companies that still treat those actions as indicative of human users.
Bots cover their tracks by lying about who they are and where they come from. For example, they simply declare a user agent string that mimics popular human browsers like Google Chrome or Microsoft Internet Explorer. They also pass fake variables like utm_source=espn.com, which gets faithfully recorded in analytics as if the visit came from ESPN.com when it clearly didn’t.
Anything that is declared can be (easily) faked.
So if the bots are actively disguising themselves as Internet Explorer or Chrome, when you see high percentages of these browsers in your analytics, does that actually mean those are your users’ most popular browsers? It may be. But it may also be that you have a lot of bots. If you see referring sources coming from good, mainstream publishers like ESPN, does it actually mean that visitor came from ESPN? It may have. But it may also have come from a bot passing a very simple fake variable.
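To make the point concrete, here is a minimal Python sketch of how little it takes to declare a fake source. The landing page URL and the function names (build_landing_url, recorded_source) are hypothetical, and real analytics tools do more than this, but the basic idea of reading utm_source straight off the URL is the same.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_landing_url(base_url: str, declared_source: str) -> str:
    """Build a landing-page URL that carries a self-declared utm_source."""
    return f"{base_url}?{urlencode({'utm_source': declared_source})}"


def recorded_source(url: str) -> str:
    """Return the source a naive analytics tool would record for this URL."""
    params = parse_qs(urlparse(url).query)
    return params.get("utm_source", ["(direct)"])[0]


# Any client, bot or human, can request this URL without ever visiting ESPN.
# The base URL is a placeholder, not a real campaign page.
fake_url = build_landing_url("https://www.example.com/landing", "espn.com")
print(recorded_source(fake_url))
```

Nothing in the recorded value proves where the visitor actually came from; it only repeats what the client chose to declare.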
So the key question to ask is: what portion of your analytics is subject to bot activity and fraud? Once you know that, you know what to clean out of your analytics to make it more reliable. Another useful technique is to corroborate declared variables with detected parameters. For example, if the user agent declared server-side in the HTTP request (the User-Agent header) does not match the user agent detected client-side, something is wrong. Look out for mismatches like these.
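The corroboration check can be sketched in a few lines of Python. This assumes you already log both values for each hit; the field names (id, header_ua, detected_ua) and the sample user agent strings are made up for illustration, not taken from any particular analytics product.

```python
from typing import Dict, List


def normalize_ua(ua: str) -> str:
    """Collapse whitespace and case so trivial differences are not flagged."""
    return " ".join(ua.split()).lower()


def mismatched_hits(hits: List[Dict[str, str]]) -> List[str]:
    """Return the ids of hits whose declared and detected user agents disagree."""
    return [
        hit["id"]
        for hit in hits
        if normalize_ua(hit["header_ua"]) != normalize_ua(hit["detected_ua"])
    ]


# Illustrative records only: header_ua is what the HTTP request declared,
# detected_ua is what a client-side script reported for the same hit.
hits = [
    {
        "id": "hit-1",
        "header_ua": "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
        "detected_ua": "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
    },
    {
        "id": "hit-2",
        "header_ua": "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
        "detected_ua": "HeadlessBrowser/1.0",
    },
]
print(mismatched_hits(hits))
```

An exact string comparison is the simplest possible rule. A mismatch is a reason to look closer at that traffic, not proof of fraud on its own.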
Anything too high or too low is suspicious and should be blocked.
Once you understand that bots can be tuned to visit web pages and carry out specific actions on the page, it is easy to see that clicks, click-through rates, time on site, pages per visit, bounce rates, and most other quantity and quality measures in use today can be faked by bots.
For example, bots can be told to stick around on a page just long enough to tune the average bounce rate down to 5%. They can also be told to visit multiple pages during each visit to achieve high pages per visit, for example 44 pages per visit. Bots can sit through three-minute video ads to completion in order to earn the ad revenue for video ad completions.
If you insist on line-item details in your analytics and look closely, you will clearly see what is fraudulent. Those 100% click-through rates are certainly not real; they just indicate that a greedy (and amateur) bad guy programmed the bot to click on everything. Any quantity metric (number of sessions, impressions, pages per visit, etc.) and even any quality metric (time on site, bounce rate, etc.) can be faked by bots. Be suspicious of anything too high or too low.
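A "too high or too low" check over line items might look like the Python sketch below. The LineItem fields and the three thresholds are assumptions picked purely for illustration; sensible bounds depend on your own historical data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineItem:
    site: str
    impressions: int
    clicks: int
    sessions: int
    bounces: int
    pageviews: int


# Hypothetical bounds, for illustration only; tune them to your own data.
MAX_CTR = 0.10
MIN_BOUNCE_RATE = 0.10
MAX_PAGES_PER_VISIT = 20.0


def suspicious_reasons(item: LineItem) -> List[str]:
    """List the reasons a line item looks too good (or too odd) to be real."""
    reasons = []
    if item.impressions > 0 and item.clicks / item.impressions > MAX_CTR:
        reasons.append("CTR too high")
    if item.sessions > 0:
        if item.bounces / item.sessions < MIN_BOUNCE_RATE:
            reasons.append("bounce rate too low")
        if item.pageviews / item.sessions > MAX_PAGES_PER_VISIT:
            reasons.append("pages per visit too high")
    return reasons


# Made-up line item echoing the examples above: 100% CTR, 5% bounce rate,
# and 44 pages per visit.
item = LineItem("example-site", impressions=1000, clicks=1000,
                sessions=1000, bounces=50, pageviews=44000)
print(suspicious_reasons(item))
```

The check only works on line-item data. Averages across a whole campaign will blend the fraudulent rows with the real ones and hide exactly the outliers you are looking for.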
Messed up analytics may cause you to send more money to the bad guys.
Hopefully it is now clear how bots can create clicks and tune click-through rates (CTRs). If you are looking at analytics and optimizing for specific variables like CTR, you might actually be sending more money to the bad guys. Fraud sites have higher CTRs because bots can tune the CTR to whatever you want to see.
Similarly, fraud sites have really high viewability. Why? They cheat by stacking all their ads above the fold, so every ad registers as 100% viewable, which is (artificially) much higher than on good publisher sites. So again, if you are optimizing for just one variable like viewability, you may end up sending more dollars to fraud sites.
The moral of the story: don’t optimize using just a single variable. Better yet, optimize for business outcomes, not for any of these quantity or quality metrics, which can be (and are) easily faked by fraud bots.
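Here is a small Python sketch of why the choice of metric matters. The two placements and all of their numbers are invented for illustration, and "sales" stands in for whatever business outcome you actually care about.

```python
from typing import Callable, Dict, List

Placement = Dict[str, float]

# Invented numbers, not measurements: site_a has a bot-like CTR and no sales.
placements: List[Dict] = [
    {"site": "site_a", "spend": 1000.0, "impressions": 100000, "clicks": 9000, "sales": 0},
    {"site": "site_b", "spend": 1000.0, "impressions": 100000, "clicks": 300, "sales": 25},
]


def ctr(p: Dict) -> float:
    """Click-through rate: easy for a bot to inflate."""
    return p["clicks"] / p["impressions"] if p["impressions"] else 0.0


def sales_per_dollar(p: Dict) -> float:
    """Business outcome per dollar spent: much harder for a bot to fake."""
    return p["sales"] / p["spend"] if p["spend"] else 0.0


def rank(items: List[Dict], metric: Callable[[Dict], float]) -> List[str]:
    """Return site names ordered from best to worst on the given metric."""
    return [p["site"] for p in sorted(items, key=metric, reverse=True)]


print("By CTR:             ", rank(placements, ctr))
print("By sales per dollar:", rank(placements, sales_per_dollar))
```

Optimizing on CTR would shift budget toward site_a in this made-up case, while optimizing on sales per dollar would shift it toward site_b.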
Only good guys deploy measurement tech so what you are seeing is skewed.
Finally, many of the published numbers on fraud, viewability, bots, etc. are wrong, or at the very least not actionable. That is because they are skewed based on the sample they are measuring. It boils down to something extremely basic -- good guys are willing to deploy measurement; bad guys won’t install fraud detection SDKs, APIs, etc. because they don’t want to be measured.
So when you see a statement that fraud in mobile is low, is that because it really is low? Or is it because we’re not able to measure it, or because we are only measuring good mobile apps and not measuring any bad apps? When you see low ad blocking, is that because ad blocking is low, or because you have a lot of bots? (Bots don't use ad blocking because they actually want the ads to load.)
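The sampling problem can be shown with simple arithmetic. In the Python sketch below, every number is hypothetical, and the fraud count for the unmeasured segment is something you would not actually know in practice; it is filled in here only to show how far the published figure can drift from the real one.

```python
from typing import Dict, List

# Hypothetical segments. "measured" marks publishers that deployed measurement.
segments: List[Dict] = [
    {"name": "apps_with_sdk", "measured": True,
     "impressions": 800_000, "fraud_impressions": 8_000},
    {"name": "apps_without_sdk", "measured": False,
     "impressions": 200_000, "fraud_impressions": 150_000},
]


def fraud_rate(items: List[Dict], measured_only: bool) -> float:
    """Fraud share of impressions, optionally limited to measured segments."""
    chosen = [s for s in items if s["measured"] or not measured_only]
    total = sum(s["impressions"] for s in chosen)
    fraudulent = sum(s["fraud_impressions"] for s in chosen)
    return fraudulent / total if total else 0.0


print(f"Rate among measured apps: {fraud_rate(segments, measured_only=True):.1%}")
print(f"Rate across all apps:     {fraud_rate(segments, measured_only=False):.1%}")
```

With these invented inputs the measured rate is 1.0% while the rate across all traffic is 15.8%, even though nothing about the measurement itself is wrong. The gap comes entirely from who agreed to be measured.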
In conclusion, double check the heck out of everything.