Designing An Accurate Global A/B Test To Improve Campaign Optimization
For online marketers, the practice of global optimization involves making small changes to the volumes of impressions served on all, or almost all, ad placements with the goal of maximizing a target metric, such as gross profit or number of conversions.
Online marketers typically haven’t run A/B tests at this macro level of optimization because such tests are difficult to design and properly administer. However, there are two sound ways an online marketer can set up a global optimization test.
The ideal comparison test: The ideal way of performing a global optimization test would be to segment users into two equivalent groups by random selection and have one group see only the results of optimization method A, while the other group sees only the results of optimization method B.
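One common way to get a stable random split is to hash each user ID. The sketch below is only an illustration of that idea: the function name `assign_group`, the salt value and the user ID format are assumptions, not part of any particular ad server.

```python
import hashlib

def assign_group(user_id: str, salt: str = "global-opt-test-1") -> str:
    """Deterministically assign a user to group 'A' or 'B'.

    Hashing the salted ID gives an effectively random but repeatable
    split, so a returning user always lands in the same group.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Treat the hash as a large integer: even -> A, odd -> B.
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Hypothetical user IDs, used only to check that the split is roughly even.
users = [f"user-{i}" for i in range(10_000)]
counts = {"A": 0, "B": 0}
for user in users:
    counts[assign_group(user)] += 1
print(counts)
```

Changing the salt for each new test reshuffles the groups, which avoids carrying the same split from one experiment into the next.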
Unfortunately, ad servers generally don’t allow for such a setup. The output of a global optimization method is a list of budget changes for all of the ad placements in the campaign. When implemented, these budget changes result in a greater or lesser number of impressions being shown for each of those placements. To support the ideal test, the ad server would have to show impressions at one rate to one group of users and at another rate to the other group. But sites are contracted to show a specific number of impressions for a given placement, not to show impressions to particular groups of users. This makes the user split test difficult to implement for direct buys.
However, this type of test can be conducted for an advertiser that spends its budget exclusively on real-time bidding (RTB) inventory. In this case, the optimization method will provide bidding guidance for user segments and the advertiser can bid, using these guidelines, on an impression-by-impression basis based on which segment a particular visitor is in.
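As a rough sketch of how the RTB case could work, each optimization method supplies its own bid per user segment, and the bidder looks up the bid for the visitor's test group and segment on every impression. The segment names, bid values and the `choose_bid` function are all hypothetical.

```python
from typing import Optional

# Hypothetical bid guidance (CPM, in dollars) produced by each method.
BID_GUIDANCE = {
    "A": {"returning_visitor": 2.50, "new_visitor": 1.20},
    "B": {"returning_visitor": 3.00, "new_visitor": 0.90},
}

def choose_bid(test_group: str, segment: str) -> Optional[float]:
    """Return the bid for one impression, or None to skip bidding.

    test_group is the visitor's randomly assigned group ('A' or 'B');
    segment is the user segment the visitor falls into.
    """
    guidance = BID_GUIDANCE[test_group]
    # No guidance for this segment means the method does not bid on it.
    return guidance.get(segment)

print(choose_bid("A", "returning_visitor"))
print(choose_bid("B", "new_visitor"))
```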
Time split test: The second approach to running a global optimization test is the time split method. This test is appropriate for the more commonly occurring case, where at least some of the ad inventory is obtained through direct buys. The way to perform this test is to run method A for some period and then run method B for an equivalent period. The results can then be compared to determine lift.
The main weakness of this approach is that the two periods are never identical: seasonal effects or other external factors can cause results to differ even if the optimization method had not changed. Expanding the test can mitigate these potential problems.
First, choose a random group of visitors to serve as a control group throughout the test. The purpose of this group is to measure baseline performance, which makes it possible to measure seasonal or other external effects. The control group is shown no ads at any point during the test.
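One possible way to use the control group is to subtract its conversion rate from the exposed group's rate in each period, and then compare the two methods on that incremental rate. The sketch below assumes this subtraction approach; the class name, function name and all figures are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class PeriodResult:
    exposed_users: int
    exposed_conversions: int
    control_users: int
    control_conversions: int

    def incremental_rate(self) -> float:
        """Conversion rate of exposed users minus the control baseline."""
        exposed = self.exposed_conversions / self.exposed_users
        control = self.control_conversions / self.control_users
        return exposed - control

def relative_lift(a: PeriodResult, b: PeriodResult) -> float:
    """Lift of method B over method A, measured on incremental rates."""
    base = a.incremental_rate()
    if base <= 0:
        raise ValueError("Method A shows no incremental conversions")
    return (b.incremental_rate() - base) / base

# Hypothetical data: the baseline rises from 1.0% to 1.5% between periods.
period_a = PeriodResult(100_000, 2_000, 10_000, 100)
period_b = PeriodResult(100_000, 2_700, 10_000, 150)
print(f"Control-adjusted lift of B over A: {relative_lift(period_a, period_b):.1%}")
```

In this made-up data the raw conversion rate rises from 2.0% to 2.7%, but part of that rise also appears in the control group, so the adjusted lift is smaller than the raw comparison would suggest.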
Next, to reduce interaction effects between the two periods, such as conversions driven by the first method's ads arriving while the second method is running, consider one of these two approaches:
- Blackout period – Approach one is to include a blackout period between tests. The blackout period must be long enough so that all of the conversions from the first time period have come in before the second test is started.
- Double Test – Approach two is to run the test a second time in the opposite order, separating every run with a blackout period. For example, if the optimization period is one month, the sequence is: method A, no ads, method B, no ads, method B, no ads, and finally method A, with each phase lasting one month.
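The sequence of test and blackout phases can be laid out as a simple calendar. This is a minimal sketch: the function name `build_schedule`, its parameters and the 30-day periods are assumptions, and the blackout length should in practice be set to cover the advertiser's own conversion window.

```python
from datetime import date, timedelta

def build_schedule(start: date, period_days: int, blackout_days: int,
                   double_test: bool = True):
    """Return (label, first_day, last_day) tuples for each phase.

    Runs method A then B, separated by blackout periods; with
    double_test=True the pair is repeated in the opposite order (B, A).
    """
    order = ["A", "B"]
    if double_test:
        order += ["B", "A"]

    schedule = []
    current = start
    for i, method in enumerate(order):
        schedule.append((method, current, current + timedelta(days=period_days - 1)))
        current += timedelta(days=period_days)
        # Insert a blackout after every run except the last one.
        if i < len(order) - 1:
            schedule.append(("blackout", current,
                             current + timedelta(days=blackout_days - 1)))
            current += timedelta(days=blackout_days)
    return schedule

for label, first, last in build_schedule(date(2024, 1, 1), 30, 30):
    print(f"{label:<9} {first} to {last}")
```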
While advertisers have been performing simpler A/B tests for years, many now realize that they need to improve their tests in order to effectively optimize global campaign spend and remain competitive. Global optimization tests require more effort, but the results can significantly improve return on ad spend (ROAS).