What You See Is What You Get: Applying Contextual Targeting in a Video Environment

For many marketers, Google’s delay in ending its support of third-party cookies has been met with inertia. Rather than moving ahead to embrace a world where the sort of behavioral data they’ve been accustomed to is simply not available, they’ve either bided their time using their existing cookie-based solutions or have sought to recreate cookie-like behavioral research in a fashion that is likely to prove problematic when cookies finally disappear.

As we discussed in an earlier blog post, some marketers have found an alternative to third-party data by updating an older but tried-and-true approach for the modern age: contextual targeting. What made this work, initially, was the ability to focus on relevant keywords, which helped these marketers determine text-based editorial environments where consumers interested in their messages were likely to be found. But in any given instance, there were too many keywords; identifying them all proved cumbersome, time-consuming, and not scalable. A better approach, as defined by 4D, an outcomes-oriented division of the data and digital transformation company Silverbullet, was to take a higher-level view and, rather than attempt to compile an encyclopedic collection of keywords, create a more limited—but more targeted—group of “Topics.”

Why Video Is Different

While the contextual approach is easy to conceptualize when it comes to targeting text-based content, it is clearly more nuanced when the content is on video. And that’s a growing challenge since, as Gadi Baram, a product manager at 4D, explains, “video is everywhere you look. It’s on every platform, on every device. Even traditional vehicles like newspapers are now leaning on video content.”

While the contextual approach is easy to conceptualize when it comes to text-based content, it is more nuanced when the content is on video.

What makes determining video context difficult, as Baram puts it, is that “it’s not as simple as fetching the words from a page.” Speech and the written word are not defined in the same way. “Many companies,” he says, “are trying to use only page metadata like title and descriptions and force their search for context onto video instead of working within the video content.” And that is leading to less satisfactory results.

A better route, Baram says, is to try “to understand what the video is about, looking for more depth in the content.” What’s critical here, Baram believes, is understanding the contextual signals that come from the images in the video: focusing on pictures, not just audio; using vector analysis to gain in-depth insight; looking at the video frame by frame; and, in the process, getting more granular.
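To make the frame-by-frame vector idea concrete, here is a minimal, purely illustrative sketch. It is not 4D’s implementation: the three-dimensional “embeddings” and topic vectors are toy stand-ins, and in a real system each frame’s vector would come from an image-recognition model. The sketch shows only the core mechanic: pool per-frame vectors into one video-level vector, then match it to the closest topic by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def video_vector(frame_embeddings):
    """Pool per-frame embeddings into one video-level vector (mean pooling)."""
    n = len(frame_embeddings)
    return [sum(f[i] for f in frame_embeddings) / n
            for i in range(len(frame_embeddings[0]))]

def classify(frame_embeddings, topic_vectors):
    """Return the topic whose vector is most similar to the pooled video vector."""
    v = video_vector(frame_embeddings)
    return max(topic_vectors, key=lambda t: cosine(v, topic_vectors[t]))

# Toy 3-D vectors standing in for real image-model outputs.
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.95, 0.05, 0.0]]
topics = {
    "soccer":     [1.0, 0.1, 0.0],
    "automotive": [0.0, 1.0, 0.2],
}
print(classify(frames, topics))  # → soccer
```

Mean pooling is the simplest choice here; a production system might instead weight frames by salience or scene changes to capture the granularity Baram describes.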

Training the Machines

The key to making this work is machine learning. However, Baram notes, the machines need to be trained to understand what they’re seeing and to communicate it back. “We initially started combining different models,” he says, “breaking videos down into individual images and using image recognition technology. But we came to realize that computers and humans don’t think the same way. Computers think in binary terms,” and this opens the door not only to missing nuance but even to misinterpretation. 

To counter this, Baram says, “We needed to define the different topics the way we see them so that the computers could better understand what they were finding and then classify the videos.” But that was neither straightforward nor easy. “If, for example, you ask 10 people to define ‘automotive,’ you’ll get 10 different answers,” he says. “We had to figure out what definition was more commonly used, how the masses would view a typical topic, and then understand what would move the needle for advertisers.”

“We needed to define the different topics the way we see them so that the computers could better understand what they were finding.”

At the same time, though, 4D’s team of data scientists was looking for nuance, but “in a more compliant, privacy-based, machine-learning way,” drilling down to get as granular as possible. A video title box might make it clear, for example, that a video is covering sports, and, Baram says, “you might see a few players on the pitch and even be able to tell what sport they’re playing.” But what makes for a valuable contextual connection, one that provides real insight into the video and helps you reach the right audience, is knowing that the topic is not just sports, or even soccer, but that the video is covering a specific event, such as the FIFA World Cup. Connecting an ad to that context will deliver more impressions because, as Baram notes, “more people are tuning in to this than to a regular match.”

Training the machine-learning technology meant providing an initial subset of videos, then feeding new videos to the models to see how they improved, and finally assessing the impact this training was having on performance as the models sought to deduce the topics the videos were covering.
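The train-on-a-subset, evaluate-on-new-videos loop can be sketched as follows. Again, this is a hypothetical illustration rather than 4D’s pipeline: a nearest-centroid classifier stands in for whatever models are actually used, and the labeled 2-D video embeddings are toy data. The point is the shape of the process: fit on a labeled subset, then measure how well the model deduces topics on videos it has not seen.

```python
import math

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(labeled_videos):
    """Fit one centroid per topic from (embedding, topic) pairs."""
    sums, counts = {}, {}
    for emb, topic in labeled_videos:
        counts[topic] = counts.get(topic, 0) + 1
        prev = sums.get(topic, [0.0] * len(emb))
        sums[topic] = [s + e for s, e in zip(prev, emb)]
    return {t: [s / counts[t] for s in sums[t]] for t in sums}

def predict(model, emb):
    """Assign the topic whose centroid is nearest to the video embedding."""
    return min(model, key=lambda t: distance(model[t], emb))

def accuracy(model, held_out):
    """Share of held-out videos whose deduced topic matches the label."""
    hits = sum(predict(model, emb) == topic for emb, topic in held_out)
    return hits / len(held_out)

# Toy 2-D video embeddings labeled by topic.
train_set = [([1.0, 0.0], "sports"), ([0.9, 0.1], "sports"),
             ([0.0, 1.0], "news"),   ([0.1, 0.9], "news")]
held_out  = [([0.8, 0.2], "sports"), ([0.2, 0.8], "news")]

model = train(train_set)
print(accuracy(model, held_out))  # → 1.0
```

Tracking this accuracy as more labeled videos are added is one simple way to “see how the models improved” as the source describes.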

Bridging the Gaps

Throughout this process, Baram stresses, the search for context needs to be both device and platform agnostic. “It’s not as critical to understand where the content is presented,” he says. That’s up to the demand-side platforms (DSPs), which have the tools to bridge the gaps between platforms and devices. “Instead, we’re looking to provide the contextual environment with a pool of audiences, without knowing the audience per se.”

Given the current confluence of more video, less third-party data, and an increasing reliance on context as a way to find audiences, the better a brand can zero in on a video’s contextually relevant content, the more likely it is to find consumers who will find its ads relevant. Machine learning is clearly a useful tool, but as with all artificial intelligence, it’s only as good as those who are helping it learn, those who are guiding it to understand where the context lies and where the audience can be found. Taking advantage of the revolutionary impact that machine learning can have on the industry requires, Baram notes, that marketers not wait until the last minute to find a replacement for third-party behavioral data but educate themselves now on the alternatives and the best paths forward.

In the next installment of this series, we’ll take the understanding of Topics-oriented contextual targeting—for both text and video—and apply it to what marketers are looking for. And in a MediaPost webinar scheduled for June 28, we’ll look at how marketers are getting past their initial hesitations about contextual targeting.
