Data has real monetary value. Many publishers are beginning to realize that their audience asset can be monetized beyond the few minutes a consumer spends on their site. However, wherever there is real monetary value, various forms of skimming and cheating tend to follow. As a result, many data users are trying very hard to get access to this data without having to pay for it. In other words, they want "data with benefits." As an industry, it's vitally important that we work hard to find these types of abuse and eliminate them.
To help the industry stay vigilant, here are
five of the more popular skimming scenarios to look out for:
1. The "George Costanza": History sniffing involves using scripting to look for the color of a link displayed in the browser.
The color tells you whether the user visited a URL in the past or not. A couple of companies have used this as a way of judging data quality (which is in and of itself not a bad intention). However,
the practice has major issues because it circumvents a consumer's notice mechanism where the individual website discloses to the consumer its data sharing practices -- yet the data sharing occurs
unbeknownst to the user via another site. This practice is also annoying to the publisher that generates the data because it circumvents that publisher's ability to get paid. Avoid this practice at
all costs. History sniffing shall hence be known as the George "should I have not done that?" Costanza tactic.
2. The "George Michael": Usage skimming occurs when a network or DSP
agrees to pay a data provider for every impression served targeting the data. However, after they target the data once, they then consider the data "theirs" because they claim to capture data from
their own ad serving. Hence they never pay the data provider for more than a single impression. Clearly this wasn't the intent of the data provider trying to monetize their site data. Usage
skimming will be named in reference to lyrics from a George Michael song ("Praying for Time"): "What's mine is mine, not yours."
3. The "Woody Allen": Fake revenue share occurs when a
buyer ends up paying the least amount possible for a data attribute that comes from multiple sources. Let's say a data buyer acquires a range of third-party data that varies in quality from
difference sources. One provides geographic info that comes from self-declaration: "I live in NYC." The other source provides geographic data based on IP address (NYC). Self-declared data in this
case is valued higher because of the accuracy of the data. A form of fake revenue share is when the data buyer only pays the cheaper of the two sources for this data. We name this one after Woody
Allen's character in "Anything Else": "Never trust a guy who fumbles for the check."
4. The "Vanilla Ice," otherwise known as Look-a-Likes without credit: This occurs when a data buyer is
building a look-alike model and they use your data as the seed for their look-a-like model. However, they give you no economic reward for that seed. They can justify this by claiming that they
targeted cookies outside your cookie pool and only used your data to build a model. No credit look-alikes are named in honor of Vanilla Ice, whose #1 hit "Ice Ice Baby" included an altered baseline
rhythm from the song "Under Pressure." For the record, he did eventually settle out of court with Queen and David Bowie.
5. The "Memento." This is really a whole class of errors of
omission. This occurs because transacting in data is relatively new. When deals are negotiated many different revenue models are often used. causing dealmakers to be in the dark on what exactly their
systems can produce. At the end of the day neither the buyer or seller remembers what they are actually supposed to pay. Sins of omission are named after the highly acclaimed movie "Memento" about
short-term memory loss.
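For readers unfamiliar with the mechanics behind item 1, here is a minimal sketch of the classic history-sniffing probe. The URL list, function name, and color value are illustrative assumptions, not anything from a specific vendor, and modern browsers have since restricted what getComputedStyle reports for visited links, so this particular probe no longer works as shown. It is included only to make clear how little the user, or the publisher who generated the visit, is involved.

```typescript
// Illustrative sketch of the classic ":visited color" probe (hypothetical
// URLs and helper name). Modern browsers now return the unvisited style
// from getComputedStyle for privacy, so this is explanatory, not usable.

const PROBE_URLS: string[] = [
  "https://example-publisher.com/article",   // hypothetical pages to test
  "https://example-retailer.com/checkout",
];

function sniffHistory(urls: string[]): string[] {
  const visited: string[] = [];
  for (const url of urls) {
    // Render an off-screen link pointing at the URL being probed.
    const link = document.createElement("a");
    link.href = url;
    link.textContent = url;
    link.style.position = "absolute";
    link.style.left = "-9999px";
    document.body.appendChild(link);

    // If the browser paints the link in the :visited color, the user has
    // been to that page -- and that fact leaks without the user, or the
    // site that generated the visit, ever being asked or paid.
    const color = window.getComputedStyle(link).color;
    if (color === "rgb(85, 26, 139)") {
      // rgb(85, 26, 139) is the classic default purple for visited links.
      visited.push(url);
    }
    document.body.removeChild(link);
  }
  return visited;
}
```

The point of the sketch is that the probing page never discloses anything to the consumer and never compensates the sites whose visits it detects, which is exactly why the practice should be avoided.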
So why should we care about this in the first place? Data skimming helps low-quality data sources beat out high-quality data sources, and that is bad for the industry.
For years, economists have talked about how various forms of cheating can actually devalue a market. In "Reinventing the Bazaar," Stanford economist John McMillan explores how high-quality milk in India used to be hard to find; to boost their profits, wholesalers and vendors would water the milk down. Although buyers could use their sense of smell to judge the milk's freshness, they couldn't judge its butterfat content -- the key indicator of its quality. As a result of this practice, milk sales declined significantly, bringing per-capita consumption down more than 25%. Eventually, India fixed the problem by introducing low-cost machines to measure butterfat content, which improved the overall quality of the milk while increasing consumption. In summary: fair practices help build a bigger market for everyone.
So what can we do as an industry to prevent data skimming from spreading? First and foremost, marketers
need to take a more assertive role by establishing rules of fair play. A data exchange can help in this effort by empowering data sellers with the ability to collect, classify, qualify, sell,
integrate and bill data usage in a standard way with a standard set of buyer Terms and Conditions. What's more, a data exchange can leverage marketplace design to set pricing for data, giving both
buyers and sellers an honest and fair transactional valuation.