Data has real monetary value. Many publishers are beginning to realize that their audience asset can be monetized beyond the few minutes a consumer spends on their site. However, wherever there is real monetary value, various forms of skimming and cheating tend to follow. As a result, many data users are trying very hard to get access to this data without having to pay for it. In other words, they want "data with benefits." As an industry, it's vitally important that we work hard to find these types of abuse and eliminate them.
To help the industry stay vigilant, here are
five of the more popular skimming scenarios to look out for:
1. The "George Costanza": History sniffing involves using scripting to look for the color of a link displayed in the browser.
The color tells you whether the user visited a URL in the past or not. A couple of companies have used this as a way of judging data quality (which is in and of itself not a bad intention). However,
the practice has major issues because it circumvents a consumer's notice mechanism where the individual website discloses to the consumer its data sharing practices -- yet the data sharing occurs
unbeknownst to the user via another site. This practice is also annoying to the publisher that generates the data because it circumvents that publisher's ability to get paid. Avoid this practice at
all costs. History sniffing shall hence be known as the George "should I have not done that?" Costanza tactic.
2. The "George Michael": Usage skimming occurs when a network or DSP
agrees to pay a data provider for every impression served targeting the data. However, after they target the data once, they then consider the data "theirs" because they claim to capture data from
their own ad serving. Hence they never pay the data provider for more than a single impression. Clearly this wasn't the intent of the data provider trying to monetize their site data. Usage
skimming will be named in reference to lyrics from a George Michael song ("Praying for Time"): "What's mine is mine, not yours."
3. The "Woody Allen": Fake revenue share occurs when a
buyer ends up paying the least amount possible for a data attribute that comes from multiple sources. Let's say a data buyer acquires a range of third-party data that varies in quality from
difference sources. One provides geographic info that comes from self-declaration: "I live in NYC." The other source provides geographic data based on IP address (NYC). Self-declared data in this
case is valued higher because of the accuracy of the data. A form of fake revenue share is when the data buyer only pays the cheaper of the two sources for this data. We name this one after Woody
Allen's character in "Anything Else": "Never trust a guy who fumbles for the check."
4. The "Vanilla Ice," otherwise known as Look-a-Likes without credit: This occurs when a data buyer is
building a look-alike model and they use your data as the seed for their look-a-like model. However, they give you no economic reward for that seed. They can justify this by claiming that they
targeted cookies outside your cookie pool and only used your data to build a model. No credit look-alikes are named in honor of Vanilla Ice, whose #1 hit "Ice Ice Baby" included an altered baseline
rhythm from the song "Under Pressure." For the record, he did eventually settle out of court with Queen and David Bowie.
5. The "Memento." This is really a whole class of errors of
omission. This occurs because transacting in data is relatively new. When deals are negotiated many different revenue models are often used. causing dealmakers to be in the dark on what exactly their
systems can produce. At the end of the day neither the buyer or seller remembers what they are actually supposed to pay. Sins of omission are named after the highly acclaimed movie "Memento" about
short-term memory loss.
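For readers unfamiliar with the mechanics behind item 1, here is a minimal sketch of the classic history-sniffing probe. The URL list, function name, and color value are illustrative assumptions, not anything from a specific vendor, and modern browsers have since restricted what getComputedStyle reports for visited links, so this particular probe no longer works as shown. It is included only to make clear how little the user, or the publisher who generated the visit, is involved.

```typescript
// Illustrative sketch of the classic ":visited color" probe (hypothetical
// URLs and helper name). Modern browsers now return the unvisited style
// from getComputedStyle for privacy, so this is explanatory, not usable.

const PROBE_URLS: string[] = [
  "https://example-publisher.com/article",   // hypothetical pages to test
  "https://example-retailer.com/checkout",
];

function sniffHistory(urls: string[]): string[] {
  const visited: string[] = [];
  for (const url of urls) {
    // Render an off-screen link pointing at the URL being probed.
    const link = document.createElement("a");
    link.href = url;
    link.textContent = url;
    link.style.position = "absolute";
    link.style.left = "-9999px";
    document.body.appendChild(link);

    // If the browser paints the link in the :visited color, the user has
    // been to that page -- and that fact leaks without the user, or the
    // site that generated the visit, ever being asked or paid.
    const color = window.getComputedStyle(link).color;
    if (color === "rgb(85, 26, 139)") {
      // rgb(85, 26, 139) is the classic default purple for visited links.
      visited.push(url);
    }
    document.body.removeChild(link);
  }
  return visited;
}
```

The point of the sketch is that the probing page never discloses anything to the consumer and never compensates the sites whose visits it detects, which is exactly why the practice should be avoided.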
So why should we care about this in the first place? Data skimming helps low-quality data sources beat out high-quality data sources, and that is bad for the industry.
For years, economists have talked about how various forms of cheating can actually devalue a market. In "Reinventing the Bazaar," Stanford economist John McMillan explores how high-quality milk in India used to be hard to find; to boost their profits, wholesalers and vendors would water the milk down. Although buyers could use their sense of smell to judge the milk's freshness, they couldn't judge its butterfat content -- the key indicator of its quality. As a result of this practice, milk sales declined significantly, bringing per-capita consumption down more than 25%. Eventually, India fixed the problem by introducing low-cost machines to measure butterfat content, which improved the overall quality of the milk while increasing consumption. In summary: fair practices help build a bigger market for everyone.
So what can we do as an industry to prevent data skimming from spreading? First and foremost, marketers
need to take a more assertive role by establishing rules of fair play. A data exchange can help in this effort by empowering data sellers with the ability to collect, classify, qualify, sell,
integrate and bill data usage in a standard way with a standard set of buyer Terms and Conditions. What's more, a data exchange can leverage marketplace design to set pricing for data, giving both
buyers and sellers an honest and fair transactional valuation.