Second Look: Stanford Research On Digital Information

As many of you know, some research came out from the academic community earlier this week. This research, from Arvind Narayanan and Jonathan Mayer of the Center for Internet and Society at Stanford, was released as part of a press event for “Yes, They Really Know It’s You: The Digital Collection of Personal Information from Citizens” in Washington, D.C.

This is the same event at which FTC Chairman Jon Leibowitz gave the keynote address, describing online marketing companies as a type of “cyberazzi” (comparing online advertisers to paparazzi) that are using their “lenses” to view consumers.

Snazzy sound bites aside, the event continued a pattern of advocates conflating issues such as cyberstalking, identity theft, and the use of online data for credit-reporting decisions with the subject of online behavioral advertising.

Jonathan and his colleagues are extremely smart and, at least in my opinion, well intentioned: they are trying to keep the industry in check. However, I’m as concerned about the impact of the research as I am about its content.

For example, rather than checking the accuracy of the claims or understanding the broader context in which they were made, the press and blogosphere chose to publish headlines such as “Consumer Watchdog Hammers Online Industry, Cites Study Claiming Websites Often Share Personal Info.”

Of course, when the title of the research claims that “every (online company) knows your User Name” yet the footnotes admit that the data leaks “may have affected only a minority of users,” it’s pretty clear what types of press headlines this research was trying to create.

Some of this is the fault of our industry, as many of us are loath to discuss these subjects for fear of being named in a story that takes the issues even further out of context. However, it is time that we set the record straight by clarifying the following points.

Personally Identifiable Information

Responsible companies in the OBA space don’t use PII or sensitive non-PII segments for ad targeting. As Jules Polonetsky of the Future of Privacy Forum noted recently, “most companies targeting ads online have no use for personal information. If they are getting that kind of information, it is most likely because of an inadvertent mistake.” The Stanford study, as Jules rightly noted, fails to provide this context.

Transparency Is Not The Same As Use

Any time an advertisement is served, a piece of content is viewed, a social network is visited, or an online video is watched, lots of information passes through multiple entities as part of these processes. These include Internet browsers, content delivery networks, operating systems, analytics companies, ad-serving companies, ISPs and others. And it’s difficult for anyone, even a super-smart tech researcher from a top-tier university, to know what information is being used, by whom, and for what purposes.

But if you’re going to note that an ad network may inadvertently have transparency into a data stream that includes PII and/or a unique identifier, then why not also mention the other entities? For example, if a Web site encodes a user name into a URL but fails to tell its partners, then the operating system, browser, analytics company, ISP, and everyone else will touch that user data without knowing it.
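To illustrate the mechanism, here is a minimal Python sketch, assuming a hypothetical publisher URL of the form `https://example.com/profile?user=alice`. Any party that receives that URL, for instance in an HTTP `Referer` header attached to an ad or analytics request, can read the user name with nothing more than standard URL parsing. The function name `visible_query_params` is made up for this illustration.

```python
from urllib.parse import urlparse, parse_qs


def visible_query_params(referer: str) -> dict[str, list[str]]:
    """Return the query parameters that any recipient of this URL can read."""
    return parse_qs(urlparse(referer).query)


# Hypothetical page URL that a site has built with the user name in the clear.
referer = "https://example.com/profile?user=alice&tab=settings"
params = visible_query_params(referer)
print(params)  # {'user': ['alice'], 'tab': ['settings']}
```

The point is not that ad networks run code like this. It is that the user name is exposed to every hop that sees the URL, whether or not anyone ever looks at it.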

They don’t know it because they usually require sites not to pass PII or user names, and they have no automated way of detecting unintended leaks. Transparency into a data stream is not the same thing as HAVING data or USING data, and any implication to the contrary is uninformed and/or disingenuous. One reason our industry standards govern the use of information, rather than visibility into it, is that placing restrictions on transparency into the data stream would trigger a number of unintended consequences.

Pseudonymous and Anonymous Are Important Distinctions

The research study draws a distinction between anonymous and pseudonymous data, but for the wrong reasons. I agree that there is an important distinction to draw between the two data types, but it’s about what is possible versus what is practical. It may be technically possible for a really smart tech researcher to create an algorithm that can identify a unique person 70% of the time based on the User ID that person typically uses.

However, the implication that a significant number of online advertising networks are spending their time trying to re-identify users in this way is preposterous. That said, it might be a helpful consumer education tool to remind consumers that using the same User ID for all of their online interactions is just as bad an idea as setting all their passwords to the name of their favorite sports team.

Public Policy Should Be Made On What Is Practical, Not What Is Possible

If the key takeaway for Congress and regulators from this research is that the general practice of online media companies is to use personally identifiable information for ad targeting, that would not only be unfortunate, it would be dead wrong.

Web Site Publishers (And Others) Should Monitor The Data They Make Available

That said, Web site publishers would be well served to stay aware of the information they expose to third parties, even inadvertently. Sites should consider encoding user names so that they do not appear in URLs in the clear. Most partners in the ad ecosystem don’t actually want to touch this data at all. Many Web sites are already aware of this, as the research itself acknowledges in a footnote.
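One way a publisher might act on this advice is to replace the user name in URLs with an opaque token. The Python sketch below is one assumed approach, not an industry standard: it derives a keyed HMAC-SHA256 token from the user name using a server-side secret. The `SECRET_KEY` value, the `/profile/` URL scheme, and the helper names are all hypothetical. The URL stays stable for the site, partners who see it learn nothing readable, and the site keeps its own token-to-user mapping on the server.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice load it from configuration, never hard-code it.
SECRET_KEY = b"replace-with-a-long-random-secret"


def user_token(username: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable, opaque token for a user name using HMAC-SHA256."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:32]  # keep the first 32 hex characters (128 bits) for shorter URLs


def profile_url(username: str) -> str:
    """Build a profile URL that carries the token instead of the user name."""
    return f"https://example.com/profile/{user_token(username)}"


# The site keeps its own lookup table so it can resolve tokens back to accounts.
token_to_user = {user_token(name): name for name in ["alice", "bob"]}

url = profile_url("alice")
print(url)
print("alice" in url)  # False
```

A keyed HMAC, rather than a plain hash, matters here: an unkeyed hash of a user name could be reversed by anyone who hashes a list of likely user names and compares the results.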
