High-Risk Data Collection Seen Across Web Sites

A cross-industry study of Web data collection activity reveals the “significant” extent to which data is mined or used by companies other than site owners, according to Krux.

The third annual study also measures the “data leakage” associated with social media/sharing widgets. This year, in addition to the top 50 ad-supported content sites, Krux expanded its analysis to include data from the top 100 e-commerce and marketer sites, as well as 50 smaller content sites.

"Though the increase of third-party data collection has moderated due to better data governance by website operators, there's still a great deal of unknown and unwanted data harvesting happening out there," said Tom Chavez, Krux co-founder and CEO, in a release. "Given Krux's role in the data ecosystem, we understand that not all collection is bad. However, when 46% of this collection comes from higher risk intermediaries and market middlemen, it raises questions as to which companies are mining the data, if that collection is fully sanctioned and how that data will ultimately be used."

The study also showed that data collection activity that is within website owners' control dropped by 40%. This illustrates a concerted effort on the part of many Web operators to implement better data governance practices and exert greater control over what is happening on their pages.

The number of third parties observed participating in data collection continued to rise, from 168 individual companies in 2011 to 300 in 2012 and 328 in 2013. Collection from higher-risk categories rose as well, from 40% in 2012 to 46% in 2013. Collectors are considered “higher-risk” if they could potentially use the data they collect to power competitive market activity.

Krux notes a number of trends. Data collection volume from social media/sharing widgets, including mainstream social media principals like Twitter, Facebook and Google+, as well as intermediaries such as AddThis, grew almost 30% from 2012 to 2013. Data collection from social media/sharing widgets now represents 20% of all third-party data collection. This growth reflects both more aggressive collection by the social players and increased reliance on social/sharing tools by Web sites to grow their consumer reach.

Data collection from advertising Supply Side Platforms (SSPs) dropped 70% from 2012 to 2013. This reflects a significant drop in volumes from AdMeld since its acquisition and integration into Google. It may also reflect increased frequency capping, server-to-server cookie syncing and other techniques that are harder to observe at the page level.

Like their content counterparts, e-commerce and marketer sites experience a high proportion of third-party collection activity that is beyond their control: 60% and 54%, respectively.
