Email Addresses Are Collected Before Consumers Submit Web Forms, Study Says

A new privacy scandal is emerging in the email field -- and there is no telling where it might end up.

Email addresses typed into login, registration and newsletter subscription forms are being collected and sent to trackers before consumers submit the form or give their consent, according to "Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission."

Of 100,000 websites evaluated, email addresses are being exfiltrated on 1,844 websites in the EU and 2,950 sites in the U.S., the study says.  

Most of the email addresses are sent to known tracking domains, but the study identifies 41 tracker domains that are not listed on any of the popular blocklists.

In addition, the authors found incidental password collection on 52 websites by third-party session replay scripts. 

The problem is that consumers often abandon forms without intending to go any further.  

A survey by The Manifest found that 81% of 502 respondents abandoned forms at least once, and 59% had done so in the past month, the study notes.  

The alleged lack of consent could cause regulatory issues under the GDPR and could lead to clamor for further regulation worldwide. 

Moreover, the authors say they found that Meta and TikTok collect hashed personal information even when the user does not submit the form or give consent.

The study was conducted by Asuman Senol (imec-COSIC, KU Leuven); Gunes Acar (Radboud University); Mathias Humbert (University of Lausanne); and Frederik Zuiderveen Borgesius (Radboud University).

The authors measured email and password collection that occurs before form submission on what they say are the leading 100,000 websites.

According to the authors, the top websites where the filled email address was collected by a tracker before form submission were:  

  • usatoday.com*--Third party: taboola.com, Hash (SHA-256)
  • trello.com*--Third party: bizible.com, Encoded (URL)
  • independent.co.uk*--Third party: taboola.com, Hash (SHA-256)
  • shopify.com--Third party: bizible.com, Encoded (URL)
  • marriott.com--Third party: glassboxdigital.io, Encoded (URL)
  • newsweek.com*--Third party: glassboxdigital.io, Hash (MD5, SHA-1, SHA-256)
  • prezi.com*--Third party: taboola.com, Hash (SHA-256)
  • branch.io*--Third party: bizible.com, Encoded (URL)
  • prothomalo.com--Third party: facebook.com, Hash (SHA-256)
  • codecademy.com--Third party: fullstory.com, Unencoded
  • azcentral.com*--Third party: taboola.com, Hash (SHA-256)

*Not visible after February 2022
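To make the "Hash" and "Encoded" labels above concrete, here is a minimal Python sketch of the forms a typed address could take inside a tracker request. It is an illustration only, not the authors' detection code: the function names, the sample address and the tracker URL are hypothetical, and the trim-and-lowercase step is an assumption about how an address might be normalized before hashing.

```python
import hashlib
from urllib.parse import quote


def email_representations(email: str) -> dict[str, str]:
    """Return the forms in which a typed address might show up in a request."""
    # Assumption: the address is trimmed and lowercased before encoding or hashing.
    normalized = email.strip().lower()
    raw = normalized.encode("utf-8")
    return {
        "unencoded": normalized,
        "url_encoded": quote(normalized, safe=""),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email: str, request_url: str) -> list[str]:
    """List which representations of the address appear in a request URL."""
    forms = email_representations(email)
    return sorted(name for name, value in forms.items() if value in request_url)


if __name__ == "__main__":
    # Hypothetical tracker request, for illustration only.
    url = "https://tracker.example/collect?e=jane.doe%40example.com"
    print(find_leaks("jane.doe@example.com", url))  # ['url_encoded']
```

Because a hash is deterministic, the same address produces the same value on every site, so a hashed address can still serve as a stable identifier.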

In addition, the authors say the domains that have received such email addresses include:

  • Taboola (taboola.com)
  • FullStory (fullstory.com)
  • Awin Inc. (zenaps.com*, awin2.com*)
  • Yandex (yandex.com)
  • AdRoll (adroll.com)
  • Glassbox (glassboxdigital.io)
  • Listrak (listrakbi.com)
  • Oracle (bronto.com)
  • LiveRamp (rlcdn.com; originally attributed to TowerData, see the correction below)
  • SaleCycle (salecycle.com)
  • Automattic (gravatar.com*)
  • Facebook (facebook.com)
  • Salesforce (pardot.com*)
  • Oktopost (okt.to*)

*Third-party domain is not among the request initiators. That means the leak could have been triggered by another party. 

MediaPost reached out to several of these firms for comment, but had not received responses at deadline. 

In the U.S., the website categories with the most leaky sites are:

  • Fashion/Beauty (224 leaky sites)
  • Online shopping (567)
  • General news (392)
  • Software/Hardware (162)
  • Business (484)
  • Marketing/merchandising (192)
  • Internet Services (199)
  • Travel (82)
  • Health (69)
  • Finance/Banking (49)
  • Sports (56)

The study concludes: “Considering its scale, intrusiveness and unintended side-effects, the privacy problem we investigate deserves more attention from browser vendors, privacy tool developers, and data protection agencies.”

For the record, the authors describe their methodology as follows: “Our main dataset consists of eight crawls, all of which were run in May and June of 2021. A total of six desktop crawls were run from the EU and the US using three consent modes: no-action, accept-all, reject-all.”

In addition, “two mobile crawls were run using the no-action mode from the two locations,” the authors state. “In the four no-action crawls (100K websites), we flag the websites where we detected (but not interacted) the presence of a CMP [consent management platform] using Consent-O-Matic. We then use these CMP-detected websites in the accept-all and reject-all crawls.”

The authors continue, “For comparability we use the same 7,720 CMP-detected websites in the accept-all and reject-all crawls on both locations -- the 7,720 websites were detected in the EU crawl. While we limit our crawls to the top 100K websites, our dataset contains approximately 2.8M page visits across all crawls considering the inner pages visited when searching for email and password fields.”
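The crawl counts quoted above can be checked with a short Python sketch. It only enumerates the combinations the authors describe (two locations, three desktop consent modes, plus mobile crawls in no-action mode); the variable and function names are hypothetical and this is not the authors' crawler configuration.

```python
from itertools import product

LOCATIONS = ("EU", "US")
CONSENT_MODES = ("no-action", "accept-all", "reject-all")


def crawl_matrix() -> list[dict[str, str]]:
    """Enumerate the crawls described in the quoted methodology."""
    # Six desktop crawls: every location paired with every consent mode.
    desktop = [
        {"device": "desktop", "location": loc, "mode": mode}
        for loc, mode in product(LOCATIONS, CONSENT_MODES)
    ]
    # Two mobile crawls: no-action mode only, one per location.
    mobile = [
        {"device": "mobile", "location": loc, "mode": "no-action"}
        for loc in LOCATIONS
    ]
    return desktop + mobile


if __name__ == "__main__":
    crawls = crawl_matrix()
    no_action = [c for c in crawls if c["mode"] == "no-action"]
    print(len(crawls), len(no_action))  # 8 4
```

The totals match the quoted figures: eight crawls overall, four of them in no-action mode.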

Readers may obtain more information on the study here. 

Correction: The authors of the study report the following:

13 May 2022: The initial version of our website and paper incorrectly referred TowerData as the owner of the rlcdn.com domain. The rlcdn.com domain belongs to LiveRamp. We've also reported this issue to Disconnect, which was one of the sources we used to identify domain ownership.

 

 
