Commentary

Six Techniques to Improve and Safeguard Online Data Collection

Consumers continue to be concerned about how their personal information is used online. For relationship marketers, establishing and maintaining trust is paramount. While it is critical to scale successful relationship programs, you may find it challenging to apply your own site's standards and practices to third-party sites and platforms.

Although it may not be possible to replicate the exact pages, forms and controls from your own site on third-party sites, there are many methods for detecting gaps and problems. Here are six methods that have worked:

1. Feedback Mechanism: When consumers register on a third-party site or landing page, it is not always easy for them to contact you, report problems or register a complaint. Involving consumer relations in your marketing initiatives and giving consumers an easy way to report problems helps ensure that gaps in the process are uncovered, tracked and fixed as soon as possible -- should they exist.

Example: a link, form or email address on the registration form, thank-you page or welcome message.

2. Point-to-Point Data Testing and Comparison: When data is moved from place to place, it is crucial to ensure that it is mapped and delivered properly. If a name or email address is mangled between the form and the system, or between the third-party partner's system and your database, a consumer may become unreachable or their profile may be corrupted. The following tests can be done before or after a campaign has launched:

• Prelaunch: Testing the full stack (the entire process) is a way to eliminate most of these problems. To conduct this type of testing prior to launching a campaign, each data provider and data consumer should run a beta test. For example, the provider builds a beta form and loads test data into its system that is packaged and delivered precisely as it will be in a live environment -- except in this case it is delivered to a beta version of the endpoint (for example, a web service or SFTP folder). The endpoint should be configured to run QA (quality assurance) checks using the same business rules as the live campaign and should deliver the validated data to a beta copy of your program. This practice ensures that problems are caught and fixed prior to launch, and provides a proven framework on which to build the live version of the campaign.

• Postlaunch: Once a campaign has launched, it is important to check that all of its attributes were configured correctly. Many networks and data collection platforms are deployed into many different environments, so it is wise to verify accuracy once a campaign is up and running. This test can be done in many ways, both manually and electronically. One easy method is to repeat the full-stack test, this time using the live versions of the pages and forms used in the campaign. To do this, you will need to audit what was entered on the page, what is stored in each system and, ultimately, what is written to the consumer's profile and your brand or organization's data warehouse. If the data do not match, you have uncovered a problem (controlling for any planned translation at any point). The best systems for storing and transferring data already do some verification and auditing, and the most advanced modules can verify deliveries and store responses from an endpoint, which makes it possible to automate some of the testing and comparison. For example, you can return a member ID in your response, which can later be used to reconcile what was sent to your system against the profile that was created (a reconciliation sketch follows this list).
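
As a rough illustration, here is a minimal reconciliation sketch in Python. It assumes two CSV exports -- one of records delivered to the endpoint, one of profiles created -- keyed by the returned member ID; the file and column names are hypothetical placeholders, not a specific vendor's format.

    # Minimal reconciliation sketch (Python). File and column names are
    # hypothetical -- substitute the exports your systems actually produce.
    import csv

    def load_by_key(path, key_field):
        # Index rows from a CSV export by a key field (e.g., the returned member ID).
        with open(path, newline="", encoding="utf-8") as f:
            return {row[key_field]: row for row in csv.DictReader(f)}

    sent = load_by_key("sent_to_endpoint.csv", "member_id")    # what the partner delivered
    stored = load_by_key("profiles_created.csv", "member_id")  # what your database stored

    FIELDS = ["email", "first_name", "last_name", "postal_code"]  # assumed field names

    for member_id, sent_row in sent.items():
        profile = stored.get(member_id)
        if profile is None:
            print(f"{member_id}: no profile created")  # record lost in transit
            continue
        for field in FIELDS:
            if sent_row[field].strip().lower() != profile[field].strip().lower():
                print(f"{member_id}: {field} mismatch: "
                      f"{sent_row[field]!r} vs. {profile[field]!r}")

Every record that was delivered but never produced a profile, or whose fields drifted in transit, is printed for follow-up.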

3. Data Pattern Inspection and Detection: To guard against errors and possible fraud, it is important to examine the data once a statistically meaningful amount has been collected (typically a few hundred or a few thousand records). The goal of this test is to find patterns and problems. It is performed by pulling a random sample within a given time period or across campaigns. Here are some examples of patterns or problems (a detection sketch follows this list):

• Hardcoded values: The same or similar values occurring frequently

• Fraudulent transactions: Suspicious entries such as street addresses that match one another, or vary by only a single character

• Vulgar or fake PII: Input where the consumer has apparently entered a fake name, such as that of a fictional character, or something obscene

• Foreign or invalid characters: Characters that should not be accepted are appearing in records, or characters that should be accepted are causing problems with an endpoint or application
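
As one way to operationalize these checks, the sketch below (Python) scans a sample file for dominant repeated values, near-duplicate street addresses and unexpected characters. The file name, field names and thresholds are assumptions to adapt to your own data.

    # Pattern-inspection sketch (Python). File, field names and thresholds
    # are assumptions -- tune them to your own campaigns.
    import csv
    import re
    from collections import Counter
    from difflib import SequenceMatcher

    with open("sample_records.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise SystemExit("empty sample file")

    # Hardcoded values: flag any single value that dominates a field.
    for field in ("email", "street_address"):
        counts = Counter(r[field].strip().lower() for r in rows)
        value, n = counts.most_common(1)[0]
        if n / len(rows) > 0.02:  # assumed threshold: >2% identical values
            print(f"{field}: {value!r} appears {n} times")

    # Near-duplicate addresses: flag adjacent sorted entries that differ by
    # roughly one character (a cheap stand-in for full pairwise comparison).
    addresses = sorted({r["street_address"].strip().lower() for r in rows})
    for a, b in zip(addresses, addresses[1:]):
        if SequenceMatcher(None, a, b).ratio() > 0.95:
            print(f"near-duplicate addresses: {a!r} / {b!r}")

    # Invalid characters: flag names outside an allowed character set.
    allowed = re.compile(r"^[A-Za-z\u00C0-\u017F' .-]+$")  # letters, accents, basic punctuation
    for r in rows:
        if not allowed.match(r["first_name"]):
            print(f"unexpected characters in name: {r['first_name']!r}")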

4. Advanced Form-Level and Stress Testing: When gaps occur and cannot be traced using the standard methods outlined above, it may be necessary to run a significant volume of tests -- exactly as the campaign was implemented in the live environment. To do that, you should build a script or program, or hire a team, to enter and audit 1,000+ unique test records on your media partner's registration forms using the most common browsers and versions that consumers are using. If all test data entered on the media source site matches the data delivered to your customer database, you have passed this test. If not, there may be a problem with capacity, caching or session handling on the media partner's site or a data-consuming endpoint. A simplified scripted version is sketched below.
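
For the scripted route, here is a minimal sketch in Python. It posts test records directly to the form's endpoint rather than driving real browsers (a full test would use browser automation to cover the browser/version matrix); the form URL and form field names are hypothetical.

    # Stress-test sketch (Python). Posts uniquely tagged test records and
    # writes a manifest to audit against the customer database afterward.
    # FORM_URL and the form field names are hypothetical placeholders.
    import csv
    import uuid

    import requests

    FORM_URL = "https://partner.example.com/register"

    with open("test_manifest.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["token", "email", "status"])
        for _ in range(1000):
            token = uuid.uuid4().hex[:12]      # unique marker for reconciliation
            email = f"qa+{token}@example.com"  # test address, easy to purge later
            resp = requests.post(FORM_URL, data={
                "first_name": "QA",
                "last_name": f"Test-{token}",
                "email": email,
            }, timeout=10)
            writer.writerow([token, email, resp.status_code])
    # Every token in the manifest should later appear in your database;
    # missing tokens point to capacity, caching or session-handling gaps.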

5. Consumer IP Audit: These tests are intended to analyze consumers' IP addresses to ensure accuracy and guard against potential fraud:

• Confirm accuracy: There are many ways that IP addresses can be correlated with a consumer. Although it is not uncommon for an IP address to resolve to an ISP location that does NOT match the consumer's, there should be some correlation in the majority of cases. For example, it would be unusual for a campaign on a U.S. site collecting U.S.-based consumers to have 51% of IP addresses matching ISPs in Asia.

• Detect fraud: Although it is possible to see the exact same IP address more than once (particularly when a campaign is capturing high volume), this should be the exception and not the rule. If the rate of full-address matches is high (above 5%-10%), there may be a problem with the way the address is captured, or fraud (a program or a single person entering several records). A sketch of this check follows the list.

• Confirm identity: In the case of a dispute or complaint, the IP address of the network or machine can be compared against a consumer's known or stated outgoing IP address to detect whether an unauthorized person has entered the consumer's data.
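
Here is a minimal version of the duplicate-IP check in Python, assuming a CSV export with a hypothetical ip_address column; the 5% threshold mirrors the rule of thumb above. The accuracy check would additionally require an IP-to-location lookup service, which is omitted here.

    # Duplicate-IP audit sketch (Python). File and column names are assumptions.
    import csv
    from collections import Counter

    with open("campaign_records.csv", newline="", encoding="utf-8") as f:
        ips = [row["ip_address"] for row in csv.DictReader(f)]

    counts = Counter(ips)
    repeated = sum(n for n in counts.values() if n > 1)  # records sharing an IP
    rate = repeated / len(ips) if ips else 0.0

    print(f"{len(ips)} records, {len(counts)} distinct IPs, repeat rate {rate:.1%}")
    if rate > 0.05:  # rule-of-thumb threshold from above
        for ip, n in counts.most_common(10):
            if n > 1:
                print(f"  {ip}: {n} records")  # candidates for manual review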

6. Cross-Campaign Data Comparison: If you are running multiple campaigns on a site or with a media partner, you have another source of comparison for detecting unauthorized sign-up methods (a consumer being enrolled deceptively or automatically -- without consent). Although it may be common in many environments for a consumer to enroll in more than one program on the same site in the same time period, this should not be true of the majority of consumers. To test for this, pull campaign data for the same time period from several campaigns run with the same media partner and compare them against one another to see whether the same consumers appear in a significant number of cases. A comparison sketch follows.
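
A simple overlap check might look like the following Python sketch, assuming one CSV export per campaign with a hypothetical email column.

    # Cross-campaign overlap sketch (Python). File and column names are assumptions.
    import csv

    def emails(path):
        # Collect the normalized email addresses enrolled in one campaign.
        with open(path, newline="", encoding="utf-8") as f:
            return {row["email"].strip().lower() for row in csv.DictReader(f)}

    campaign_a = emails("campaign_a.csv")
    campaign_b = emails("campaign_b.csv")

    overlap = campaign_a & campaign_b
    rate = len(overlap) / (min(len(campaign_a), len(campaign_b)) or 1)
    print(f"{len(overlap)} consumers appear in both campaigns ({rate:.1%} overlap)")
    # An unusually high overlap rate can indicate auto-enrollment or a
    # deceptive sign-up flow on the partner's side.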

By conducting these tests and collecting the necessary data, you can often prevent problematic data or lower its impact.
