Recent real-world experience shows that this is sometimes not the case. Even worse, some accepted methods for identifying and discarding fraudulent respondents don't do nearly enough to ensure an accurate, clean and representative sample on which companies can base decisions with confidence.
The Internet has made it easier than ever for brand-name companies to gauge reactions to their new ideas quickly and inexpensively by tapping into the opinions of what they presume to be their target markets. The growth shows in the numbers: the online research industry has grown 30% in the last year, and in 2007 respondents completed one billion online surveys. Last year, companies spent more than $4 billion on online research. Yet the data returned by these surveys, data that a consumer products business might use to determine its next strategic move, could have serious flaws.
Here are just a couple of real-world examples. These are actual survey-takers who were identified and prevented from continuing to respond to research requests:
• A man from the Midwest had 130 user accounts. He posed as either a woman or a man, always using a different email address. Oddly, the one consistent characteristic across his profiles was that he loved shrimp; otherwise, he used 130 different identities to earn money by participating in market research.
• A 21-year-old woman from Texas used more than 150 accounts to provide information to market researchers. She always listed her income in the top 1% of all earners. In fact, she had a very low income.
Based on real-world validation efforts using a combination of techniques, we have been able to quantify the likely level of fraudulent or questionable survey takers. It's a scary number: 24%. That number breaks down in this way:
Thirteen percent were filtered out because they overlapped with online respondents found on a Certified Partner panel. Using software that operates across suppliers, we de-duplicated panels against one another. This identified respondents who were attempting to take a survey more than once.
An additional 4% were caught by the more commonly used survey-level filtering based on digital fingerprinting, which identified duplicate or straight-line responders, as well as those misrepresenting their location.
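To make the survey-level checks concrete, here is a minimal Python sketch of two of them: flagging duplicate device fingerprints and flagging straight-line responders (people who give the same answer to every item in a rating grid). The field names (`respondent_id`, `fingerprint`, `grid_answers`), the function name and the five-item minimum are illustrative assumptions, not the filtering software described here, and location checks are left out.

```python
from collections import Counter

def survey_level_flags(responses, min_grid_items=5):
    """Flag respondents for duplicate fingerprints or straight-lining.

    `responses` is a list of dicts with hypothetical keys:
    'respondent_id', 'fingerprint' and 'grid_answers'.
    """
    # Count how many completed surveys came from each device fingerprint.
    fingerprint_counts = Counter(r["fingerprint"] for r in responses)

    flags = {}
    for r in responses:
        reasons = []
        # Every response sharing a fingerprint is flagged, including the first.
        if fingerprint_counts[r["fingerprint"]] > 1:
            reasons.append("duplicate_fingerprint")
        # Straight-lining: a sufficiently long grid with one repeated answer.
        grid = r["grid_answers"]
        if len(grid) >= min_grid_items and len(set(grid)) == 1:
            reasons.append("straight_line")
        if reasons:
            flags[r["respondent_id"]] = reasons
    return flags

# Made-up sample data for illustration only.
sample_responses = [
    {"respondent_id": "r1", "fingerprint": "fp-a", "grid_answers": [3, 3, 3, 3, 3]},
    {"respondent_id": "r2", "fingerprint": "fp-b", "grid_answers": [1, 4, 2, 5, 3]},
    {"respondent_id": "r3", "fingerprint": "fp-b", "grid_answers": [2, 2, 4, 1, 5]},
    {"respondent_id": "r4", "fingerprint": "fp-c", "grid_answers": [5, 4, 4, 2, 1]},
]
sample_flags = survey_level_flags(sample_responses)
```

In this made-up sample, r1 is flagged for straight-lining, r2 and r3 for sharing a fingerprint, and r4 passes. Checks like these operate on a single survey; they say nothing about whether the person behind the response is who they claim to be.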
This means that roughly a quarter of the panelists or respondents in a typical online survey may be fraudulent or unreliable. Most leading companies would say that even two to three percent of questionable survey data is too high. Twenty-four percent is a data disaster, and it can have serious business consequences.
Today the most widely accepted method to prevent this type of survey fraud is "digital fingerprinting." This method attempts to prevent the same computer from taking a survey more than once. Many companies assume that if their survey panels are vetted with this method then the data is clean, accurate and reliable. It isn't.
As we found, this method identified fewer than 20% of the dubious respondents, and it does not ask the fundamental question: "Is the person taking the survey who they say they are?" Digital fingerprinting is a useful first step in the process of ensuring honest survey panels, but companies that rely solely on this method are taking a significant risk if they base business decisions on suspect survey samples.
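The "under 20%" figure follows from the percentages reported above: digital fingerprinting accounted for 4 of the 24 percentage points of questionable respondents. A short Python check of that arithmetic, with variable names chosen only for illustration:

```python
# Percentages taken from the figures reported in this article.
total_questionable_pct = 24    # all fraudulent or questionable respondents
fingerprinting_pct = 4         # caught by survey-level digital fingerprinting

# Share of the questionable respondents that fingerprinting alone identified.
fingerprinting_share = fingerprinting_pct / total_questionable_pct
print(f"{fingerprinting_share:.1%}")  # 16.7%
```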
Market researchers need a more reliable data quality method that focuses on risks and behaviors at both the panel and survey level. Panel-level filtering ensures that panelists are real people by cross-checking potential online panel participants in real time against an aggregate consumer database that covers 96% of individuals in the United States.
This enables independent verification of a panelist's name, address and other relevant information, such as age and income. In essence, it verifies the existence of the potential respondent in the real world. At the same time, researchers can determine whether multiple panelist IDs share the same name and address. If researchers are willing to accept the data generated from these respondents to make multimillion-dollar decisions, it seems reasonable to require that the respondent share some basic personally identifiable information and that this information be validated.
Panel-level filtering removes those panelists who introduce risk into sample quality. This can include respondents whose identity cannot be verified; those who have multiple panel IDs but only one real-world ID (indicative of a professional survey-taker); and those who have exhibited signs of disengagement across multiple surveys.
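The three panel-level criteria above can be sketched as a simple rule-based filter. This is a minimal illustration under stated assumptions, not the validation system described here: the `Panelist` fields, the `identity_verified` flag (standing in for the result of a lookup against a consumer database), the name-and-address normalization and the two-survey disengagement threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Panelist:
    panel_id: str
    name: str
    address: str
    identity_verified: bool   # hypothetical: outcome of a consumer-database lookup
    disengaged_surveys: int   # hypothetical: surveys showing signs of disengagement

def identity_key(p):
    """Normalize case and whitespace so trivially varied entries still match."""
    return (" ".join(p.name.lower().split()), " ".join(p.address.lower().split()))

def panel_level_risks(panelists, min_disengaged_surveys=2):
    """Return {panel_id: [reasons]} for panelists who meet any risk criterion."""
    # Group panel IDs by real-world identity (name plus address).
    ids_by_identity = defaultdict(list)
    for p in panelists:
        ids_by_identity[identity_key(p)].append(p.panel_id)

    risks = {}
    for p in panelists:
        reasons = []
        if not p.identity_verified:
            reasons.append("unverified_identity")
        if len(ids_by_identity[identity_key(p)]) > 1:
            reasons.append("shared_identity")
        if p.disengaged_surveys >= min_disengaged_surveys:
            reasons.append("disengaged")
        if reasons:
            risks[p.panel_id] = reasons
    return risks

# Made-up panel for illustration only.
sample_panel = [
    Panelist("p1", "Pat Smith", "12 Oak St", True, 0),
    Panelist("p2", "pat  smith", "12 oak st", True, 0),
    Panelist("p3", "Lee Jones", "9 Elm Ave", False, 0),
    Panelist("p4", "Kim Park", "4 Pine Rd", True, 5),
    Panelist("p5", "Ana Diaz", "7 Birch Ln", True, 1),
]
sample_risks = panel_level_risks(sample_panel)
```

Here p1 and p2 are flagged for sharing one real-world identity, p3 for an unverified identity and p4 for repeated disengagement, while p5 passes. A production system would need far more robust name and address matching than the lowercase-and-trim normalization used in this sketch.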
Consumer goods and services companies rightfully expect their market research partners to ensure that the research delivers quality results. To satisfy these companies, research partners must take this problem much more seriously and go beyond the currently accepted methods of vetting survey participants. If we don't do this, online survey data will be, and should be, viewed as less than useful ... or even as dangerous.