Consumer electronics have increased demand for applications that use voice to recognize the person speaking. Google Home, for example, can identify each of the residents in a house using the device. As more consumers begin to make purchases with their voice, the ability to identify a person through the inflections and behavioral tones in their speech patterns could help to improve ad targeting -- if skillful impersonators cannot fool the technology.
Smartphones, televisions, home speaker hubs, and a variety of electronics are equipped with applications that function with voice commands. And it's becoming more common for a person to conduct an online search, dictate messages, or translate phrases by voice.
Voice is the new user interface and source of revenue. People are doing more than turning lights on and off or asking for directions. They are buying things, too. Some 73% of consumers with voice assistants have made purchases directly through their device using their voice, according to data Invoca will release Thursday.
A dissertation released November 14 by the University of Eastern Finland brings to light a new type of threat. While it doesn't explore ad fraud, malware, or fraudulent purchases, it analyzes how skillful voice impersonators can fool state-of-the-art speaker recognition systems, which generally are not yet effective at recognizing modifications and inflections in the human voice, writes author Rosa Gonzalez Hautamaki.
The dissertation, titled Human-Induced Voice Modification and Speaker Recognition, analyzes how to incorporate fundamental frequency (F0) information into speaker recognition technology, which marks "syllable structure and word boundaries" to identify the person speaking.
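To give a sense of what F0 information is, the sketch below estimates the fundamental frequency of a short voiced frame using a simple autocorrelation method. This is an illustrative toy example only, not the dissertation's actual method; the function name, frame length, and pitch search range are assumptions chosen for the demo.

```python
import numpy as np

def estimate_f0(signal, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate fundamental frequency (F0) of a voiced frame via autocorrelation.

    Illustrative toy implementation, not the dissertation's method.
    fmin/fmax bound the search to a plausible range for human pitch.
    """
    sig = signal - signal.mean()
    # Autocorrelation for non-negative lags only.
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lag_min = int(sample_rate / fmax)  # shortest period considered
    lag_max = int(sample_rate / fmin)  # longest period considered
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / lag

# Synthetic 50 ms voiced frame: a 220 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(0, 0.05, 1 / sr)
frame = np.sin(2 * np.pi * 220 * t)
f0 = estimate_f0(frame, sr)  # close to 220 Hz
```

Real systems track F0 over many such frames, and the resulting pitch contour is one of the cues an impersonator tries to reproduce.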
The study analyzed speech from two professional impersonators who mimicked eight Finnish public figures. It also included speech from 60 Finnish speakers who participated in two recording sessions. The speakers were asked to modify their voices to fake their age, attempting to sound like an elderly person and like a child. The study found that the impersonators were able to fool both automatic systems and human listeners when mimicking some speakers. In the case of acted speech, a successful strategy for voice modification was to sound like a child, as both automatic systems' and listeners' performance degraded with this type of disguise.
In the paper, Hautamaki suggests that voice attacks against speaker recognition can be mounted using technology such as voice conversion, speech synthesis, and replay attacks. Voice modifications produced by a human disguising their voice, or by impersonation, cannot be easily detected. Hautamaki notes that the scientific community has been working to develop countermeasures against technically generated attacks on voice systems, but their success depends on the technology used.
This column was previously published in the Search Insider on November 14, 2017.