Commentary

Talk To The Machine

I have a long and mixed history with voice-recognition interfaces. I confess that after years of testing them as a software reviewer, trying to cozy up to Siri, and then being forced to rely on voice-to-text functionality during some post-op recovery, I remain unconvinced of their widespread efficacy. Voice is a cumbersome and socially awkward way to interact with a machine in the majority of public and office-bound circumstances. And I am not entirely sure why many people in the industry feel compelled to push us in that direction, despite all the false starts. Is it that Captain Kirk scenario of barking orders to a dulcet-toned computerized secretary?

On the other hand, I acknowledge that the limitations of the smartphone keypad make voice commands more attractive. I have learned to use Siri mostly as a dictation machine for text messages. In search I use it only occasionally. The problem is that even though I have consigned voice commands to certain use cases and situations, voice is still an option I have to consciously remind myself to use.

Interestingly, Wells Fargo is about to incorporate voice commands in its banking app because the company feels the interface actually works best in more sophisticated operations. A report in the Charlotte Observer quotes the company's head of mobile technology as saying that he, too, was not convinced that voice recognition was a useful feature in the bank's apps. But he realized the interface was best focused away from simple navigation -- which is indeed awkward -- and toward more complicated requests.

“That was the ‘oh, wow’ moment for me,” he said. “It’s not about navigation. It’s not about replacing clicks. The click or the tap is always going to be more efficient than using voice. This is about bringing ... powerful interaction.” By that he means voice control lets the user string concepts together into questions, such as how much the user spent in different categories during a given month. In a standard computer interface, that would require manually applying many different filters.

Voice is the shortcut that makes the most sense when the path is longest. The article reports that banking in particular is interested in leveraging voice as a way to accommodate and perhaps triage customer needs that might ordinarily clog the expensive call center channel. One can imagine voice recognition teasing out enough keywords from a user's statement to make sense of the request and push the right FAQ to that user, for instance. The system wouldn't need to understand the user literally and fully in order to provide efficient service.
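To make the keyword idea concrete, here is a minimal sketch of that kind of triage. It assumes the speech has already been transcribed to text, and the FAQ entries, keyword sets and function name are all invented for illustration; nothing here reflects how any bank's system actually works.

```python
import re

# Hypothetical FAQ entries: each maps a set of trigger keywords to an article title.
# Both the keywords and the titles are illustrative assumptions.
FAQ = {
    "routing_number": ({"routing", "number", "aba"}, "Where to find your routing number"),
    "lost_card": ({"lost", "stolen", "card", "replace"}, "Reporting a lost or stolen card"),
    "spending": ({"spend", "spent", "spending", "category"}, "Viewing spending by category"),
}


def route_to_faq(utterance, min_hits=1):
    """Return the FAQ title whose keywords overlap most with the utterance, or None."""
    # Lowercase the transcript and split it into a set of distinct words.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_key, best_hits = None, 0
    for key, (keywords, _title) in FAQ.items():
        hits = len(words & keywords)  # count keywords that appear in the utterance
        if hits > best_hits:
            best_key, best_hits = key, hits
    if best_hits < min_hits:
        return None  # nothing matched well enough; hand off to a human or ask again
    return FAQ[best_key][1]


print(route_to_faq("um, I think I lost my card somewhere"))
```

The filler words in the request are simply ignored. Only "lost" and "card" matter, which is the sense in which the system need not understand the user fully.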

We are already starting to see some of this functionality in action. Geico's app has a voice interface for certain types of requests. They personalize it as “Lily.” USAA has added voice to its app in much the way Wells Fargo is expecting to. It will answer questions like “how much did I spend last month” and “what is USAA's routing number.” The app is promoted to customers as a way to “access your USAA accounts…faster than you can dial our toll-free number.”

One of the interesting parts of the interface is that it may help a customer drill deeper into a brand's resources. USAA says that in its trials it found that users could discover new features they wouldn't have otherwise known about. Voice seems to encourage a wider flexibility of requests than typing.

USAA seems to understand that they need to make the case with users that voice is in fact a better interface for some operations. The site says, “With a virtual assistant, you just talk like you normally would and give commands for what you want to do. Let's say you want to pay a bill. This involves information such as which bill to pay, when to pay it, how much to pay and from which account to draw the funds. Using a mobile device, you would have to enter these information fields, one by one, on a touchscreen. But with a virtual mobile assistant you can combine the transactions into one statement: ‘Pay $125 from my checking account to my MasterCard bill next Monday.’ That speeds up the interaction significantly. If required information is missing, you'll be asked the appropriate question to fill in the blank.”
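The pattern USAA describes, pulling several fields out of one statement and asking only for what is missing, can be sketched in a few lines. This is a toy under stated assumptions, not USAA's method: it works on already-transcribed text, and the slot names, regular expressions and prompts are all hypothetical and cover only phrasings close to the example sentence.

```python
import re

# Fields a bill payment needs, in the order we would ask for them (assumed names).
REQUIRED_SLOTS = ("amount", "source", "payee", "when")

# Hypothetical follow-up questions, one per missing field.
PROMPTS = {
    "amount": "How much would you like to pay?",
    "source": "Which account should the funds come from?",
    "payee": "Which bill would you like to pay?",
    "when": "When should the payment be made?",
}

# Deliberately naive patterns; a real system would use natural language interpretation.
PATTERNS = {
    "amount": r"\$(\d+(?:\.\d{2})?)",
    "source": r"from my (\w+) account",
    "payee": r"to my (\w+) bill",
    "when": r"\b(today|tomorrow|next \w+)\b",
}


def parse_payment(utterance):
    """Extract whichever payment fields can be found in a single statement."""
    slots = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, utterance, flags=re.IGNORECASE)
        if match:
            slots[name] = match.group(1)
    return slots


def next_prompt(slots):
    """Return the question for the first missing field, or None if all are filled."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return PROMPTS[name]
    return None


full = parse_payment("Pay $125 from my checking account to my MasterCard bill next Monday.")
partial = parse_payment("Pay $125 to my MasterCard bill")
print(full, next_prompt(full))
print(partial, next_prompt(partial))
```

With the full sentence, every field is filled and no follow-up is needed. With the shorter one, the sketch asks which account the funds should come from, then would go on to ask when.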

Whether and how much voice actually integrates into everyday mobile use is still unclear to me. It is the unevenness of its utility that works against adoption. The user needs to remember the option and decide when it is more efficient than typing. I am less convinced that voice interfaces somehow engage us more and create relationships with brands because of the apparent humanity of the interaction.

Much like the humanoid robot trope that occupied science fiction for much of the last century, the true “voice assistant” strikes me as more of a cultural fantasy. It helps us humanize the machines that trouble us in their power, intimacy and obvious lack of humanity. At the same time, it gives marketers another delusion: that somehow consumers really like and interact with their brands in ways that approach real human relationships.

I am sure voice interaction will have some segregated place in our exchanges with computers. I am more curious than ever, however, about why we seem so eager to talk to our machines in the first place.  

1 comment about "Talk To The Machine".
  1. William Meisel from TMA Associates, January 21, 2014 at 9:21 p.m.

    Speech recognition alone (converting speech to text) requires a dictation skill few have developed (doctors being a major exception). The key to speech as a user interface that many will use is two supporting technologies: natural language interpretation and knowledge representation. Natural language interpretation is the technology that interprets what the transcribed text MEANS: what you are trying to accomplish or what information you are seeking. Knowledge representation takes data and represents it in a way that can give ANSWERS, as opposed to, for example, a list of web sites. It's the combination of these that Wells Fargo understands will have a major impact on our interactions with computers.
    Speech-to-text is fairly mature and highly accurate in low-noise environments. The other technologies are just maturing. Stay tuned.
