Voice agents are on the verge of taking on a new role.
The demonstration of Google Duplex at the Google developer conference this week provided a glimpse of what’s around at least one corner.
The new capability lets Google Assistant make a phone call and conduct a natural conversation to carry out routine tasks, such as scheduling a hair appointment or making a restaurant reservation.
In the demonstration at Google I/O 2018, the search giant showed how the voice agent was built to sound natural. In recordings of the agent calling a hair salon to make an appointment and a restaurant to make a reservation, it sounded like two actual people conversing with each other.
At the core of Duplex is a recurrent neural network designed to understand, interact and speak. The network uses Google’s automatic speech recognition technology, which anyone who uses the Assistant on an Android phone knows is very good at correctly interpreting what someone says.
Google makes the voice sound more natural by adding “hmm” and “ah” and sometimes even “yeah” rather than “yes” in the course of a conversation.
This development is a case of modifying digital elements to work in an analog environment. For a computer to “speak” with a human, the technology has to mirror some of the characteristics of everyday speech, flaws and all.
This approach could benefit consumers by automating the scheduling of tasks that require a personal phone call.
On the other side of the equation, businesses are looking to automate their customer interactions with tools such as bots.
For example, Domino’s recently added artificial intelligence to pizza ordering by way of a voice-recognition application that can take telephone orders, as I wrote about here (Domino’s Adds Artificial Intelligence To Phoned-In Orders). In that case, a human originates a call and a computer receives it and manages the request.
With Google Duplex, it’s the reverse, with a computer originating a call, and a human on the other end fulfilling the request.
Eventually, the technology will communicate with itself digitally on both ends, no voice needed.
Until then, the Googles of the world will work on one side of the equation and the Domino’s of the world will work on the other.
Meanwhile, people will still make phone calls and speak to actual people on the other end.