The Voice, Unbundled

Forget about Millennials. Anyone trying to understand the future of media and technology should observe what the children do.

First of all, they talk to computers. This has never really happened before. From mainframes to PCs to smartphones, we have used keyboards, mice and our fingers as inputs, while screens (and printers) served as the outputs. In other words, we’ve been typing and reading. But with Siri, Echo and (soon) Home becoming commonplace, a whole generation of kids is growing up with a different paradigm, and thus a different relationship with computers.

Full disclosure: I have two kids. My daughter is eight and my son is seven, and both would be upset if Siri died. But they would get over it. If Alexa died, however (Alexa, who tells them jokes, plays them music, reads them the weather, and misunderstands them as hilariously as a foreign au pair), well, that would be traumatizing. They know Alexa isn’t a person; they just don’t care. She’s become a kind of pal. And like all pals, she can be a source of both happiness and frustration. Take these exchanges, for example:

Lukas: “Alexa, play the song ‘The Shoemaker’s Dance’”
Alexa: “I can’t find the song the shoemake is dance like no he’s alexa turn off yourself”

Zoe: “Alexa, play ‘Scars To Your Beautiful’ on Spotify”
Alexa: “I can’t find ‘Scars To Your Beautiful’ by Daughter Butterfly”

Lukas: “Alexa, play Anouar Brahem on Spotify”
Alexa: “I can’t find song by Esco Feat. Chino, Bramma, Chan Dizzy, Rholin X, Bounty Killa and Ward 21 on Spotify”

(And yes, my son does like Tunisian oud music.)

The fact that voice is becoming an important user interface is not surprising. As computers have evolved, they’ve moved closer to us. Originally housed in distant air-conditioned basements, they migrated to our desks, then to our laps, our pockets, and our wrists. Screens have grown smaller and are now disappearing entirely, taking a whole layer of friction with them. You no longer need to hunt for your iPhone, unlock the screen, and open the Amazon app to buy something. You just tell Alexa to do it.

This kind of ambient computing relies on two big technological developments. The first is the availability of cheap, high-quality sensors, made possible by the smartphone supply chain and since unbundled for use in other hardware. The second is a subfield of AI known as natural language understanding (NLU): the process of breaking down and parsing human language to work out what the speaker actually means.

This is actually an incredibly hard problem, since there are thousands of ways to phrase a request in a human language, many of which still defy NLU. A productive exchange with Alexa is possible only because Alexa’s scope as an application is very narrow. Siri, by contrast, was widely ridiculed at first, due in part to Apple’s claim that it could answer anything (which it clearly couldn’t).

But these limitations don’t seem to matter to kids, who are both new to the power of language and blissfully unconcerned with the computing paradigms that preceded them.

For humans of all ages, the spoken word has always held a special status. Our voices convey not only information and ideas but also mood, emotion, personality and origin. The voice is the very emblem of the speaker. This is why, on screen, the only non-human characters that seem truly, eerily human are the ones with real voices: Roy Batty in “Blade Runner,” Ava in “Ex Machina,” the Cylons in “Battlestar Galactica.”

Some 2,300 years ago, in his dialogue “Phaedrus,” Plato forcefully asserted the primacy of speech over writing, warning that the invention of writing “will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them.”

One can only wonder what Plato would have made of emoji.
