Who knew that Siri had a real salty mouth on her? I've been consigned this week to using dictation software to write my columns and posts. And I've
discovered quickly that Siri doesn't mind inserting obscenities in messages when you're not paying close enough attention. One email turned some three-word phrasing into
‘f**cking,’ and I only caught it in a quick proof. Jeez, Siri, watch your f**cking mouth.
I have to admit I've never been a big believer in voice command
interfaces. It seems to me that using things like Dragon software as a real interface would simply consign all of us to talking to our PCs and smartphones, and to a lot of very noisy offices. A voice
command world would make every office environment sound like a telemarketer sales boiler room. Years ago, when I made my bones as the software and hardware reviewer for a number of different
magazines, I used a lot of the different voice recognition packages and found them more disorienting and disturbing than helpful. But now I have no choice. The only way I can write, at least for the
next few days, is to try a range of different voice-to-text programs.
For this column, I'm trying to use the Siri interface on my iPad to see how that works. I was pleased with
how far Dragon had gotten with its interface. When I used the software on my Mac, at least for the one time that it launched, Dragon Express worked extraordinarily well at capturing my voice and
transcribing it with astounding accuracy. But that program didn't work very well or for very long on the latest operating system for the Mac. So I had to fall back on the new built-in dictation
software that comes with the Mac. It uses a system similar to Siri on iOS devices, but it has many of Siri’s inherent weaknesses.
For instance, I never know when the
microphone is going to stop recording and decide to transcribe instead. It doesn't seem to stay on for very long. It's also curious that Siri doesn't try to increase its own accuracy in the same way
Dragon's various programs do. It doesn't try to train itself to recognize your voice, which seems to be one of the key elements of Dragon's accuracy. Dragon forces you to read
through a number of key phrases and then registers your tone and some of the nuances of your voice, in order to deliver much more accurate renderings.
As I look over what I've
already dictated, I can see that Siri is inserting a lot of transitional words where they don't belong and interpreting utterances incorrectly all over the place. She also has a terrible sense of
context. She doesn't seem to understand, as a good word recognition program should, which form of a word fits the particular phrases around it. She also has a weird habit of cutting off
the final word of sentences. And for some strange reason she repeats words.
So clearly Siri is a system that is less about dictation than about commands. It seems to function best as a command
interface that relies on a fairly predictable set of phrases to activate it. And this is fine. But I still have not seen anyone in public use a voice interface on their smartphone to
retrieve information except to impress or to show off the feature to someone else.
One of the weird conceits of this age of gadgetry is that somehow all traces or aspects of
traditional interfaces and interaction with media need to be modernized and updated and dispensed with in order to truly move forward. But it seems that the keyboard really is something
that most of us are wedded to for a lot of different reasons, even on a touchpad. Voice seems like an efficient interface. But it turns out that even when we try to dictate lines, which is something I
am just beginning to learn, it takes a different structure of thought from the way we interact with the keyboard. I am sure that the cadence, the rhythm, and all of the different aspects of my writing
are somehow different because I'm dictating rather than interacting with the keyboard itself.
Foremost, the keyboard confers the degree of privacy that is lost in a voice
interface. But more than that, the keyboard interface is tied somehow to our internal voice, our internal being, our thought process in a way voice isn't. Voice makes things public even when we're
sitting alone. I'm sitting here, self-conscious about the process of dictation. I'm conscious of what I sound like. And I'm the kind of writer who goes back and rewrites a sentence and its structure
several times, usually before I get what I want. I sit at mid-sentence for minutes at a time trying to figure out if I've got the right structure. Strangely, to me, giving dictation is actually more like
public speaking than simply having a conversation with one other person.
And all of this is to say that any extended or involved use of Siri only underscores how not-human this
interface is. The illusion of human interaction really works only in small bits and pieces. If you have the kind of valet who only grunts or uses formal acknowledgements for your commands, then Siri
feels vaguely human. But this is an interface whose literal nature is in front of you at all points. Like any computer algorithm, its inability to understand the grunts, the ums, the ahs -- the
various ways in which conversation really goes -- only reminds us how machinelike it is. For instance, in the last sentence my attempt to get Siri to spell out a guttural sound was read as
"almonds." And she couldn't even wait for me to finish thinking through the structure of the sentence before she stopped transcribing.
All of which is to say that the notion of
the gadget as companion, made more personable via a voice interface, has more hoops to jump through than personality to offer. The machine is not the only thing being altered here. The user is being
asked to work in a different mode, and in many ways to interact with the machine in a manner that is not entirely human after all. How many of us have valets anyway -- let alone give them commands?