Cynicism about voice technology

I vividly remember the month that I first got Siri on my iPhone, not least because I’ve barely used it since the initial novelty wore off. Now, The Economist is getting excited about voice technology, and even has a special report about Siri, Alexa, and the future of natural language voice computing. For my part, I’m deeply unimpressed not just with voice input technologies, but with the whole idea that natural conversation should be any kind of exemplar for interaction with computers (except, of course, for people who rely on voice input for accessibility reasons, or for the increasingly rare newcomers to IT). Here are some quick reasons why –

(1) Our interactions with computers tend to be highly dynamic, relying on constant feedback. What I mean by that is – when I’m engaging with a computer, I typically ask lots of quick bad questions and use the answers to guide me. I’d prefer to look quickly in three possible locations for a file rather than ask, “Siri, can you look in Downloads, or possibly Documents, or possibly Desktop, for a file that could be called Latest Draft or Friday Draft or January Draft? I’ll know it when I see it, so just list all the possible contenders”. The ability to quickly interact with computers and maintain a steady flow of information between user and interface seems like an advantage of using machines compared to speaking to humans, which tends to consist in relatively long spoken messages bouncing back and forth.

(2) A lot of the formal syntax we use when interacting with computers is very clear and powerful compared to natural language. If I google ‘ “cecil rhodes” -oxford’, I’ve succinctly asked Google to show me articles about Cecil Rhodes that make no mention of Oxford. Once you get into more complex searches and operations, the advantages of formal operators become even starker. Of course, you can integrate a similar syntax into Siri or Alexa, but that constitutes a different (albeit compatible) vision of how voice computing is going to go.
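
To make the point about formal operators concrete, here is a minimal Python sketch of how little machinery a query like the one above needs. It is a toy written for illustration, not how Google actually parses searches: the function names `parse_query` and `matches` are my own inventions, matching is plain case-insensitive substring search, and the parser assumes straight double quotes and no stray apostrophes in the query.

```python
import shlex


def parse_query(query):
    """Split a query into required phrases and excluded terms.

    Hypothetical helper: a quoted phrase stays together as one required
    phrase, and a leading '-' marks a term that must not appear.
    """
    required, excluded = [], []
    # shlex.split keeps "cecil rhodes" together as a single token.
    for token in shlex.split(query):
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            required.append(token.lower())
    return required, excluded


def matches(text, query):
    """Return True if text contains every required phrase and no excluded term."""
    required, excluded = parse_query(query)
    lowered = text.lower()
    has_all_required = all(phrase in lowered for phrase in required)
    has_no_excluded = not any(term in lowered for term in excluded)
    return has_all_required and has_no_excluded


if __name__ == "__main__":
    query = '"cecil rhodes" -oxford'
    print(matches("Cecil Rhodes and the founding of De Beers", query))
    print(matches("The Cecil Rhodes statue at Oxford", query))
```

Two characters of syntax (the quotation marks and the minus sign) carry the whole instruction, which is rather harder to say aloud without ambiguity.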

(3) Voice input is likely to remain relatively unreliable compared to text input. If I can type “indonesia population” and get an immediate answer, why would I bother vocally asking Siri for the answer? Even if Siri is great at recognizing my accent through my heavy cold, I could cough or sneeze or mispronounce a word or simply get distracted for a second, and waste a valuable 6 seconds of my time.

(4) Finally, on dictation specifically – while I understand that many people hate typing, I find it massively surpasses speaking as a way of composing stuff that is *intended to be read*. As I write, I see how stuff will look to the reader; the phrase that sounds just fine with the benefit of vocal nuance and intonation might look hamfisted and obscure as words on a page. That joke that comes across brilliantly in speech may fall flat in text. Of course, you can read back what’s on the page to yourself as you dictate it, but given how different the pragmatics of speech and text are to begin with, I find it far better to skip the spoken element altogether and frame the process of composition as a dialogue between writer and reader, rather than speaker and reader.

All that said, there is one closely related area of tech that I find much more exciting (and a little scary), namely the increasing ability of virtual assistants to integrate data and predict what kind of information I’m going to find useful. I’ve been very impressed by the way that the Google app on my Android phone manages (without any real input from me) to sift through my Gmail to figure out upcoming deadlines, bills, flights, and so on, and (increasingly) directs me towards articles. I was also surprised recently when travelling in Martinique that Google had noted my change of location and was suggesting useful French phrases. This kind of smooth, automated integration and presentation of information is genuinely useful, and comes close to some of the functions of a real (human) PA. Of course, it also raises privacy concerns. Still, I see more potential for big leaps in user experience via this aspect of the tech than via Siri, Alexa, Cortana, and co.

