5 ways that future A.I. assistants will take voice tech to the next level

Since Siri debuted on the iPhone 4s back in 2011, voice assistants have gone from unworkable gimmick to the basis for smart speaker technology found in one in six American homes.

“Before Siri, when I talked about [what I do] there were blank stares,” Tom Hebner, head of innovation at Nuance Communications, which develops cutting-edge A.I. voice technology, told Digital Trends. “People would say, ‘Do you build those horrible phone systems? I hate you.’ That was one group of people’s only interaction with voice technology.”

Mo’ knowledge, less problems

Alexa can tell you what the weather is in Kuala Lumpur, Malaysia; how many U.S. dollars you’ll get for 720 South African rand; and how to spell “disestablishmentarianism.” But consumer A.I. assistants are, in essence, the digital equivalent of a person with a complete set of up-to-date encyclopedias. You get (hopefully) the right information, but there’s no pro-grade level of expertise there.

“The challenge that the systems in your home have is that there’s such a broad range of things that they’re trying to do,” Hebner told Digital Trends.

This is a tough one to solve, but doing so would be a game-changer. Nuance develops many specialist systems aimed at one specific use case, such as answering airline customers’ queries or helping doctors take notes. That narrow focus not only means these systems can drill down to get more detailed information, but also that more intelligence can get baked in. “People were very excited about computers that could understand words, but that doesn’t necessarily matter if you don’t know what to do with those words,” Hebner said.

One example he gives is of a Nuance system that not only understands when doctors read out a list of potential drugs for patients, but can also call out potential conflicts between them. This is way beyond the capabilities of most consumer-grade A.I. assistants.

However, more specialized, detailed knowledge of different domains, something hinted at by Alexa Skills, could be transformative. Asking your smart speaker for legal or medical advice sounds, on the face of it, crazy. But there have been extraordinary advances in fields like legal bots, while a recently published report suggests Apple wants Siri to be able to have health-focused conversations with users by 2021.

Specialist knowledge graphs for A.I. assistants are the stuff of sci-fi dreams right now, although a recent Voicebot.ai report shows just how rapidly virtual assistants’ skillsets are expanding. When skills move into the terrain of specialities, though, we’re going to be in for a treat!

More (and better) personalization

Personalization of today’s smart speakers is still in its infancy. You can change a voice assistant’s accent and presenting gender, add or remove skills, and feed it bits of information like your name and place of work. In some cases, you can set up multiple voice profiles so that Google Home will recognize the individual members of your household.

But there’s still a long way to go, although the juice should be worth the squeeze. Mattersight Corporation has developed A.I. call center technology, called Predictive Behavioral Routing, which analyzes the speech patterns of callers and matches them up with human operatives with compatible personality types. According to the company, pairing a caller with an operative of a compatible personality type results in a successful call that lasts just half as long as one paired with a conflicting personality type.

Using a similar approach could result in A.I. assistants that talk back to you the way you like to be addressed. That could be something as simple as matching the accent and voice volume of the person they’re speaking with. Or they could change the way they convey ideas, perhaps using more emotive words for some users and denser, more detailed information for others. Maybe some people want a voice assistant to chat to at length, while others simply want one to convey the necessary information in the most concise manner possible. A.I. assistants should be capable of both.

Technologies like Google Duplex show just how convincing A.I.-generated synthesized voices and conversations are getting. As A.I.s move into areas more complex than dishing up song requests and food timers, expect to see this technology play a major role.

This could be aided by breakthroughs in the ability to identify users by voice. Hebner notes that Nuance’s technology can ID users from just a single second of audio. “It used to take 10 seconds to understand who you are, to get an accurate signal,” he said. “The power of that is significant.” Being able to identify users from a small snippet of speech could solve the password problem, and opens up the opportunity to use voice assistants for more delicate, confidential information.

Getting proactive

A good assistant will do something when you ask them to. A great assistant won’t need asking. Right now, A.I. assistants are still at this first stage. Users can get the song they want or the reminder they need, but typically only when it’s been explicitly requested. As people get more comfortable with voice assistants, there’s a great opportunity for them to move beyond being purely reactive devices to proactive ones.

How would you feel about an A.I. assistant making decisions on your behalf? These could range from cranking up the thermostat when someone says they’re cold, to rebooking a lunch meeting because you’re running late, to nudging you to do more exercise or get better at saving your paycheck. As more and more smart devices make their way into the home, the number of things a voice assistant could conceivably command will greatly increase.

Part of this is a social question about how comfortable people are with machines making decisions on their behalf. There are big questions about whether or not people want to hand certain jobs over to machines. Think of it like giving your credit card and house keys to your flesh-and-blood assistant, only with a much bigger sprinkling of Skynet. The downside is giving up a certain amount of control. The potential upside is increasing your free time. Of course, there is a big technical challenge…

It’s all about the feedback

Tom Hebner pointed out a big challenge with the issue of proactivity: how do our machines know when they’ve got it right? Returning to the idea of the good vs. great assistant, a great assistant might have all your files out ahead of a big meeting, without you needing to ask. But what if they’re the wrong files? A big issue with making home A.I. assistants more proactive is that there are currently limited ways of revealing whether or not the information we’re getting is the right information.

“If I ask for the same song every day when I walk into my house, and then one day I walk in and it just starts playing, how do they know that they got it right?” Hebner said. “If I don’t stop it playing, does that mean it’s right? If I do say ‘stop,’ does that mean it got it wrong and it should never do it again? The feedback mechanism is one of the reasons you’re not getting more proactive systems.”

This is a challenging one for engineers to figure out. Anyone who’s ever had an intern asking them for instruction and feedback on every single task knows that sometimes it’s easier to do a job yourself than delegate it. An A.I. assistant is there to make your life more frictionless, not to give you dozens of mini surveys each day to confirm whether it’s done its job right. This will need to be solved in a way that doesn’t cripple the user friendliness of these devices, and doesn’t require a whole lot of training up front before systems learn your preferences.

What’s the answer? I’m not sure. But, as Steve Jobs once said, it’s not the job of the customer to figure it out.

New interaction methods

There’s a scene in 2001: A Space Odyssey in which the murderous HAL 9000, disconcertingly still the most famous fictional A.I. assistant in history, reveals that it doesn’t just use microphones to determine what is being said to it. When two crew members try to find a place to speak where they know HAL can’t hear them, HAL reveals that it can still understand them by reading their lip movements.

Scary moment of the movie? Sure. An example of how A.I. assistants could work in the future? Um, sure!

The idea that voice assistants should be limited to voice restricts the number of ways they could usefully interact with us. With the rise of facial recognition and emotion-tracking technologies, an ever-growing number of biometrics gathered about users on a constant basis, and even the possibility of mind-reading tech on the horizon, there are plenty of different signals that A.I. assistants could use to draw their conclusions.

The idea that, 10 years from now, we’ll only be using voice to control these A.I. assistants is like looking at PCs in the early ’80s and thinking we’d never have more than a keyboard at our disposal.