The Trouble with Voice Assistants
Maybe things are fine where you are, but in our house, the kids (all three of them) constantly battle for Alexa’s favor. They shout over each other at the dinner table, each trying to drown out the others and get their own request heard by the device. It’s all to no avail: the verbal assault goes back and forth ad infinitum, with each child immediately repeating their last request, one after the other, until my wife and I can’t take it anymore and either yell at everyone (including Alexa) to shut the hell up, or unplug the device and lock it away in a drawer.
Apple gave us Face ID, but what we need just as badly is Voice ID. (Some banks already use it to identify callers. But who uses a phone to call anybody anymore?)
Not surprisingly, Google, Apple, and Amazon have all done an amazing job of making sure their devices can hear us, even when we speak at normal volume from the next room, over background noise, or in a crowded room. Now what they need to work on is getting these devices to stop listening when we ask them to.
It shouldn’t be hard to do. All it will take is for the device to recognize each individual voice in a household (which, as noted above, is apparently already a thing). It’s not enough for a device to know that someone is speaking; it should also know who is speaking. Further, we need to be able to tell it to stop listening, and to tell it who to stop listening to.
For instance: “Alexa, ignore Gus.”
Or, “Alexa, ignore everyone but Raegan and me.”
Or, “Alexa, don’t listen to strangers.” This one occurred to me after one of Gus’s friends came over and immediately took control of the music Alexa was playing for the whole house. Who does that? Kids, that’s who.
Both Alexa and Google Home let you set up voice profiles so the device knows who’s talking, but neither lets an administrator tell it to stop listening to someone, or set up preferences for each individual user. For example: person X can’t make music requests, person Y can’t make purchase requests, and person Z can only do A between the hours of B and C.
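To make the wish list concrete, here is a minimal sketch of what such per-user rules might look like once the device can tell voices apart. Nothing like this is exposed by Alexa or Google Home today; every name below (HouseholdPolicy, UserRule, allows, and category strings such as "music" and "purchase") is hypothetical, and the sketch assumes the device has already matched the voice to an enrolled profile, passing None for a voice it doesn’t recognize.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional


@dataclass
class UserRule:
    # Hypothetical: request categories this person may never make, e.g. {"purchase"}.
    blocked: set[str] = field(default_factory=set)
    # Hypothetical: categories allowed only inside a (start, end) window.
    # Assumes the window does not cross midnight.
    windows: dict[str, tuple[time, time]] = field(default_factory=dict)


@dataclass
class HouseholdPolicy:
    rules: dict[str, UserRule] = field(default_factory=dict)
    ignored: set[str] = field(default_factory=set)          # "Alexa, ignore Gus."
    only_listen_to: Optional[set[str]] = None               # "Ignore everyone but Raegan and me."
    listen_to_strangers: bool = True                        # "Don't listen to strangers."

    def allows(self, speaker: Optional[str], category: str, now: time) -> bool:
        """Decide whether to act on a request; speaker is None for an unrecognized voice."""
        # An allowlist, if set, excludes everyone else, including strangers.
        if self.only_listen_to is not None and speaker not in self.only_listen_to:
            return False
        if speaker is None:
            return self.listen_to_strangers
        if speaker in self.ignored:
            return False
        rule = self.rules.get(speaker, UserRule())
        if category in rule.blocked:
            return False
        window = rule.windows.get(category)
        if window is not None:
            start, end = window
            return start <= now <= end
        return True


if __name__ == "__main__":
    policy = HouseholdPolicy(
        rules={"Gus": UserRule(blocked={"purchase"},
                               windows={"music": (time(16, 0), time(19, 0))})},
        listen_to_strangers=False,
    )
    print(policy.allows("Gus", "purchase", time(17, 0)))  # False: purchases blocked
    print(policy.allows("Gus", "music", time(20, 0)))     # False: outside the music window
    print(policy.allows(None, "music", time(17, 0)))      # False: strangers ignored
```

The checks run from broadest to narrowest (allowlist, strangers, ignore list, then per-user rules), so a spoken command like “ignore Gus” overrides anything more specific set up in the app.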
Google Home tailors its answers depending on who’s talking, and knows which account to pull data from based on the speaker’s voice. The main use cases here are calendar appointments and checking commute traffic for the drive to work. Alexa also offers “personalized experiences” based on the speaker’s voice, primarily around calling and messaging (e.g., users can simply say “Play my messages” or “Send a message,” and the device knows whose messages to play or who the message is from). Alexa also recognizes your voice for shopping and news. And lastly, Amazon Music gets it nearly right: once each user has trained the device on their voice, Alexa plays personalized music or makes personalized recommendations for that user.
That’s all well and good, but in the name of all that is holy, please let us tell these things when to stop listening!
Here’s how this might work: in the device’s mobile app, the administrator (let’s assume a parent) would set up a voice profile for each individual in the house. Strangely enough, Amazon currently lets you create voice profiles only for individuals over the age of 13. Why, Alexa, why?? The kids under 13 are exactly the ones we want it to stop listening to! On the other hand, does it really know which voices belong to kids under 13? I’m guessing that restriction is easy to get around.
Ideally, the device would use some form of AI to track each child’s voice over time, as it changes and deepens, so that it still recognizes the voice as belonging to the correct child. It’s probably safe to assume Facebook has something like this in the works, given how cleverly it got us to unwittingly train its AI by feeding it our personal aging data: pictures of ourselves taken ten years apart.
“But Steve,” you’re thinking, “this sounds like a lot of work for not much reward.” And to this, I answer: phooey! First of all, stopping our family dinners from being chaotic screaming rituals is benefit enough. But secondly, consider all the possibilities once these devices can not only recognize our voices, but also recognize our emotional and mental states without the use of direct commands (and this is not far off, either). We’ll be able to tell Alexa things like, “Alexa, don’t let me buy anything if I’m drunk.” Or, “Don’t let me message my ex-boyfriend or ex-girlfriend if I’m really upset” (or, again, if I’m drunk). Or, conversely, how about setting actions that are triggered only by perceived emotional states (e.g., “Alexa, if I’m sad, play Iron & Wine.”)? The possibilities are endless.