The Audio of Things, Part 1: What’s the best human/machine interface?

This is the first of a three-part blog series that explores the role audio plays in our evolving relationship with machines.

"Wait, how do I..."

Face it: we now spend much of every day interacting with and commanding machines. The efficacy of those interactions depends largely on the interface. When that interface is simple, fast, and intuitive, the result is satisfying and useful. To cite a few common examples, anyone can work a toaster, a golf cart, or a vacuum cleaner.

But when operating the machine requires considerable knowledge, failure becomes not only possible but often likely. We know what we want the machine to do, but we have to translate that intent into a complex series of button presses that the machine can understand. We often rely on trial and error to learn which combinations of buttons will deliver the desired result.

For example, ever tried to soften a cold stick of butter or a pint of ice cream in the microwave? You’ll probably have to guess at the time and power settings, then watch carefully to make sure your food isn’t getting too warm. After all, what can you do with a fully melted bowl of ice cream?

"...we have a bandwidth problem"

Every year, machines become exponentially smarter. But we don’t. Tech entrepreneur and impresario Elon Musk recently addressed this imbalance in a conversation with tech journalists Kara Swisher and Walt Mossberg. After noting that AI systems will eventually be as far above us in intelligence as we are above house pets, he discussed ways to make humans smarter.

“We’re already a cyborg,” he said. “With your phone, you can answer any question, video conference with anyone anywhere … The only constraint is input/output. On a phone, you have two thumbs tapping away.”

If you’ve ever used your TV remote to search for something with an on-screen keyboard, you know the pain of a low-bandwidth interface. To tackle this problem, Musk co-founded Neuralink, a neurotechnology startup exploring the possibility of a direct Machine/Brain Interface (MBI), “because we have a bandwidth problem. You just can't communicate through fingers. It's too slow.”

The good news...

While direct Machine/Brain Interfaces are likely many years away, today's "smart" devices sidestep the problems of "interface bandwidth" and "intent translation" in another way: by adopting the most human of interfaces, speech.

"Smart" devices with a Voice User Interface generally take a two-step approach to discerning our intent. First, an Automatic Speech Recognition (ASR) module converts speech to text. Then a Natural Language Understanding (NLU) engine analyzes that text to determine the intent. These processes typically run on cloud-computing platforms (such as Amazon's Alexa Voice Service or Google Assistant) that use AI and machine learning to handle an extremely broad set of queries and commands and to generate an equally broad set of replies and actions. That said, a growing number of technology providers also offer embedded speech-to-intent engines that run entirely on the device itself, albeit with a limited vocabulary and a limited set of actionable intents.
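To make the two-step flow concrete, here is a deliberately tiny sketch in Python. It assumes the ASR step is supplied by some engine and passed in as a plain function that turns audio into a transcript. The NLU step is a toy keyword matcher over a small, fixed set of intents, loosely in the spirit of a limited-vocabulary embedded engine. All names here (`Intent`, `INTENT_RULES`, `understand`, `speech_to_intent`) are hypothetical illustrations, not any vendor's API, and real NLU engines use trained models rather than keyword rules.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


# Hypothetical intent grammar for a small on-device engine: an intent fires
# when all of its keywords appear in the transcript.
INTENT_RULES = {
    "soften_food": {"soften"},
    "defrost": {"defrost"},
    "stop": {"stop"},
}

# Hypothetical slot values the engine knows about (tuple keeps order stable).
FOODS = ("ice cream", "butter")


def understand(text: str) -> Optional[Intent]:
    """Toy NLU stage: map a transcript to one of a fixed set of intents."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    for name, keywords in INTENT_RULES.items():
        if keywords <= words:
            slots = {}
            for food in FOODS:
                if food in lowered:
                    slots["food"] = food
                    break
            return Intent(name, slots)
    return None  # request falls outside the limited vocabulary


def speech_to_intent(audio: bytes, asr: Callable[[bytes], str]) -> Optional[Intent]:
    """Two-step pipeline: ASR turns audio into text, then NLU extracts intent."""
    transcript = asr(audio)
    return understand(transcript)
```

With a stand-in ASR function, `speech_to_intent(b"", asr=lambda audio: "Soften the butter, please")` returns `Intent(name='soften_food', slots={'food': 'butter'})`, while an out-of-vocabulary request such as "Play some jazz" returns `None`. Returning `None` mirrors the trade-off described above: an on-device engine can answer quickly without the cloud, but only for the intents it was built to recognize.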

Machine Hearing

While advances in machine learning and AI have made speech-to-intent services surprisingly accurate and robust, that's only half the battle. As anyone who has ever shhhhh'd the room to ask Google something can attest, many devices struggle to hear our voices clearly in the first place. So while we've largely solved the problems of "interface bandwidth" and "intent translation", many voice-controlled devices still require humans to change their behavior to accommodate the limitations of the human/machine interface.

In Part 2 of our Audio of Things (AoT) blog series, we’ll explore the unique challenges of sound-based interfaces and explain why we're only now seeing consumer products that can approach, and even exceed, the limits of human perception.

About DSP Concepts

DSP Concepts, Inc. provides embedded audio digital signal processing solutions delivered through its Audio Weaver® embedded processing platform. DSP Concepts specializes in both microphone and playback processing and is the leading supplier to top-tier brands in consumer and automotive products. Founded by Dr. Paul Beckmann in 2003, DSP Concepts is headquartered in Santa Clara, California, with offices in Boston and Stuttgart.