The Challenge of the Always-Listening Car

Automotive voice command systems are about to go through a radical change, because the success of smart speakers such as the Amazon Echo and Google Home has raised consumer expectations. When the speakers people use at home respond entirely to natural-language voice commands, why should drivers still have to push a button to issue a voice command inside their cars, and memorize the specific commands the car understands?

The smartphones most people carry in their cars already have powerful voice command capabilities, with always-on Internet connections that can tap the cloud-based computing power required for advanced voice recognition. But accessing these capabilities in the car tends to be much more complicated than using them in the home.

As explained in the DSP Concepts white paper “Fundamentals of Voice UI,” the most critical determiner of accurate voice recognition is the system’s signal-to-noise ratio. In this case, the signal is the voice of the person stating the commands, and the noise is everything else: the sound of the music the system is playing, the acoustic echoes of that voice off other objects, and the surrounding environmental noise.
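As a rough illustration of what this ratio measures, the sketch below computes SNR in decibels from two sample buffers. The function name, signal levels, and 16 kHz sample rate are illustrative assumptions, not details from the white paper.

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from two equal-length sample arrays."""
    p_signal = np.mean(np.square(signal))  # mean signal power
    p_noise = np.mean(np.square(noise))    # mean noise power
    return 10.0 * np.log10(p_signal / p_noise)

# Toy example: a voice-level tone against low-level random noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)   # 1 second at 16 kHz
voice = 0.5 * np.sin(2 * np.pi * 220 * t)
noise = 0.05 * rng.standard_normal(16000)
print(round(snr_db(voice, noise), 1))          # roughly 17 dB for these levels
```

Turning the music down or quieting the cabin raises this number, which is exactly what the techniques described below aim to do.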

In cars, the signal-to-noise ratio for voice command systems tends to be much worse than in the home, for several reasons:

1) Car audio systems are often played louder than home audio systems.
2) Car audio systems typically use eight to 20 separate speaker drivers positioned all around the occupants, whereas smart speakers use just one or two drivers placed within a few centimeters of each other.
3) Because of road, wind and engine noise, the noise level inside a car can be as loud as 80 dB, while a typical living room noise level might be 50 dB.
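To put that last figure in perspective, decibel levels are logarithmic, so the 30 dB gap between a typical cabin and a typical living room corresponds to roughly a thousand-fold difference in acoustic noise power:

```python
# A 30 dB higher noise floor (80 dB in the cabin vs. 50 dB in a living room)
# means 10 ** (30 / 10) = 1000x more acoustic noise power reaching the mics.
cabin_db, living_room_db = 80, 50
power_ratio = 10 ** ((cabin_db - living_room_db) / 10)
print(power_ratio)  # 1000.0
```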

Car voice command systems have addressed these challenges with push-to-talk operation. This improves the signal-to-noise ratio by reducing the volume of the car audio system and/or routing all of its sound into the center speaker when the driver pushes the Talk button on the steering wheel. Current push-to-talk systems also force the driver to use specific commands that the car’s system recognizes, requiring the driver to memorize those commands rather than use natural language. Obviously, drivers prefer being able to ask “How do I get to Home Depot?” rather than speak a set of commands with specific words and syntax.

But implementing natural-language voice command systems in a noisy passenger cabin is difficult, because a voice command system needs a clear voice signal in order to accurately recognize a person’s speech and deliver the desired response. Humans are amazingly skilled at divining the meaning of a spoken phrase even when parts of it are lost to signal breakup or surrounding noise, which is why cellphone conversations in cars work reasonably well. Unfortunately, computers are nowhere near as good at this task, which is why it’s so important to deliver them a voice signal that contains as little noise as possible.

A natural-language, voice-activated system has to “figure out” when the driver is speaking a command. This requires the use of a trigger word, such as the “Alexa” command used to “wake up” an Amazon Echo. Through the use of DSP algorithms running on the systems-on-a-chip (SoCs) used in advanced car audio systems, the noise coming in through the microphone can be reduced, making it easier for the system to pick out the trigger word from the noise of the car and the audio system itself. Optimizing the size of the trigger-word algorithm to best exploit the processing resources available on the SoC is also critical.
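The white paper does not specify which noise-reduction algorithm runs ahead of the trigger-word detector. One classic single-channel technique is spectral subtraction, sketched below under the assumption that a noise magnitude estimate is available from non-speech frames; the function and parameter names are illustrative, not DSP Concepts’ API.

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.05):
    """One frame of single-channel spectral subtraction: subtract an
    estimated noise magnitude spectrum, keep the noisy phase, and apply
    a spectral floor to limit 'musical noise' artifacts."""
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    phase = np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))

# Toy check: with a perfect noise estimate, a noise-only frame is
# attenuated down to the spectral floor.
rng = np.random.default_rng(1)
frame = rng.standard_normal(512)
noise_mag = np.abs(np.fft.rfft(frame))   # noise estimate from a non-speech frame
cleaned = spectral_subtract(frame, noise_mag)
```

In practice the noise estimate would be updated continuously whenever the detector believes no one is speaking.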

Once the trigger word is recognized, the voice command system uses a microphone array to focus on the user’s voice and exclude sounds coming from other directions, such as the car audio system’s speakers, road noise and the sounds of other passengers talking. This process is called beamforming. For optimum voice recognition, the beam must be carefully tuned: tight enough to exclude unwanted sounds, but wide enough that commands are not lost when the person speaking moves his or her head. The system must also use acoustic echo cancellation to remove the sound of the car audio system from the microphone signal, plus additional noise reduction to eliminate as much of the road, wind and engine noise as possible.
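As a minimal sketch of the beamforming idea (not DSP Concepts’ actual algorithm), a delay-and-sum beamformer time-aligns each microphone channel toward the talker and averages, so the voice adds coherently while sounds arriving from other directions do not. The steering delays here are assumed to be known, in whole samples:

```python
import numpy as np

def delay_and_sum(mics, delays_samples):
    """Delay-and-sum beamformer: shift each channel so the talker's
    wavefront lines up across the array, then average the channels.
    mics: array of shape (channels, samples).
    (np.roll wraps at the edges; a real system would apply fractional,
    non-circular delays.)"""
    aligned = [np.roll(ch, -d) for ch, d in zip(mics, delays_samples)]
    return np.mean(aligned, axis=0)

# Toy example: the same voice reaches mic 2 three samples later than mic 1,
# and each mic picks up its own uncorrelated noise.
rng = np.random.default_rng(0)
voice = np.sin(2 * np.pi * 200 * np.arange(1000) / 16000)
mic1 = voice + 0.3 * rng.standard_normal(1000)
mic2 = np.roll(voice, 3) + 0.3 * rng.standard_normal(1000)
beam = delay_and_sum(np.stack([mic1, mic2]), [0, 3])
```

Averaging two aligned channels keeps the voice at full level while roughly halving the uncorrelated noise power, which is the SNR gain beamforming buys; larger arrays and adaptive weights extend the same principle.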

While the challenges of incorporating full voice command capabilities in cars are considerable, the tools to meet these challenges exist. DSP Concepts’ Audio Weaver automotive algorithm package delivers all the processing functions described above and many more, all tunable and testable through an intuitive graphic user interface.

To learn more about how voice command systems work and how they can be optimized, download the white paper “Fundamentals of Voice UI” from the DSP Concepts website.