Automotive voice command systems are about to go through a radical change, because
the success of smart speakers such as the Amazon Echo and Google Home has
elevated consumer expectations. When the speakers people use at home work entirely
by voice command, using natural language, why do drivers still have to push a button
to issue a voice command inside their cars, and memorize specific commands that the
car can understand?
The smartphones most people carry in their cars already have powerful voice command
capabilities, with always-on Internet connections that can tap the cloud-based
computing power required for advanced voice recognition. But accessing these
capabilities in the car tends to be much more complicated than using them in the home.
As explained in the DSP Concepts white paper “Fundamentals of Voice UI,” the most
critical determinant of accurate voice recognition is the system signal-to-noise ratio. In
this case, the signal is the voice of the person stating the commands, and the noise is
everything else: the sound of the music the system is playing, the acoustic echoes of
the voice from other objects, and the surrounding environmental noises.
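
To make the definition concrete, here is a minimal Python sketch, not taken from the white paper, that expresses signal-to-noise ratio in decibels as the ratio of the RMS level of the voice to the RMS level of the noise. The function names and sample values are hypothetical and chosen only for illustration.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from amplitude (RMS) levels."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

# Hypothetical sample blocks: the voice is ten times the amplitude of the noise.
voice = [0.5, -0.5, 0.5, -0.5]
cabin_noise = [0.05, -0.05, 0.05, -0.05]

print(round(snr_db(voice, cabin_noise), 1))
```

A tenfold amplitude advantage for the voice works out to about 20 dB. In a real system the voice and the noise arrive mixed in one microphone signal, so the two levels have to be estimated rather than measured separately as they are here.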
In cars, the signal-to-noise ratio for voice command systems tends to be much worse
than in the home, for many reasons:
1) Car audio systems are often played louder than home audio systems.
2) Car audio systems typically use eight to 20 separate speaker drivers
positioned all around the occupants, whereas smart speakers use just one or
two speaker drivers placed within a few centimeters of each other.
3) Because of road, wind and engine noise, the noise level inside a car can be
as loud as 80 dB, while a typical living room noise level might be 50 dB.
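
The noise figures in item 3 can be put in perspective with a little decibel arithmetic. The sketch below is illustrative only: the 80 dB and 50 dB noise levels come from the text above, but the 65 dB speech level at the microphone is an assumed value, and the function names are hypothetical.

```python
def power_ratio(db_difference):
    """Convert a level difference in decibels to a ratio of sound power."""
    return 10.0 ** (db_difference / 10.0)

def snr_from_levels(speech_db, noise_db):
    """With both levels in dB, the signal-to-noise ratio is their difference."""
    return speech_db - noise_db

CAR_NOISE_DB = 80.0          # from the text: worst-case car cabin
LIVING_ROOM_NOISE_DB = 50.0  # from the text: typical living room
ASSUMED_SPEECH_DB = 65.0     # assumption for illustration, not from the source

noise_gap_db = CAR_NOISE_DB - LIVING_ROOM_NOISE_DB
print("Noise power ratio, car vs. living room:", power_ratio(noise_gap_db))
print("SNR in living room (dB):", snr_from_levels(ASSUMED_SPEECH_DB, LIVING_ROOM_NOISE_DB))
print("SNR in car (dB):", snr_from_levels(ASSUMED_SPEECH_DB, CAR_NOISE_DB))
```

Under that assumed speech level, the 30 dB gap in noise means roughly a thousand times more noise power in the car, and the voice goes from sitting 15 dB above the noise in the living room to 15 dB below it in the car.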
Car voice command systems have addressed these challenges by using push-to-talk
operation. This improves signal-to-noise ratio by reducing the volume of the car audio
system and/or routing all of its sound into the center speaker when the driver pushes
the Talk button on the steering wheel. Current push-to-talk systems also force the driver
to use specific commands that the car’s system recognizes, requiring the driver to
memorize those commands rather than use natural language. Obviously, drivers prefer
being able to ask “How do I get to Home Depot?” rather than speak a set of commands
using specific words and syntax.
But implementing natural-language voice command systems in noisy passenger
cabins is difficult, because a voice command system needs a clear voice signal in order
to accurately recognize a person’s speech and deliver the desired response. Humans
are amazingly skilled at divining the meaning of a spoken phrase even if parts of it are
omitted due to signal breakup or surrounding noise, which is why cellphone
conversations in cars work pretty well. Unfortunately, computers are nowhere near as
good at this task, which is why it’s so important to deliver them a voice signal that
contains as little noise as possible.
A natural-language, voice-activated system has to “figure out” when the driver is
speaking a command. This requires the use of a trigger word, such as the “Alexa”
command used to “wake up” an Amazon Echo. Through the use of DSP algorithms,
running on the systems-on-a-chip (SoCs) used in advanced car audio systems, the
noise coming in through the microphone can be reduced, making it easier for the
system to pick out the trigger word from the noise of the car and the audio system itself.
Optimizing the size of the trigger word algorithm to best exploit the processing
resources available on the SoC is also critical.
Once the trigger word is recognized, the voice command system uses a microphone
array to focus on the user’s voice and exclude sounds coming from other directions,
such as the car audio system’s speakers, road noise and the sounds of other
passengers talking. This process is called beamforming. For optimum voice recognition,
the beam must be carefully tuned: tight enough to exclude unwanted sounds, but wide
enough that the commands are not lost when the person speaking moves his or her
head. The system must also use acoustic echo cancelling to remove the sounds of the
car audio system from the microphone signal, and additional noise reduction to
eliminate as much of the road, wind and engine noise as possible.
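
The beamforming idea described above can be illustrated with a delay-and-sum beamformer, the simplest textbook form of the technique. The Python sketch below is not the Audio Weaver implementation; the four-microphone array, the integer sample delays, the test tones and all function names are assumptions made for illustration. Each microphone channel is time-aligned for the direction of the talker and the channels are averaged, so the voice adds up coherently while sound from another direction adds up out of phase.

```python
import math

def delayed(signal, delay):
    """Copy of `signal` that arrives `delay` samples late (zero-padded)."""
    n = len(signal)
    return [signal[i - delay] if 0 <= i - delay < n else 0.0 for i in range(n)]

def delay_and_sum(channels, steering_delays):
    """Advance each channel by its steering delay, then average the microphones."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, steering_delays):
        for i in range(n):
            j = i + delay
            if 0 <= j < n:
                out[i] += channel[j]
    return [value / len(channels) for value in out]

def mean_square(samples):
    """Average power of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

N = 64
voice = [math.sin(2 * math.pi * i / 16) for i in range(N)]       # stand-in for the talker
interferer = [math.sin(2 * math.pi * i / 8) for i in range(N)]   # stand-in for a loudspeaker

# Hypothetical arrival delays (in samples) at four microphones.
VOICE_DELAYS = [0, 2, 4, 6]       # the talker's direction
INTERFERER_DELAYS = [6, 4, 2, 0]  # sound arriving from the opposite side

voice_at_mics = [delayed(voice, d) for d in VOICE_DELAYS]
interferer_at_mics = [delayed(interferer, d) for d in INTERFERER_DELAYS]

# Steer the beam at the talker by undoing the talker's arrival delays.
voice_out = delay_and_sum(voice_at_mics, VOICE_DELAYS)
interferer_out = delay_and_sum(interferer_at_mics, VOICE_DELAYS)

# Ignore the edges, where zero padding leaves some channels without data.
interior = slice(6, N - 6)
voice_power = mean_square(voice_out[interior])
interferer_power = mean_square(interferer_out[interior])
print("Voice power after beamforming:", voice_power)
print("Interferer power after beamforming:", interferer_power)
```

The interfering tone here was deliberately chosen so that its misaligned copies cancel almost completely; real cabin noise is broadband and would be attenuated far less. A production beamformer would also need fractional delays, per-frequency weighting, and the beam-width tuning the text describes.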
While the challenges of incorporating full voice command capabilities in cars are
considerable, the tools to meet these challenges exist. DSP Concepts’ Audio Weaver
automotive algorithm package delivers all the processing functions described above and
many more, all tunable and testable through an intuitive graphical user interface.
To learn more about how voice command systems work and how they can be
optimized, download the white paper “Fundamentals of Voice UI” from the DSP
DSP Concepts Public Relations