To put an end to the choppy, robotic voice calls that so often come with low bandwidth, Google has launched Lyra, a new audio codec that harnesses machine learning to produce high-quality calls even over a questionable connection.
The Google AI team announced this week that it is making Lyra available for developers to integrate into their communication applications, promising that the new tool will deliver audio calls of a quality similar to that of the most popular existing codecs while requiring 60% less bandwidth.
As a reminder, audio codecs are widely used today for real-time communication on the Internet. The technology compresses an input audio signal into smaller packets that require less bandwidth to transmit, then decodes them back into a waveform that can be played through the listener’s phone speaker. The more compressed the signal, the less data it takes to send the audio to the listener.
A particular algorithm
The trade-off: the most compressed files are generally also the hardest to reconstruct, and tend to be decompressed into less intelligible, robotic-sounding speech. “One of the ongoing challenges in developing codecs, both for video and audio, is to deliver increasing quality while using less data and minimizing latency for real-time communication,” explain Andrew Storus and Michael Chinen, both software engineers at Google, in a blog post.
Google engineers presented Lyra last February as a potential solution to this equation. At its core, Lyra works like a regular audio codec: the system is built in two parts, an encoder and a decoder. When a user speaks into their phone, the encoder identifies and extracts distinctive attributes of their speech, called features, in 40-millisecond chunks, then compresses the data and sends it over the network for the decoder to reconstruct on the receiver’s end.
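To make the 40-millisecond increments concrete, here is a minimal sketch of how an encoder might slice an audio stream into fixed-size frames. The 16 kHz sample rate is an assumption chosen for illustration; only the 40 ms frame length comes from Google’s description of Lyra.

```python
SAMPLE_RATE = 16_000  # samples per second (assumed for illustration)
FRAME_MS = 40         # frame length described for Lyra
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 640 samples per frame

def split_into_frames(samples):
    """Chop a list of PCM samples into fixed-size frames, dropping any tail."""
    return [samples[i:i + FRAME_SAMPLES]
            for i in range(0, len(samples) - FRAME_SAMPLES + 1, FRAME_SAMPLES)]

one_second = [0.0] * SAMPLE_RATE
frames = split_into_frames(one_second)
print(len(frames), len(frames[0]))  # 25 frames of 640 samples each
```

One second of audio thus yields 25 frames, each of which the encoder reduces to a compact feature packet before transmission.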
However, to give the decoder a boost, Google AI engineers infused the system with a special kind of machine learning model. Called a generative model and trained on thousands of hours of data, this type of algorithm can reconstruct a complete audio signal even from a limited number of features. Whereas traditional codecs simply extract information from the parameters to recreate a piece of audio, a generative model can read the features and generate new sounds from a small amount of data.
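The two-part structure described above can be sketched in a few lines. Everything here is a stand-in: the feature extraction (a crude per-band energy summary) and the reconstruction are placeholders for Lyra’s actual model, shown only to illustrate how a frame shrinks to a small feature vector and is expanded back at the other end.

```python
FRAME = 640      # samples per 40 ms frame at an assumed 16 kHz
N_FEATURES = 16  # hypothetical size of the compact feature vector

def encode(frame):
    """Summarize the frame as a few coarse energy features (illustrative only)."""
    step = len(frame) // N_FEATURES
    return [sum(abs(s) for s in frame[i * step:(i + 1) * step]) / step
            for i in range(N_FEATURES)]

def decode(features):
    """A generative decoder would synthesize natural audio from the features;
    here we merely expand them back to frame length as a placeholder."""
    step = FRAME // N_FEATURES
    out = []
    for f in features:
        out.extend([f] * step)
    return out

frame = [0.5] * FRAME
feats = encode(frame)                   # 16 numbers instead of 640 samples
print(len(feats), len(decode(feats)))  # 16 640
```

The point of the generative model is precisely that the real `decode` step does far better than this naive expansion: it synthesizes a plausible waveform from the sparse features rather than mechanically stretching them.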
Generative models have been the subject of much research in recent years, and several companies have taken an interest in the technology. Engineers have already developed advanced systems, starting with DeepMind’s WaveNet, which can generate speech that mimics the human voice. Equipped with a model that reconstructs audio from a minimal amount of data, Lyra can therefore keep files highly compressed at low bitrates while achieving high-quality decoding at the other end of the line.
Google’s teams have benchmarked Lyra against Opus, an open-source codec widely used in voice-over-IP applications on the Internet. In a high-bandwidth environment, at an audio rate of 32 kbps, Opus is known to deliver audio quality indistinguishable from the original; but in low-bandwidth environments, down to 6 kbps, the codec begins to show degraded audio quality.
By comparison, Lyra compresses raw audio down to 3 kbps. Based on feedback from experts and listeners, the researchers found that its output audio quality compared favorably to Opus’s. Meanwhile, other codecs capable of running at bitrates comparable to Lyra’s, such as Speex, all produced worse results, marked by unnatural, robotic-sounding voices.
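A back-of-the-envelope calculation shows what these bitrates mean per 40 ms frame, using only the figures quoted above:

```python
def bits_per_frame(bitrate_kbps, frame_ms=40):
    """Bits needed per frame: (kbps * 1000) bits/s times (frame_ms / 1000) s."""
    return int(bitrate_kbps * 1000 * frame_ms / 1000)

for name, kbps in [("Opus at 32 kbps", 32),
                   ("Opus at 6 kbps", 6),
                   ("Lyra at 3 kbps", 3)]:
    bits = bits_per_frame(kbps)
    print(f"{name}: {bits} bits ({bits // 8} bytes) per 40 ms frame")
```

At 3 kbps, each 40 ms frame fits in just 15 bytes, versus 160 bytes for Opus in its high-bandwidth mode, which is what allows Lyra to survive very constrained connections.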
“Lyra can be used wherever bandwidth conditions are insufficient for higher bitrates and where existing low-bitrate codecs do not provide adequate quality,” say Google’s teams. The idea should appeal to the many Internet users facing insufficient bandwidth while working from home during the current health crisis.