Google’s artificial intelligence brings many advantages that reach across different areas. The most recent is the arrival of Gemini, in three different variants, which will enrich Google Bard and is now available on the Pixel 8 Pro (only in English, for the moment).
But the search giant has other applications that reap the benefits of AI. These include MusicLM for creating music and Google Translatotron, which has now reached its third version. We tell you everything you need to know about this advanced translator: how it works and what makes it different.
Translate voice from one language to another, without the need to convert it to text
This AI-based technology debuted in 2019, when the first version was released. Even then it was surprising for its ability to translate conversations directly from voice to voice: the model does not need to convert speech into text in order to translate it (as current translators do).
However, as our colleagues at Xataka pointed out, there was still much room for improvement. Two years later, with the release of Translatotron 2 (July 2021), it improved in areas such as translation quality and the naturalness of speech. Today it returns in better shape and is positioned, according to its own creators, as the “first fully unsupervised end-to-end model for direct speech-to-speech translation”.
To understand how it works, it is necessary to explain how standard voice translation systems work. They involve four stages, listed below:
- Automatic voice recognition
- Speech to text transcription
- Automatic translation
- Text to speech conversion
Google Translatotron is able to cut this usual procedure in half. It does not need to transcribe the input speech into text, and therefore it does not need to convert text back into speech at the end. This is what makes Google’s AI model special, and its third iteration improves on the functionality of its predecessors.
One of the new features is its S2ST (speech-to-speech translation) architecture, capable of “learning” using only data in a single language (monolingual data). What are the benefits of this advance? Eliya Nachmani, a member of the Google Research team, explains:
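To make the difference concrete, here is a minimal Python sketch that contrasts the four-stage cascade listed above with a direct speech-to-speech path. Every function name here is a hypothetical placeholder, and the bodies are toy stand-ins that only record which stage ran. It is not Google's implementation or API, just an illustration of which steps are skipped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    """Records which stages ran, so the two pipelines can be compared."""
    stages: List[str] = field(default_factory=list)


# Hypothetical placeholder stages: each one only logs its name and
# passes toy data along. None of these are real Google APIs.
def recognize_speech(audio: bytes, trace: Trace) -> bytes:
    trace.stages.append("automatic voice recognition")
    return audio


def transcribe(audio: bytes, trace: Trace) -> str:
    trace.stages.append("speech-to-text transcription")
    return audio.decode("utf-8")


def translate_text(text: str, trace: Trace) -> str:
    trace.stages.append("automatic translation")
    return "[translated] " + text


def synthesize(text: str, trace: Trace) -> bytes:
    trace.stages.append("text-to-speech conversion")
    return text.encode("utf-8")


def translate_speech(audio: bytes, trace: Trace) -> bytes:
    # Direct path: translation works on speech, with no text in between.
    trace.stages.append("automatic translation")
    return b"[translated] " + audio


def cascaded_translate(audio: bytes) -> Trace:
    """Standard pipeline: all four stages, passing through text."""
    trace = Trace()
    recognized = recognize_speech(audio, trace)
    text = transcribe(recognized, trace)
    translated = translate_text(text, trace)
    synthesize(translated, trace)
    return trace


def direct_translate(audio: bytes) -> Trace:
    """Direct pipeline: no transcription and no text-to-speech step."""
    trace = Trace()
    recognized = recognize_speech(audio, trace)
    translate_speech(recognized, trace)
    return trace


print(cascaded_translate(b"hola").stages)
print(direct_translate(b"hola").stages)
```

In this toy version the cascade records four stages and the direct path records two, which mirrors the "cut in half" description above.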
“This method opens the door not only to translation between more language pairs, but also to the translation of non-textual vocal attributes, such as pauses, speech rate and speaker identity.”
Voice translation is one of the most competitive areas today, particularly since the rise of artificial intelligence. Google Translatotron faces Seamless, the Meta AI model capable of translating in real time while preserving vocal style.
However, each has its own distinguishing feature: while Google’s model stands out for omitting text-to-text translation, Meta’s offers automatic speech recognition and text-to-speech capabilities.
We also know the training process that Translatotron went through: a first part focused on encoding the input (speech), and a second part dedicated to translating it (through back-translation). For a more in-depth look, see the paper published this year by scientists from the Google Research division.
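As a rough intuition for back-translation in general, the idea is that a sentence in one language is translated into the other and then translated back, and the mismatch with the original serves as the learning signal, so no human-translated pairs are required. The sketch below is a toy text-only illustration under that assumption. The word tables and function names are hypothetical, and it is not the Translatotron 3 training code, which works on speech rather than word lists.

```python
from typing import Dict, List


def translate(words: List[str], table: Dict[str, str]) -> List[str]:
    """Toy word-by-word lookup; unknown words pass through unchanged."""
    return [table.get(word, word) for word in words]


def round_trip_error(sentence: str,
                     forward: Dict[str, str],
                     backward: Dict[str, str]) -> float:
    """Fraction of words not recovered after source -> target -> source."""
    words = sentence.split()
    if not words:
        return 0.0
    # Synthetic translation produced by the current "model" (no human label).
    pseudo_target = translate(words, forward)
    # Translate back and compare with the original monolingual sentence.
    reconstructed = translate(pseudo_target, backward)
    mismatches = sum(a != b for a, b in zip(words, reconstructed))
    return mismatches / len(words)


# Hypothetical toy "models": plain dictionaries standing in for networks.
es_to_en = {"hola": "hello", "mundo": "world"}
en_to_es_good = {"hello": "hola", "world": "mundo"}
en_to_es_bad = {"hello": "hola", "world": "tierra"}

print(round_trip_error("hola mundo", es_to_en, en_to_es_good))
print(round_trip_error("hola mundo", es_to_en, en_to_es_bad))
```

A real system would use this kind of reconstruction error to update the model's parameters; here it is only computed, to show where the signal comes from when only monolingual data is available.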
The result is a total success, if we go by the authors’ words: “Translatotron 3 far outperforms the reference system in all aspects that we measure: translation quality, speaker similarity and speech quality. It especially stood out in the conversational corpus.” As if that weren’t enough, the model is able to achieve natural speech “similar to that of real audio samples.”
These are Google’s advances in this area, and the company is already preparing for a future with Translatotron 4. The next version would up the ante by adding support for more languages, including low-resource ones. For now, we will keep following news of this model, which could debut in one of the company’s well-known services, as happened with Bard and the Google Assistant.
More information | Google Search