Google shows off AI-powered end-to-end speech-to-speech translation

Written almost 2 years ago by IanDorfman

Google's Artificial Intelligence development team has introduced a new model for translating speech. It differs from traditional speech translation, which chains three steps: automatic speech recognition converts the speech to text, a machine translation system such as Google Translate translates that text, and text-to-speech synthesis turns the translated text back into audio.
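As a rough illustration of the traditional cascaded approach described above, the three stages can be sketched as functions called one after another. The function names and the toy dictionary below are hypothetical stand-ins invented for this sketch, not real Google APIs; a real system would use a trained model at each stage.

```python
from typing import List

# Hypothetical stand-ins for the three stages; real systems use trained models.
def recognize_speech(audio: List[float]) -> str:
    """ASR stub: pretend every input clip says 'hola mundo'."""
    return "hola mundo"

def translate_text(text: str) -> str:
    """MT stub: word-by-word lookup in a toy Spanish-to-English dictionary."""
    toy_dictionary = {"hola": "hello", "mundo": "world"}
    return " ".join(toy_dictionary.get(word, word) for word in text.split())

def synthesize_speech(text: str) -> List[float]:
    """TTS stub: emit one fake audio sample per character of text."""
    return [float(ord(ch)) for ch in text]

def cascaded_translate(audio: List[float]) -> List[float]:
    """Speech -> text -> translated text -> speech, one stage after another."""
    source_text = recognize_speech(audio)
    target_text = translate_text(source_text)
    return synthesize_speech(target_text)

translated_audio = cascaded_translate([0.0, 0.1, -0.1])
print(len(translated_audio))  # one fake sample per character of the translated text
```

Each stage only sees the output of the one before it, so a mistake in the recognized text is passed on to translation and then to synthesis.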

Google AI software engineers Ye Jia and Ron Weiss describe how the new model, called "Translatotron," works in a blog post:

"Translatotron is based on a sequence-to-sequence network which takes source spectrograms as input and generates spectrograms of the translated content in the target language."

Translatotron does not rely on an intermediate text representation to translate between languages. Compared with cascaded systems, this makes it faster, avoids errors compounding from one step to the next, and makes it more straightforward to preserve the original speaker's voice after translation. It also helps ensure that names and other proper nouns are not translated unnecessarily.
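To see why errors compound across the steps of a cascade, consider a back-of-the-envelope sketch. It assumes each stage succeeds independently with some probability, which is a simplification, and the figures are made up for illustration rather than taken from Google's post.

```python
from math import prod
from typing import Sequence

def cascade_success_rate(stage_success_rates: Sequence[float]) -> float:
    """Probability that every stage succeeds, assuming the stages are independent."""
    return prod(stage_success_rates)

# Illustrative numbers only (not measurements): recognition, translation, synthesis.
rates = [0.9, 0.9, 0.9]
overall = cascade_success_rate(rates)
print(f"{overall:.3f}")  # lower than any single stage's rate
```

Under these assumptions the chained result is less reliable than any individual stage. A single end-to-end model has no hand-off between stages, though that alone does not guarantee better overall accuracy.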

Jia and Weiss conclude their post with the following:

"To the best of our knowledge, Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language. It is also able to retain the source speaker’s voice in the translated speech. We hope that this work can serve as a starting point for future research on end-to-end speech-to-speech translation systems."

Further coverage:
Google AI Blog