Jun 20, 2023 at 3:22 PM

Meta Unveils Voicebox: A Generative AI Model for Audio Applications

Meta has unveiled its latest venture in the field of artificial intelligence with Voicebox, a generative AI model designed for audio applications. Building on the success of its AI voice technology capable of recognizing over 4,000 languages, Meta's Voicebox takes vocal capabilities to new heights. This new technology is a generative model, similar to a text-to-speech (TTS) system but more capable and versatile.

Voicebox offers a range of features, including the generation, editing, and sampling of audio. It aims to create natural-sounding voices for virtual assistants and metaverse characters, as well as to improve audio synthesis for individuals with visual impairments or content creators. Unlike text- and image-focused models such as ChatGPT, Voicebox produces audio files based on user-provided text.

What sets Voicebox apart from other speech synthesis models is its remarkably efficient training process. Meta leveraged audio recordings and transcriptions from non-specialized audiobooks in English, French, Spanish, German, Polish, and Portuguese. Despite this departure from the conventional training approach, Voicebox's synthesized speech has proven highly effective, and the approach straightforward, compared with existing solutions such as Microsoft's VALL-E.

Voicebox's advanced capabilities go beyond mere text-to-speech conversion. The model can predict and seamlessly fill in missing speech segments based on surrounding discourse and the provided transcription. This unique ability facilitates speech generation and allows for the enhancement of previously recorded audio files, saving time and effort. It can identify and replace poorly pronounced words or background noise, ensuring a more seamless listening experience.
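To make the infilling idea concrete, here is a minimal Python sketch of how such an editing step could be framed: a span of audio frames is masked out, and a generator is asked to fill only that span using the surrounding frames and the transcript. This is not Meta's code or API, since Voicebox is not publicly available. Every name here (`mask_span`, `infill`, `placeholder_generator`) and the list-of-floats frame representation are assumptions made purely for illustration, and the placeholder generator simply emits silent frames where a real model would synthesize speech.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[float]  # hypothetical: one acoustic feature frame
Generator = Callable[[List[Optional[Frame]], List[bool], str], List[Frame]]


def mask_span(frames: List[Frame], start: int, end: int
              ) -> Tuple[List[Optional[Frame]], List[bool]]:
    """Hide frames in [start, end); return the visible context and the mask."""
    if not 0 <= start <= end <= len(frames):
        raise ValueError("span must lie within the recording")
    mask = [start <= i < end for i in range(len(frames))]
    context = [None if hidden else frame for frame, hidden in zip(frames, mask)]
    return context, mask


def placeholder_generator(context: List[Optional[Frame]], mask: List[bool],
                          transcript: str) -> List[Frame]:
    """Stand-in for a real model: emits all-zero (silent) frames everywhere."""
    dim = next((len(f) for f in context if f is not None), 1)
    return [[0.0] * dim for _ in context]


def infill(frames: List[Frame], start: int, end: int, transcript: str,
           generate: Generator) -> List[Frame]:
    """Regenerate only the masked span; keep the original frames elsewhere."""
    context, mask = mask_span(frames, start, end)
    generated = generate(context, mask, transcript)
    if len(generated) != len(frames):
        raise ValueError("generator must return one frame per input frame")
    return [new if hidden else old
            for old, new, hidden in zip(frames, generated, mask)]
```

The point of the sketch is the division of labor: the original recording outside the masked span is left untouched, so only the noisy or mispronounced segment is replaced.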

Moreover, Voicebox supports multiple languages, enabling text-to-speech readings in English, French, German, Spanish, Polish, and Portuguese. Even when the languages of the speech sample and the given text do not match, Voicebox can produce accurate audio outputs. With training data spanning various genres in the six mentioned languages, Voicebox excels at delivering speech that closely mimics real-life conversations.

Despite its promising features, Meta has announced that Voicebox is currently unavailable to the public for security reasons. Interested individuals can, however, explore the demonstration videos and accompanying research document included in the press release. As Meta continues to push the boundaries of AI innovation, Voicebox sets a new benchmark for high-quality, versatile audio synthesis.

Jun 20, 2023 by Danilo Venom



External links

Introducing Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance
Meta AI • Official source
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
VoiceBox • Official source
Meta's Voicebox AI is a Dall-E for text-to-speech | Engadget
Engadget
