Mistral AI unveils Voxtral TTS for nuanced & low-latency speech generation in 9 languages

Mistral AI has announced the launch of Voxtral TTS, a text-to-speech model designed for advanced multilingual voice generation. The model delivers state-of-the-art results in nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.

Unlike many text-to-speech systems, Voxtral TTS is lightweight, with 4 billion parameters. This design allows efficient deployment at scale while maintaining natural-sounding and reliable speech output. The model also demonstrates advanced contextual understanding and speaker modeling, reproducing speaker characteristics such as natural pauses, rhythm, intonation, and emotional nuance.

These advances are reinforced by human evaluations, which show that Voxtral TTS surpasses ElevenLabs Flash v2.5 in naturalness and matches the quality and emotion steering capabilities of ElevenLabs v3, while maintaining fast response times. Notably, Voxtral TTS adapts to a custom voice reference as short as three seconds, accurately reproducing accent, inflections, intonation, and natural disfluencies. For a standard voice sample of ten seconds and 500 characters, the model delivers a latency of 70 milliseconds.

Users can experiment with Voxtral TTS in the Mistral AI Studio playground, use it in Le Chat, integrate it via API for $0.016 per 1,000 characters, or access open model weights on Hugging Face under a Creative Commons BY-NC 4.0 license.
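
To put the API price in perspective, here is a minimal sketch of the arithmetic in Python. It assumes billing scales linearly at $0.016 per 1,000 characters with no minimum charge, rounding, or volume tiers, none of which the announcement confirms; the function name `estimate_cost_usd` is hypothetical and not part of any Mistral SDK.

```python
from decimal import Decimal

# Price quoted in the announcement: $0.016 per 1,000 characters.
PRICE_PER_1000_CHARS_USD = Decimal("0.016")


def estimate_cost_usd(num_chars: int) -> Decimal:
    """Estimate the API cost for synthesizing num_chars characters.

    Assumption: cost is strictly proportional to character count
    (no minimum charge, rounding, or tiered pricing).
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return PRICE_PER_1000_CHARS_USD * num_chars / 1000


if __name__ == "__main__":
    # The 500-character sample from the latency figure, and a 1M-character batch.
    for chars in (500, 1_000_000):
        print(f"{chars} characters: ${estimate_cost_usd(chars)}")
```

Under these assumptions, the 500-character sample mentioned above costs less than one cent, and a million characters comes to $16.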

by Paul

Voxtral offers advanced speech models for transcription, translation, and audio understanding, accessible through API or self-hosted solutions. It is designed for cost-effectiveness and efficiency, leveraging AI-powered speech transcription and recognition capabilities.
