Mistral introduces Voxtral, its first family of open source speech understanding AI models

Mistral has introduced Voxtral, a new family of state-of-the-art speech understanding AI models. The lineup includes a large 24-billion-parameter model for production-scale applications and a lightweight 3-billion-parameter model designed for local and edge deployments.

Both Voxtral models are released under the Apache 2.0 license and are available through Mistral's API. Mistral also offers a highly optimized, transcription-only endpoint for more cost-efficient use, and the models can be downloaded from Hugging Face for self-hosted use.

Beyond standard transcription, Voxtral supports long-form context, built-in question answering, native summarization, multilingual processing, and function calling directly from voice input. Together, these capabilities let applications act on spoken input, for example by summarizing a recording, answering questions about it, analyzing it, or extracting insights from it.

For cost-conscious organizations, Mistral says the Voxtral Mini Transcribe model outperforms OpenAI Whisper at less than half the cost, while Voxtral Small delivers performance comparable to ElevenLabs Scribe, also at less than half the price. Voxtral can also be tried through the voice mode in Mistral's Le Chat on both web and mobile.

by Paul


Comments

Mr. Anon
So, so, so hyped at how fast open source text-to-speech and speech-to-text models are advancing! One of the best Flathub apps for this right now, in my opinion, is Speech Note.