
Mistral unveils Voxtral Transcribe 2, a low-cost speech-to-text family with an open-weights model that runs on-device
French AI company Mistral AI has released Voxtral Transcribe 2, a family of two speech-to-text models. The company says both models offer high transcription quality, speaker diarization, and very low latency. The family comprises Voxtral Mini Transcribe V2, designed for batch processing, and Voxtral Realtime, built for live transcription workflows.
For batch tasks, Voxtral Mini Transcribe V2 delivers low word error rates at a competitive price. It provides speaker diarization, context biasing, and word-level timestamps, and supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.
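For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that definition only. It is not Mistral's evaluation code, the function name word_error_rate is made up for this example, and real benchmarks usually normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower score is better: 0.0 means the transcript matches the reference word for word, and the score can exceed 1.0 when the transcript contains many extra words.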
For real-time applications, Voxtral Realtime is purpose-built for live speech transcription. Its latency can be configured as low as 200 milliseconds, which suits voice agents and other time-sensitive systems. Voxtral Realtime is also released as open weights under the Apache 2.0 license, and at 4 billion parameters it is small enough to run efficiently and privately on edge devices.
Alongside the models, Mistral AI is launching an audio playground in Mistral Studio. The tool lets users try Voxtral Transcribe 2 with real-time diarization and timestamps, which makes for quick evaluation and experimentation.
