Advanced audio models for transcription, translation, and audio understanding. Optimized for production and edge deployment, they are accessible via API and released as open source under Apache 2.0, offering high accuracy and resource efficiency for both large-scale and local use cases.
Cost / License
- Freemium
- Open Source (Apache-2.0)
Application type
Platforms
- Online
- Self-Hosted
- Hugging Face

Voxtral has nothing to do with ElevenLabs: ElevenLabs "speaks" voices, Voxtral "hears" voices.
Don't confuse text-to-speech (TTS; ElevenLabs) with speech-to-text (STT; Voxtral).