Cost / License
- Free
- Open Source (GPL-3.0)

Platforms
- Linux

Voxtral is described as 'State-of-the-art speech models with transcription, translation, and audio understanding, available via API or self-hosted, optimized for cost and efficiency' and is an audio transcription tool in the AI tools & services category. There are more than 100 alternatives to Voxtral for a variety of platforms, including Mac, Web-based, Windows, iPhone and Linux apps. The best Voxtral alternative is Handy STT, which is both free and Open Source. Other great apps like Voxtral are Vibe Transcribe, FUTO Voice Input, TypeWhisper and Google AI Edge Eloquent.

Buzz Captions is an offline audio transcription and translation tool powered by OpenAI's Whisper model. It allows users to import audio and video files to generate transcripts in CSV, SRT, TXT and VTT formats.

Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. It was pushed to the Hub using the PyTorchModelHubMixin integration.

CMU Sphinx is a speaker-independent, large-vocabulary continuous speech recognizer released under a BSD-style license. It is also a collection of open source tools and resources that allow researchers and developers to build speech recognition systems.

Windows Speech Recognition makes using a keyboard and mouse optional. You can control your PC with your voice and dictate text instead.

Open-source, Rust-based AI meeting assistant with Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization. 100% local processing. No cloud required.

Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

High-quality on-device transcription. Easily convert speech to text from meetings, lectures, and more.

Write with your voice in any app on macOS. Faster and more accurate than ChatGPT, Google and OpenAI Whisper. Start talking. Stop typing.

Convert your audio and video to accurate text in seconds with advanced speaker recognition, and let AI automatically generate notes to quickly uncover the insights you need.

Vocol is AI transcription software and a one-stop voice collaboration platform designed to boost work efficiency by turning voice and data into actionable insights.