Private, on-device audio transcription for macOS. Your audio never leaves your Mac — no cloud uploads, no subscriptions, no data collection. Real-time ASR with Qwen3-ASR, MLX Whisper & Whisper, plus system-wide dictation, all 100% local.




Voxtral is described as 'State-of-the-art speech models with transcription, translation, and audio understanding, available via API or self-hosted, optimized for cost and efficiency' and is a audio transcription tool in the ai tools & services category. There are more than 50 alternatives to Voxtral for a variety of platforms, including Mac, Web-based, Windows, iPhone and Linux apps. The best Voxtral alternative is Handy STT, which is both free and Open Source. Other great apps like Voxtral are Vibe Transcribe, FUTO Voice Input, TypeWhisper and ElevenLabs.
Private, on-device audio transcription for macOS. Your audio never leaves your Mac — no cloud uploads, no subscriptions, no data collection. Real-time ASR with Qwen3-ASR, MLX Whisper & Whisper, plus system-wide dictation, all 100% local.




VoxTap is 100% offline voice-to-text for macOS that types at your cursor in any app: Terminal, VS Code, Slack, everywhere.
On-device AI. No cloud, no subscription, no signup. Just press a hotkey and talk.
$29 one-time. 45-minute free trial.




Speech to Note is a cutting-edge AI-driven tool that seamlessly converts your spoken words into a concise and informative summary.



AI executive assistant that records Google Meet, Zoom, Teams, Webex; transcribes and makes instant summaries: decisions, action items, minutes, smart titles. Search/tag, jump to quotes, export/share.



Batch transcribe audio files or movie files into text with OpenAI's Whisper AI Model. With an embed subtitles editor to preview the transcription result segment by segment. All transcribe operation is processing in local machine. Keep your privacy safe.





Buzz Captions is an offline audio transcription and translation tool powered by OpenAI's Whisper model. It allows users to import audio and video files to generate transcripts in CSV, SRT, TXT and VTT formats.

Dia is a 1.6B parameter text to speech model created by Nari Labs. It was pushed to the Hub using the PytorchModelHubMixin integration.

CMU Sphinx is a speaker-independent large vocabulary continuous speech recognizer released under BSD style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.
Windows Speech Recognition makes using a keyboard and mouse optional. You can control your PC with your voice and dictate text instead.
Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.