HoldSpeak is an AI-powered app that allows you to type 3x faster. It supports over 100 languages and works 100% offline.

OmniDictate is described as 'Free, open-source, real-time dictation for Windows. Runs locally (no cloud!), uses AI, and types directly into any application via a user-friendly GUI' and is an audio transcription tool in the Audio & Music category. There are more than 10 alternatives to OmniDictate for a variety of platforms, including Mac, Web-based, Windows, Linux and Self-Hosted apps. The best OmniDictate alternative is Vibe Transcribe, which is both free and open source. Other great apps like OmniDictate are Voxtral, Whisper, TranscribeX and Murmure.

This is Scriberr, a self-hostable AI audio transcription app. Scriberr uses OpenAI's Whisper models to transcribe audio files offline, on your own hardware.

Power your apps with world-class speech-to-text and domain-specific language models (DSLMs). Effortlessly accurate. Blazing fast. Enterprise-ready scale. Unbeatable pricing. Everything developers need to build with confidence and ship faster.

Speech to Note is a cutting-edge AI-driven tool that seamlessly converts your spoken words into a concise and informative summary.

Private, on-device audio transcription for macOS. Your audio never leaves your Mac: no cloud uploads, no subscriptions, no data collection. Real-time ASR (automatic speech recognition) with Qwen3-ASR, MLX Whisper, and Whisper, plus system-wide dictation, all running 100% locally.

SaidVault is a privacy-first macOS transcription app that runs locally on Apple Silicon. It transcribes audio and video files, records voice notes, captures system audio for meetings or video playback, supports Whisper and Parakeet models, and exports to PDF, TXT, Markdown, SRT, and VTT.

NotchLive is a macOS menu bar app that displays real-time AI-powered captions and translations directly in your MacBook's notch. It uses on-device Whisper AI (via CoreML) for speech recognition and Apple Translation for real-time translation; nothing ever leaves your Mac.

VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and...

Gazelle is a joint speech-language model by Tincans; for more details and prompt ideas, see our v0.2 announcement. This is an early research preview, so please temper expectations! Gazelle accepts text and audio as input (interchangeably) and generates text as output.
