ElevenLabs launches Scribe, its multilingual speech-to-text model with superior accuracy
ElevenLabs has introduced "Scribe," its first Speech-to-Text (STT) model, which transcribes audio in 99 languages with high accuracy. Built for real-world use, Scribe handles difficult audio, including background noise, multiple speakers, and diverse accents. It provides word-level timestamps, speaker diarization, and audio-event tagging, which makes it well suited to producing structured transcripts for meeting summaries, video subtitles, legal transcription, and content indexing across industries.
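To illustrate how word-level timestamps and speaker labels can feed a subtitle workflow, here is a minimal Python sketch that groups timed words into SRT cues. The `Word` structure, its field names, and the `speaker_0` style labels are assumptions made for illustration; they are not taken from Scribe's actual response format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical shape for one transcribed word; not Scribe's real schema.
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    speaker: str  # diarization label, e.g. "speaker_0"


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_gap: float = 1.0) -> str:
    """Group words into cues, splitting on a speaker change or a long pause."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            previous = current[-1]
            speaker_changed = word.speaker != previous.speaker
            long_pause = word.start - previous.end > max_gap
            if speaker_changed or long_pause:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        time_range = f"{format_timestamp(cue[0].start)} --> {format_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{time_range}\n[{cue[0].speaker}] {text}")
    return "\n\n".join(blocks) + "\n"
```

Calling `words_to_srt` on a list of `Word` objects returns a string that can be saved as a `.srt` file. The `max_gap` threshold is an arbitrary choice here and would need tuning for real subtitles.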
Developers can access Scribe through a Speech-to-Text API, while creators and businesses can transcribe audio or video from the ElevenLabs dashboard. A low-latency version is in development for real-time applications such as live captioning and instant transcription.
Benchmarked against leading models such as Whisper Large V3, Deepgram Nova-3, and Gemini 2.0 Flash, Scribe demonstrates superior performance. It notably improves transcription for underserved languages such as Serbian, Cantonese, and Malayalam, with fewer errors than existing solutions. The launch strengthens ElevenLabs' position in the automatic speech recognition (ASR) industry, where it now competes with top transcription services.
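Accuracy comparisons between speech-to-text models are commonly expressed as word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The article does not say which metric these benchmarks used, so the following Python sketch only shows how such a score is typically computed. The lowercase-and-split normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: (substitutions + deletions + insertions) / reference words."""
    # Simplified normalization (an assumption): lowercase, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    # prev[j] is the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A lower score is better, and a score above 1.0 is possible when the output contains many extra words. Real evaluations usually apply more careful text normalization (punctuation, numerals, language-specific tokenization) before scoring.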
