OpenAI unveils advanced Speech-to-Text and Text-to-Speech models in API update
OpenAI has introduced new speech-to-text and text-to-speech models in its API, enhancing voice agent capabilities with improved accuracy and customization. The speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, improve on the earlier Whisper models, most notably with lower word error rates. OpenAI also unveiled gpt-4o-mini-tts, a text-to-speech model with improved steerability that lets developers instruct the model not just what to say but how to say it, though it currently supports only preset artificial voices.
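Both models are reachable through the existing audio endpoints of the official openai Python SDK. The sketch below shows one call to each; the file name, preset voice, and instruction string are illustrative assumptions, not values from the announcement.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Speech-to-text: transcribe a local audio file with gpt-4o-transcribe.
with open("meeting.mp3", "rb") as audio_file:  # hypothetical file
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
print(transcript.text)

# Text-to-speech: gpt-4o-mini-tts takes an `instructions` field that
# steers delivery, but the voice itself must be one of the presets.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # preset voice; custom voices are not yet supported
    input="Thanks for calling, how can I help you today?",
    instructions="Speak in a warm, friendly customer-service tone.",
)
speech.write_to_file("reply.mp3")
```

The `instructions` parameter is what the steerability claim amounts to in practice: the same input text can be rendered as, say, a calm narrator or an apologetic support agent without changing the voice preset.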
These gains come from reinforcement learning and training on diverse audio datasets, which improve performance on accented speech and in noisy environments. OpenAI says it plans to keep improving the models' intelligence and accuracy while working toward letting developers use custom voices safely.
The models are available to developers through OpenAI's API, expanding their potential applications. Additionally, OpenAI launched a demo page at OpenAI.fm where developers can test the models. Developers can use the Agents SDK to convert existing text-based agents into voice agents (see the sketch below), while OpenAI recommends the Realtime API for low-latency speech-to-speech interactions. Pricing varies by model: gpt-4o-transcribe costs $6 per million audio input tokens, gpt-4o-mini-transcribe $3 per million, and gpt-4o-mini-tts $12 per million audio output tokens.
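For the text-agent-to-voice-agent path, here is a rough sketch of the pipeline pattern from the openai-agents voice extension (installed via `pip install "openai-agents[voice]"`). The agent name, its instructions, and the silent stand-in audio buffer are assumptions for illustration; the class names and event shape should be checked against the SDK docs.

```python
import asyncio

import numpy as np
from agents import Agent
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

# An ordinary text-based agent; name and instructions are hypothetical.
agent = Agent(
    name="Support",
    instructions="Answer billing questions briefly and politely.",
)

async def main() -> None:
    # Wrap the text agent in a speech-to-text -> agent -> text-to-speech pipeline.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))

    # Stand-in input: 3 seconds of silence at 24 kHz; real code would
    # pass captured microphone samples here instead.
    buffer = np.zeros(24_000 * 3, dtype=np.int16)
    result = await pipeline.run(AudioInput(buffer=buffer))

    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            ...  # feed event.data (PCM audio) to the speaker/output device

asyncio.run(main())
```

This pattern keeps the agent logic unchanged and bolts transcription and synthesis on at the edges; for conversational turn-taking with minimal delay, the Realtime API's direct speech-to-speech path is the recommended alternative.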

Comments
Am I the only one who misses the days when ClosedAI was actually OpenAI and ran as a non-profit organization building better AI tools for humanity?