OpenAI unveils advanced Speech-to-Text and Text-to-Speech models in API update

OpenAI has introduced new speech-to-text and text-to-speech models in its API, improving the accuracy and customizability of voice agents. The speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, improve on OpenAI's earlier Whisper models, with lower word error rates and better transcription accuracy. OpenAI also unveiled gpt-4o-mini-tts, a text-to-speech model with improved steerability: developers can instruct it not only what to say but how to say it. For now, it supports only preset synthetic voices.
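Word error rate, the metric cited above, is the standard way to compare transcription models. It is the minimum number of word substitutions, deletions, and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. It illustrates the metric only and is not OpenAI's evaluation code. It also applies no text normalization (casing, punctuation), which real benchmarks usually do. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative sketch: splits on whitespace and does no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, word_error_rate("the cat sat on the mat", "the cat sat on mat") counts one deletion against six reference words, giving 1/6 (about 0.167). A lower score means a more accurate transcript.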

These improvements come from reinforcement learning and training on diverse audio datasets, which improve performance on accented speech and in noisy environments. OpenAI says it will keep improving the models' intelligence and accuracy while working toward letting developers use custom voices safely.

The models are available to developers through OpenAI's API. Additionally, OpenAI launched a demo site, OpenAI.fm, where developers can test the models. Developers can also use the Agents SDK to convert text-based agents into voice agents, and OpenAI recommends the Realtime API for low-latency speech-to-speech interactions. Pricing varies by model: gpt-4o-transcribe costs $6 per million audio input tokens, gpt-4o-mini-transcribe $3 per million, and gpt-4o-mini-tts $12 per million audio output tokens.
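Because each price quoted above is per million tokens, estimating a bill takes a single multiplication. The sketch below hard-codes the article's figures, which may change, and covers audio tokens only. Any text-token charges are not modeled. The function takes a token count directly because the article does not say how many tokens a minute of audio uses. `estimate_cost` and `PRICES_PER_MILLION` are illustrative names, not part of OpenAI's SDK.

```python
# USD per one million audio tokens, as quoted in the article (subject to change).
PRICES_PER_MILLION = {
    "gpt-4o-transcribe": 6.00,       # audio input tokens
    "gpt-4o-mini-transcribe": 3.00,  # audio input tokens
    "gpt-4o-mini-tts": 12.00,        # audio output tokens
}


def estimate_cost(model: str, audio_tokens: int) -> float:
    """Estimate the audio-token cost in USD for one of the models above.

    Illustrative sketch: ignores text-token charges and any other fees.
    """
    if audio_tokens < 0:
        raise ValueError("audio_tokens must be non-negative")
    try:
        rate = PRICES_PER_MILLION[model]
    except KeyError:
        raise ValueError(f"no price listed for model {model!r}") from None
    return audio_tokens / 1_000_000 * rate
```

For instance, 2,000,000 audio input tokens would cost an estimated $12.00 with gpt-4o-transcribe and $6.00 with gpt-4o-mini-transcribe.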

by Mauricio B. Holguin

Comments

Darlene Sonalder

Am I the only one who misses the days when ClosedAI was actually OpenAI and ran as a non-profit organization building better AI tools for humanity?

Gu