OpenAI unveils advanced Speech-to-Text and Text-to-Speech models in API update
OpenAI has introduced new speech-to-text and text-to-speech models in its API, enhancing voice agent capabilities with improved accuracy and customization. The speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, improve on the earlier Whisper models, most notably with lower word error rates. OpenAI also unveiled gpt-4o-mini-tts, a text-to-speech model with improved steerability that lets developers instruct the model not just what to say but how to say it, though it currently supports only preset artificial voices.
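Both models are reachable through the existing audio endpoints of the official openai Python SDK. The sketch below shows one call to each; the file name, preset voice, and instruction string are illustrative assumptions, not values from the announcement.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Speech-to-text: transcribe a local audio file with gpt-4o-transcribe.
with open("meeting.mp3", "rb") as audio_file:  # hypothetical file
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
print(transcript.text)

# Text-to-speech: gpt-4o-mini-tts takes an `instructions` field that
# steers delivery, but the voice itself must be one of the presets.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # preset voice; custom voices are not yet supported
    input="Thanks for calling, how can I help you today?",
    instructions="Speak in a warm, friendly customer-service tone.",
)
speech.write_to_file("reply.mp3")
```

The `instructions` parameter is what the steerability claim amounts to in practice: the same input text can be rendered as, say, a calm narrator or an apologetic support agent without changing the voice preset.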
These gains come from reinforcement learning and training on diverse audio datasets, which improve performance on accented speech and in noisy environments. OpenAI says it plans to keep improving the models' intelligence and accuracy while working toward letting developers use custom voices safely.
The models are available to developers through OpenAI's API, expanding their potential applications. Additionally, OpenAI launched a demo page at OpenAI.fm where developers can test the models. Developers can use the Agents SDK to convert existing text-based agents into voice agents (see the sketch below), while OpenAI recommends the Realtime API for low-latency speech-to-speech interactions. Pricing varies by model: gpt-4o-transcribe costs $6 per million audio input tokens, gpt-4o-mini-transcribe $3 per million, and gpt-4o-mini-tts $12 per million audio output tokens.
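For the text-agent-to-voice-agent path, here is a rough sketch of the pipeline pattern from the openai-agents voice extension (installed via `pip install "openai-agents[voice]"`). The agent name, its instructions, and the silent stand-in audio buffer are assumptions for illustration; the class names and event shape should be checked against the SDK docs.

```python
import asyncio

import numpy as np
from agents import Agent
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

# An ordinary text-based agent; name and instructions are hypothetical.
agent = Agent(
    name="Support",
    instructions="Answer billing questions briefly and politely.",
)

async def main() -> None:
    # Wrap the text agent in a speech-to-text -> agent -> text-to-speech pipeline.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))

    # Stand-in input: 3 seconds of silence at 24 kHz; real code would
    # pass captured microphone samples here instead.
    buffer = np.zeros(24_000 * 3, dtype=np.int16)
    result = await pipeline.run(AudioInput(buffer=buffer))

    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            ...  # feed event.data (PCM audio) to the speaker/output device

asyncio.run(main())
```

This pattern keeps the agent logic unchanged and bolts transcription and synthesis on at the edges; for conversational turn-taking with minimal delay, the Realtime API's direct speech-to-speech path is the recommended alternative.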

Comments
Am I the only one who misses the days when ClosedAI was actually OpenAI and ran as a non-profit organization building better AI tools for humanity?