Improve customer interactions with intelligent, lifelike responses
Engage users with a voice user interface in your devices and applications
Personalize your communication based on users' voice and language preferences
Features
Custom Voice
Train a custom speech synthesis model on your own audio recordings to create a unique, more natural-sounding voice for your organization. You can define and choose the voice profile that suits your organization, and adapt quickly as your voice needs change without recording new phrases.
Long audio synthesis
Asynchronously synthesize up to 1 million bytes of input with Long Audio Synthesis.
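Because the limit is stated in bytes rather than characters, non-ASCII text uses up the budget faster than its character count suggests. The following is a minimal sketch, assuming the limit applies to the UTF-8 encoding of the input; `fits_long_audio_limit` is a hypothetical helper, not part of any Google client library.

```python
# Hypothetical helpers: check text against the 1,000,000-byte
# Long Audio Synthesis input limit, ASSUMING the limit is measured
# on the UTF-8 encoded input.
MAX_LONG_AUDIO_BYTES = 1_000_000


def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when UTF-8 encoded."""
    return len(text.encode("utf-8"))


def fits_long_audio_limit(text: str, limit: int = MAX_LONG_AUDIO_BYTES) -> bool:
    """True if the encoded text is no larger than the byte limit."""
    return utf8_size(text) <= limit


if __name__ == "__main__":
    print(utf8_size("hello"))  # 5: one byte per ASCII character
    print(utf8_size("héllo"))  # 6: 'é' takes two bytes in UTF-8
```

A text of 600,000 accented characters, for example, would exceed the limit even though it is well under a million characters.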
Voice and language selection
Choose from an extensive selection of 220+ voices across 40+ languages and variants, with more to come soon.
WaveNet voices
Take advantage of 90+ WaveNet voices, built on DeepMind's groundbreaking research, to generate speech that significantly narrows the gap with human performance.
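Voices are selected by name, and WaveNet voice names contain `Wavenet` (for example `en-US-Wavenet-D`). The following is a minimal sketch that filters an already-parsed response from the REST `voices:list` method for one language. The sample data is made up for illustration, and treating the naming pattern as a reliable filter is an assumption, not a documented contract.

```python
from typing import Any


def wavenet_voices(response: dict[str, Any], language_code: str) -> list[str]:
    """Return sorted names of WaveNet voices that support `language_code`.

    `response` is assumed to be the parsed JSON body of a voices:list call,
    shaped like {"voices": [{"name": ..., "languageCodes": [...]}, ...]}.
    """
    names = []
    for voice in response.get("voices", []):
        supports_language = language_code in voice.get("languageCodes", [])
        is_wavenet = "Wavenet" in voice.get("name", "")  # naming-pattern assumption
        if supports_language and is_wavenet:
            names.append(voice["name"])
    return sorted(names)


# Illustrative data only; a real response lists many more voices.
sample = {
    "voices": [
        {"name": "en-US-Wavenet-D", "languageCodes": ["en-US"], "ssmlGender": "MALE"},
        {"name": "en-US-Standard-B", "languageCodes": ["en-US"], "ssmlGender": "MALE"},
        {"name": "de-DE-Wavenet-A", "languageCodes": ["de-DE"], "ssmlGender": "FEMALE"},
    ]
}

if __name__ == "__main__":
    print(wavenet_voices(sample, "en-US"))  # ['en-US-Wavenet-D']
```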
Text and SSML support
Customize your speech with SSML tags that let you add pauses, control how numbers, dates, and times are read, and give other pronunciation instructions.
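In a synthesis request, SSML replaces plain text as the input. The following is a minimal sketch that builds a small SSML document with a pause (`<break>`) and a date read as a date (`<say-as interpret-as="date">`). Both are standard SSML elements, but which attribute values a given voice honors may vary, and `build_ssml` is a hypothetical helper.

```python
from xml.sax.saxutils import escape


def build_ssml(greeting: str, pause_ms: int, date_mdy: str) -> str:
    """Build SSML: a greeting, a pause, then a month/day/year date.

    User-supplied text is XML-escaped so characters like '&' or '<'
    cannot break the markup.
    """
    return (
        "<speak>"
        f"{escape(greeting)}"
        f'<break time="{pause_ms}ms"/>'
        "Your appointment is on "
        f'<say-as interpret-as="date" format="mdy">{escape(date_mdy)}</say-as>.'
        "</speak>"
    )


ssml = build_ssml("Hello & welcome!", 500, "12/25/2024")
# In a REST synthesize request, SSML goes in the "ssml" field instead of "text".
request_input = {"ssml": ssml}

if __name__ == "__main__":
    print(ssml)
```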
Pitch tuning
Personalize the pitch of your selected voice by up to 20 semitones above or below the default.
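Semitones are a logarithmic unit: a shift of n semitones multiplies the fundamental frequency by 2^(n/12), so 12 semitones is one octave. The following sketch shows that standard equal-temperament arithmetic to make the ±20 range concrete. It does not describe how the service implements pitch shifting, and `validate_pitch` is a hypothetical helper.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` (12 = one octave)."""
    return 2 ** (semitones / 12)


def validate_pitch(semitones: float) -> float:
    """Reject values outside the +/-20 semitone range described above."""
    if not -20.0 <= semitones <= 20.0:
        raise ValueError(f"pitch {semitones} is outside [-20, 20] semitones")
    return semitones


if __name__ == "__main__":
    print(round(semitone_ratio(12), 2))   # 2.0: one octave up doubles the frequency
    print(round(semitone_ratio(20), 2))   # 3.17
    print(round(semitone_ratio(-20), 2))  # 0.31
```

The full range therefore spans roughly a factor of 3.17 above to 0.31 of the default frequency.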
Speaking rate tuning
Adjust the speaking rate to be up to 4x faster or 4x slower than the normal rate.
Volume gain control
Increase the output volume by up to 16 dB or decrease it by up to 96 dB.
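Pitch, speaking rate, and volume gain are all set together in a request's audio configuration. The following is a minimal sketch that builds an `audioConfig` object for a REST synthesize request and checks each value against the ranges listed above: pitch ±20 semitones, rate 0.25 to 4.0 (4x slower to 4x faster), and gain -96 to +16 dB. Treat the range constants as assumptions to verify against the current API reference. `build_audio_config` and `db_to_amplitude` are hypothetical helpers.

```python
from typing import Any

# Ranges from the feature list above; verify against the current API reference.
PITCH_RANGE = (-20.0, 20.0)        # semitones
RATE_RANGE = (0.25, 4.0)           # 1.0 = normal speed
VOLUME_GAIN_RANGE = (-96.0, 16.0)  # decibels


def _check(name: str, value: float, bounds: tuple[float, float]) -> float:
    """Return `value` unchanged, or raise if it falls outside `bounds`."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def build_audio_config(
    encoding: str = "MP3",
    pitch: float = 0.0,
    speaking_rate: float = 1.0,
    volume_gain_db: float = 0.0,
) -> dict[str, Any]:
    """Return an audioConfig object for a text:synthesize REST request."""
    return {
        "audioEncoding": encoding,
        "pitch": _check("pitch", pitch, PITCH_RANGE),
        "speakingRate": _check("speakingRate", speaking_rate, RATE_RANGE),
        "volumeGainDb": _check("volumeGainDb", volume_gain_db, VOLUME_GAIN_RANGE),
    }


def db_to_amplitude(db: float) -> float:
    """Signal amplitude multiplier for a gain in decibels: 10 ** (dB / 20)."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    print(build_audio_config(pitch=-2.0, speaking_rate=1.25, volume_gain_db=6.0))
    print(round(db_to_amplitude(16), 2))  # 6.31
```

By the standard decibel conversion, +16 dB multiplies signal amplitude by about 6.3, while -96 dB scales it to about 0.000016, which is effectively silence.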
Integrated REST and gRPC APIs
Easily integrate with any application or device that can send a REST or gRPC request, including phones, PCs, tablets, and IoT devices (for example, cars, TVs, and speakers).
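Because the REST interface is plain HTTPS with JSON, any client with an HTTP library can call it. The following is a minimal sketch using only the Python standard library. It assumes you already have an OAuth 2.0 access token (obtaining one is out of scope here). The token and voice name are placeholders, and the request is prepared but not sent. The JSON response carries the audio as base64 in its `audioContent` field.

```python
import base64
import json
import urllib.request
from typing import Any

# v1 REST endpoint for synchronous synthesis.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for text:synthesize."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},  # placeholder voice
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def decode_audio(response_body: bytes) -> bytes:
    """Decode the base64 audioContent field of a JSON response body."""
    payload: dict[str, Any] = json.loads(response_body)
    return base64.b64decode(payload["audioContent"])


if __name__ == "__main__":
    req = build_synthesize_request("Hello from a device.", access_token="YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # Sending it would be: audio = decode_audio(urllib.request.urlopen(req).read())
```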
Audio format flexibility
Convert text to MP3, Linear16, OGG Opus, and a number of other audio formats.
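The output format is chosen with the `audioEncoding` field, whose v1 values include `MP3`, `LINEAR16`, and `OGG_OPUS`. The following is a minimal sketch that picks an encoding from the output file's extension. The extension mapping is a hypothetical convenience. Pairing `.wav` with `LINEAR16` assumes the returned LINEAR16 audio includes a WAV header, as the v1 reference describes; verify this for the API version you use.

```python
from pathlib import Path

# Hypothetical mapping from output file extension to audioEncoding values.
ENCODING_BY_SUFFIX = {
    ".mp3": "MP3",
    ".wav": "LINEAR16",  # 16-bit linear PCM (assumed to arrive with a WAV header)
    ".ogg": "OGG_OPUS",  # Opus audio in an Ogg container
}


def encoding_for(path: str) -> str:
    """Return the audioEncoding value for `path`, matching its extension case-insensitively."""
    suffix = Path(path).suffix.lower()
    try:
        return ENCODING_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no audio encoding configured for {suffix!r}") from None


if __name__ == "__main__":
    print(encoding_for("greeting.MP3"))  # MP3
```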
Audio profiles
Optimize the output for the type of speaker it will play on, such as headphones or phone lines.
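Audio profiles are requested by ID in the `effectsProfileId` list of the audio configuration. The following is a minimal sketch that adds a profile to an existing `audioConfig` object. The three IDs come from Google's published audio-profile list, but treat them as assumptions and check the current list. The friendly device names and `with_audio_profile` are hypothetical.

```python
from typing import Any

# Hypothetical friendly names mapped to documented device-class profile IDs.
PROFILE_IDS = {
    "headphones": "headphone-class-device",
    "phone line": "telephony-class-application",
    "smartphone": "handset-class-device",
}


def with_audio_profile(audio_config: dict[str, Any], device: str) -> dict[str, Any]:
    """Return a copy of `audio_config` with effectsProfileId set for `device`."""
    updated = dict(audio_config)  # leave the caller's dict unchanged
    updated["effectsProfileId"] = [PROFILE_IDS[device]]
    return updated


if __name__ == "__main__":
    print(with_audio_profile({"audioEncoding": "MP3"}, "phone line"))
```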