
Google launches Gemini 3.1 Flash Live, its most human-sounding AI voice model yet
Google has released Gemini 3.1 Flash Live, its most advanced voice and audio AI model yet, designed for more natural-sounding conversations. The model powers major upgrades across Gemini Live and Search Live, including audio and video capabilities and Google Lens integration. It supports real-time multimodal conversations in more than 90 languages and is available in more than 200 countries.
Developers can configure the model’s thinking level to balance speed and quality depending on the use case. At the High setting, Gemini 3.1 Flash Live scores 95.9% on the Big Bench Audio Benchmark, second only to Step Audio R1.1 Realtime, with a 2.98-second response time. At the Minimal setting, the score drops to 70.5%, but response time improves to 0.96 seconds.
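To make the speed-versus-quality trade-off concrete, here is a minimal Python sketch that picks a setting from a latency budget using only the two benchmark figures quoted above. The dictionary keys, the `pick_thinking_level` helper, and the idea of a fixed latency budget are illustrative assumptions; this is not the Gemini API, and real response times will vary by workload.

```python
from typing import Optional

# Figures as reported in the article; the lowercase keys are illustrative
# labels, not confirmed API parameter values.
REPORTED = {
    "high": {"score_pct": 95.9, "response_s": 2.98},
    "minimal": {"score_pct": 70.5, "response_s": 0.96},
}


def pick_thinking_level(latency_budget_s: float) -> Optional[str]:
    """Return the highest-scoring setting whose reported response time
    fits within the budget, or None if neither setting fits."""
    fitting = [
        (stats["score_pct"], name)
        for name, stats in REPORTED.items()
        if stats["response_s"] <= latency_budget_s
    ]
    if not fitting:
        return None
    # max() compares by score first, so the best-scoring setting wins.
    return max(fitting)[1]
```

With a 3-second budget the helper would choose the High setting; with a 1.5-second budget only Minimal fits.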
Google also says the model is better at detecting pitch and emotional cues in speech, performs more reliably in noisy environments, and has improved support for triggering external tools and delivering information during live conversations, all of which make it harder to distinguish from an actual human. Gemini 3.1 Flash Live is available in Google AI Studio, the Gemini API, and Gemini Enterprise for Customer Experience, with pricing unchanged at $0.35 per hour for audio input and $1.40 per hour for audio output.
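As a rough illustration of what the per-hour prices quoted above mean for a session, the following Python sketch estimates audio cost from minutes of input and output. It assumes billing is prorated linearly by duration, which the article does not confirm; the function name and constants are hypothetical.

```python
# Prices as quoted in the article, in US dollars per hour of audio.
INPUT_USD_PER_HOUR = 0.35
OUTPUT_USD_PER_HOUR = 1.40


def estimate_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate session cost in USD, assuming linear per-minute proration
    (an assumption; actual billing granularity may differ)."""
    input_cost = (input_minutes / 60) * INPUT_USD_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_USD_PER_HOUR
    return input_cost + output_cost
```

Under that assumption, a call with 30 minutes of user audio and 15 minutes of model audio would come to roughly $0.53.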

Comments
Trying to integrate it into a writing review app turned out to be a miserable failure. Its ability to understand fantasy words is virtually non-existent. I've run into a similar problem with a lot of dictation programs as well. If you can't train words into them, they're just about useless when it comes to creative writing.