Google unveils Gemini 3.1 Flash-Lite as a faster, cost-efficient AI model, now in preview

Google has announced Gemini 3.1 Flash-Lite, the latest and most cost-efficient member of the Gemini 3 series. Targeted at developers with high-volume workloads, the release is positioned to deliver strong output quality while keeping operational costs low. Starting today, 3.1 Flash-Lite is available in preview to developers through the Gemini API in Google AI Studio and to enterprises through Vertex AI.

With pricing set at $0.25 per million input tokens and $1.50 per million output tokens, Gemini 3.1 Flash-Lite provides competitive performance at substantially reduced rates compared to larger models. It delivers a 2.5 times faster Time to First Answer Token and a 45 percent increase in output speed over its predecessor, 2.5 Flash, according to Artificial Analysis benchmark data. Google reports that this efficiency makes it well-suited for responsive, real-time systems.
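To make the pricing concrete, here is a minimal sketch of a cost estimate based only on the two per-token rates quoted above. The function name and the example token counts are hypothetical, chosen for illustration; actual billing may include factors the article does not mention.

```python
# Preview prices quoted in the article, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate spend for a workload from its input and output token counts.

    Hypothetical helper: it applies the two quoted rates and nothing else.
    """
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 10 million input tokens, 2 million output tokens.
    print(f"${estimate_cost_usd(10_000_000, 2_000_000):.2f}")
```

Because output tokens cost six times as much as input tokens at these rates, workloads that generate long responses will see output dominate the bill.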

On quality, Gemini 3.1 Flash-Lite achieves an Elo score of 1432 on the Arena.ai Leaderboard and leads its tier on reasoning and multimodal benchmarks. Specifically, it scores 86.9 percent on GPQA Diamond and 76.8 percent on MMMU Pro, outperforming even larger Gemini models from previous generations.

Gemini 3.1 Flash-Lite supports thinking levels in both AI Studio and Vertex AI, letting developers control how much reasoning the model applies to a task. The model can handle high-volume tasks such as translation and content moderation, as well as more complex work such as user interface generation, simulation, and instruction following.

by Paul
