Jun 11, 2026 at 12:00 PM

Google unveils DiffusionGemma, delivering up to 4x faster inference on dedicated GPUs

Google has released DiffusionGemma, an experimental open model that uses text diffusion to speed up text generation. Instead of producing output one token at a time, as conventional large language models do, DiffusionGemma generates entire blocks of text at once. The result is up to four times faster output on dedicated GPUs, reaching 1,000 tokens per second on an NVIDIA H100 and 700 tokens per second on a GeForce RTX 5090.
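As a rough sanity check on these figures, the sketch below converts the reported throughput into wall-clock time for a response and backs out the sequential baseline that a full 4x speedup would imply. The 500-token response length is hypothetical, and applying the whole 4x at these peak rates is an assumption, since the announcement only says "up to" four times.

```python
# Throughput figures reported in the article (tokens per second).
REPORTED_TPS = {"NVIDIA H100": 1000, "GeForce RTX 5090": 700}

# "Up to four times faster"; treating this as exact is an assumption.
MAX_SPEEDUP = 4.0


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to emit num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


def implied_baseline_tps(tokens_per_second: float, speedup: float = MAX_SPEEDUP) -> float:
    """Token-by-token rate that would make the reported rate `speedup` times faster."""
    return tokens_per_second / speedup


if __name__ == "__main__":
    response_tokens = 500  # hypothetical response length, not from the article
    for gpu, tps in REPORTED_TPS.items():
        seconds = generation_seconds(response_tokens, tps)
        baseline = implied_baseline_tps(tps)
        print(f"{gpu}: {seconds:.2f} s for {response_tokens} tokens, "
              f"implied baseline about {baseline:.0f} tokens/s")
```

Real-world throughput depends on batch size, prompt length, and quantization, so these numbers are best read as upper bounds rather than predictions.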

Building on the Gemma 4 architecture and Gemini Diffusion research, DiffusionGemma adds a novel diffusion head to maximize text generation speed. The 26-billion-parameter Mixture of Experts (MoE) design activates only 3.8 billion parameters during inference, and a quantized version can run on high-end consumer GPUs with 18 GB of video memory. The model generates 256 tokens in parallel and uses bidirectional attention, so every token in a block can attend to all the others. This is especially useful for in-line text editing, code infilling, and structured data generation.
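The 18 GB figure can be put in context with some back-of-the-envelope arithmetic on weight storage. The sketch below assumes that all 26 billion parameters must be resident in memory (as is typical for MoE models, where only the compute is sparse) and tries a few common bit widths. The article does not state which quantization Google used, so the bit widths here are illustrative, and the estimate ignores activations, caches, and runtime overhead.

```python
# Parameter counts and VRAM figure reported in the article.
TOTAL_PARAMS = 26e9
ACTIVE_PARAMS = 3.8e9
REPORTED_VRAM_GB = 18


def weight_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Bit widths are assumptions; the article only says "quantized".
    for bits in (16, 8, 4):
        size = weight_gb(TOTAL_PARAMS, bits)
        verdict = "fits" if size <= REPORTED_VRAM_GB else "does not fit"
        print(f"{bits}-bit weights: {size:.1f} GB, {verdict} in {REPORTED_VRAM_GB} GB")

    active_share = ACTIVE_PARAMS / TOTAL_PARAMS
    print(f"Parameters active per token: {active_share:.1%}")
```

Under these assumptions only the 4-bit case (about 13 GB of weights) leaves headroom within 18 GB, which is consistent with the article's note that the model needs to be quantized to run on consumer hardware.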

Beyond raw speed, DiffusionGemma evaluates entire text blocks as it generates them, which lets it correct its own output in real time. The model is aimed at researchers and developers with speed-critical, interactive local workflows. It remains experimental and produces lower-quality output than the standard Gemma 4, which is still recommended for production environments that require maximum text quality.

Jun 11, 2026 by Paul



