Jul 1, 2026 at 7:39 AM

Gemma 4 is now up to 90% faster on Apple Silicon in Ollama 0.31

Ollama 0.31 introduces a substantial speed boost for Gemma 4, with Apple Silicon users seeing nearly 90% faster token generation in coding-agent benchmarks.

The performance gain comes from a multi-token prediction method that proposes and verifies several tokens at once, with the draft length auto-tuned in real time and no change to model outputs.
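
The propose-and-verify idea can be sketched in a few lines. This is a toy illustration only, not Ollama's or MLX's implementation: the two "models" are hypothetical stand-in functions, the acceptance rule assumes greedy decoding, and the draft-length adjustment is an invented heuristic. It shows why the output can match ordinary token-by-token generation while needing fewer verification rounds.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the full model's greedy next token."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical cheap predictor: agrees with the target most of the time."""
    t = target_next(ctx)
    return t if len(ctx) % 4 else (t + 1) % 50


def generate_baseline(prompt: List[Token], n: int) -> List[Token]:
    """Ordinary decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out[len(prompt):]


def generate_speculative(
    prompt: List[Token], n: int, draft_len: int = 4
) -> Tuple[List[Token], int]:
    """Draft several tokens, then verify them against the target in one round.

    Returns the generated tokens and the number of verification rounds.
    In a real system each round would be a single batched forward pass.
    """
    out = list(prompt)
    produced = 0
    rounds = 0
    while produced < n:
        k = min(draft_len, n - produced)

        # Propose k tokens with the cheap predictor.
        ctx = list(out)
        proposal: List[Token] = []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)

        # Verify: keep the target's token at every position, and stop at
        # the first position where the proposal disagrees with it.
        rounds += 1
        accepted = 0
        for t in proposal:
            expected = target_next(out)
            out.append(expected)
            produced += 1
            if t != expected:
                break
            accepted += 1

        # Illustrative auto-tuning: grow the draft after a fully accepted
        # round, shrink it after a rejection.
        if accepted == k:
            draft_len = min(draft_len + 1, 8)
        else:
            draft_len = max(1, draft_len - 1)
    return out[len(prompt):], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = generate_baseline(prompt, 20)
    speculative, rounds = generate_speculative(prompt, 20)
    print(speculative == baseline, rounds)
```

Because every emitted token is the one the target would have chosen, the speculative path reproduces the baseline exactly; the saving comes from checking several positions per round instead of one.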

An optimized matrix multiplication kernel contributed to MLX further accelerates batch verification, benefiting both Gemma 4 and other models.

Jul 1, 2026 by Fla

Ollama

Facilitates local deployment of Llama 3, Code Llama, and other language models, enabling customization and offline AI development. Perfect for creating personalized AI chatbots and writing tools.

External links

Faster Gemma 4 on MLX with multi-token prediction
Ollama Blog • Official source
