Gemma 4 is now up to 90% faster on Apple Silicon in Ollama 0.31

Ollama 0.31 introduces a substantial speed boost for Gemma 4, with Apple Silicon users seeing nearly 90% faster token generation in coding-agent benchmarks.

The performance gain comes from a multi-token prediction method that proposes and verifies several tokens at once, with the draft length auto-tuned in real time and no change to model outputs.
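The propose-and-verify idea can be illustrated with a toy sketch. This is not Ollama's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the full model and a cheap drafter, and the rule for adjusting the draft length is an assumed heuristic chosen only for illustration. The sketch assumes greedy decoding, where the output is unchanged because every emitted token is the one the target model itself would have picked.

```python
from typing import List

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def target_next(ctx: List[int]) -> int:
    """Hypothetical stand-in for the full model's greedy next token."""
    return (sum(ctx) * 31 + len(ctx)) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Hypothetical cheap drafter: agrees with the target except when
    the context length is a multiple of 4."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % VOCAB


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(prompt: List[int], n_new: int,
                       draft_len: int = 4, max_draft: int = 8) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. Propose: the drafter guesses several tokens ahead.
        draft: List[int] = []
        for _ in range(draft_len):
            draft.append(draft_next(out + draft))

        # 2. Verify: a real system would score all draft positions in one
        #    batched pass of the target model; here it is done one position
        #    at a time for clarity.
        accepted = 0
        for tok in draft:
            expected = target_next(out)
            out.append(expected)  # always keep the target's own choice
            if tok != expected:
                break             # first mismatch ends this round
            accepted += 1

        # 3. Tune the draft length (assumed heuristic, not from the article):
        #    grow after a fully accepted draft, shrink after a rejection.
        if accepted == len(draft):
            draft_len = min(max_draft, draft_len + 1)
        else:
            draft_len = max(1, draft_len - 1)
    return out[:goal]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    assert speculative_decode(prompt, 20) == greedy_decode(prompt, 20)
    print("outputs match")
```

The speed-up in a real system comes from step 2: checking several proposed tokens in one batched pass costs far less than generating them one at a time, which is also why a faster batch matrix multiplication helps.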

An optimized matrix multiplication kernel contributed to MLX further accelerates batch verification, benefiting both Gemma 4 and other models.

by Fla


