
Gemma 4 is now up to 90% faster on Apple Silicon in Ollama 0.31
Ollama 0.31 introduces a substantial speed boost for Gemma 4, with Apple Silicon users seeing nearly 90% faster token generation in coding-agent benchmarks.
The speedup comes from multi-token prediction: the model proposes several tokens ahead and then verifies them together in one pass. The draft length is auto-tuned in real time, and the model's outputs are unchanged.
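The announcement does not publish Ollama's implementation. The sketch below shows only the general draft-and-verify idea (often called speculative decoding) under greedy sampling, with a draft length that grows after a fully accepted draft and shrinks after a rejection. The toy `target` and `draft` functions, the `k_min`/`k_max` bounds and the grow/shrink rule are hypothetical illustrations. In a real engine the draft might come from extra prediction heads or a smaller model, and all drafted positions are scored in a single batched forward pass rather than one call per prefix.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to the next token id.
Model = Callable[[List[int]], int]


def target(ctx: List[int]) -> int:
    """Hypothetical toy stand-in for the full model (deterministic, greedy)."""
    return (sum(ctx) * 7 + len(ctx)) % 50


def draft(ctx: List[int]) -> int:
    """Hypothetical cheap drafter: agrees with target except at every 5th position."""
    t = target(ctx)
    return (t + 1) % 50 if len(ctx) % 5 == 0 else t


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one full-model step per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(model: Model, drafter: Model, prompt: List[int], n_new: int,
                       k_init: int = 2, k_min: int = 1,
                       k_max: int = 8) -> Tuple[List[int], int]:
    """Draft k tokens, keep the longest prefix the full model agrees with,
    then adapt k. Returns (new tokens, number of verification steps)."""
    out = list(prompt)
    k = k_init
    verify_steps = 0
    while len(out) - len(prompt) < n_new:
        # 1. Propose k tokens with the cheap drafter.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            t = drafter(ctx)
            proposal.append(t)
            ctx.append(t)

        # 2. Verify. A real engine scores every drafted position in one batched
        #    pass; this sketch simulates that with one call per prefix.
        verify_steps += 1
        accepted = 0
        for i, t in enumerate(proposal):
            if model(out + proposal[:i]) != t:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # The full model's own next token is always kept: it is the correction
        # after a mismatch, or a "bonus" token when every draft was accepted.
        out.append(model(out))

        # 3. Adapt the draft length (hypothetical grow/shrink rule).
        k = min(k + 1, k_max) if accepted == k else max(k - 1, k_min)

    return out[len(prompt):len(prompt) + n_new], verify_steps


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, steps = speculative_decode(target, draft, prompt, 20)
    assert fast == greedy_decode(target, prompt, 20)  # outputs are unchanged
    print(f"20 tokens in {steps} verification steps")
```

Every emitted token is the full model's own greedy choice, which is why outputs match plain decoding; the saving comes from accepting several drafted tokens per verification step.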
An optimized matrix multiplication kernel contributed to MLX further accelerates batch verification, benefiting both Gemma 4 and other models.

