
Ollama gains major performance boosts on Apple Silicon thanks to Apple's MLX framework
Ollama is seeing major performance improvements on Apple Silicon, now powered by Apple’s MLX machine learning framework. By building directly on MLX and leveraging Apple’s unified memory architecture, Ollama reports a substantial speedup for users running large language models (LLMs) on Mac hardware.
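From the user’s side, running a model looks the same as before; the speedup comes from the MLX runtime underneath. A minimal sketch using the official Ollama Python client (the model tag is an assumption; any locally pulled model works):
```python
import ollama  # official Ollama Python client: pip install ollama

# Assumption: "llama3.2" has already been pulled (ollama pull llama3.2).
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Why is unified memory fast?"}],
)
print(response["message"]["content"])
```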
While these improvements benefit all Apple Silicon users, those on Apple’s latest M5, M5 Pro, and M5 Max chips gain a further boost from integration with the chips’ GPU Neural Accelerators. This reduces time to first token and increases generation speed, smoothing the experience for both personal assistants and embedded coding agents such as OpenClaw, Claude Code, OpenAI Codex, and OpenCode on macOS.
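Time to first token is easy to observe from the client by streaming the response and timing the first chunk. A hedged sketch using the same Python client (the model tag and prompt are assumptions):
```python
import time
import ollama  # official Ollama Python client: pip install ollama

MODEL = "llama3.2"  # assumption: any locally pulled model tag works here

start = time.perf_counter()
first_token_at = None
chars = 0

# Stream the response so the arrival of the first token is observable.
for chunk in ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize unified memory in one sentence."}],
    stream=True,
):
    if first_token_at is None:
        first_token_at = time.perf_counter()
    chars += len(chunk["message"]["content"])

total = time.perf_counter() - start
print(f"time to first token: {first_token_at - start:.3f}s")
print(f"total generation:    {total:.3f}s ({chars} chars)")
```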
Beyond Apple’s platform, Ollama now supports NVIDIA’s NVFP4 quantization format, cutting memory and storage requirements for inference without compromising model accuracy. This lets users reproduce inference results consistent with production environments and adds compatibility with models optimized through NVIDIA’s tooling.
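The savings follow from the format itself: NVFP4 stores each weight as a 4-bit floating-point value (E2M1) with a shared 8-bit scale per small block, which works out to roughly 4.5 bits per weight versus 16 for FP16. A back-of-the-envelope sketch (the 8B parameter count is hypothetical; the 16-element block size follows NVIDIA’s published description of the format):
```python
# Rough memory math for NVFP4 versus FP16 weights.
# Assumptions: 4-bit E2M1 values with one FP8 scale per 16-element
# micro-block; per-tensor scale overhead is negligible and ignored.

PARAMS = 8e9             # hypothetical 8B-parameter model
FP16_BITS = 16
NVFP4_BITS = 4 + 8 / 16  # 4 bits per value + amortized 8-bit block scale

fp16_gb = PARAMS * FP16_BITS / 8 / 1e9
nvfp4_gb = PARAMS * NVFP4_BITS / 8 / 1e9

print(f"FP16 weights:  {fp16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB ({fp16_gb / nvfp4_gb:.1f}x smaller)")
```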
Alongside these platform advances, Ollama’s improved cache system reuses cached data across conversations to lower memory use and speed up prompt processing. For branching workflows, such as coding sessions or agent-driven prompts, Ollama takes intelligent cache snapshots, producing faster responses and reducing computational overhead.
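Conceptually, this kind of reuse is prefix matching: when a new prompt shares a leading run of tokens with a cached conversation, only the new suffix needs to be processed. A toy sketch of the general technique (an illustration only, not Ollama’s implementation):
```python
# Toy illustration of prefix cache reuse: tokens shared with a cached
# prompt can be served from cache, and only the suffix is recomputed.

def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Length of the common leading run of tokens between two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Hypothetical branching agent workflow: two prompts share a long prefix.
cached_prompt = "You are a coding agent . Refactor the parser module".split()
new_prompt = "You are a coding agent . Write tests for the parser module".split()

reused = shared_prefix_len(cached_prompt, new_prompt)
print(f"tokens reused from cache: {reused}")
print(f"tokens to recompute:      {len(new_prompt) - reused}")
```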
