oMLX icon
oMLX icon

oMLX

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar.

oMLX screenshot 1

Cost / License

Platforms

  • Mac
0likes
0comments
0articles

Features

Properties

  1.  Distraction-free
  2.  Lightweight

Features

  1.  Apple Silicon support

oMLX News & Activities

Highlights All activities

Recent activities

oMLX information

  • Developed by

    KR flagJun Kim
  • Licensing

    Open Source (Apache-2.0) and Free product.
  • Written in

  • Alternatives

    51 alternatives listed
  • Supported Languages

    • English

AlternativeTo Categories

AI Tools & ServicesSystem & Hardware

GitHub repository

  •  9,317 Stars
  •  786 Forks
  •  95 Open Issues
  •   Updated  
View on GitHub

Popular alternatives

View all
oMLX was added to AlternativeTo by Darlene Sonalder on and this page was last updated .
No comments or reviews, maybe you want to be first?

What is oMLX?

Built for the way agents actually work.

Coding agents invalidate the KV cache dozens of times per session. oMLX persists every cache block to SSD — so when the agent circles back to a previous prefix, it's restored from disk in milliseconds, not recomputed from scratch. 01 — CORE Paged SSD KV caching Cache blocks are persisted to disk in safetensors format. Two-tier architecture: hot blocks stay in RAM, cold blocks go to SSD with LRU policy. Previously seen prefixes are restored across requests and server restarts — never recomputed. 02 — THROUGHPUT Continuous batching Handles concurrent requests through mlx-lm's BatchGenerator. Up to 4.14× generation speedup at 8× concurrency. No more queuing behind a single request. 03 — APP Native macOS menu bar app Start, stop, and monitor the server from your menu bar. Web dashboard for model management, chat, and real-time metrics. Signed, notarized, with in-app auto-update. Not Electron. 04 — MODELS Multi-model serving LLM, VLM, embedding, and reranker models loaded simultaneously. LRU eviction when memory runs low. Browse and download models directly from the admin dashboard. 05 — API OpenAI + Anthropic drop-in Compatible with Claude Code, OpenClaw, Cursor, and any OpenAI-compatible client. Native /v1/messages Anthropic endpoint. Web dashboard generates the exact config command for each tool. 06 — TOOLS Tool calling + MCP Supports all major tool calling formats: JSON, Qwen, Gemma, GLM, MiniMax. MCP tool integration and tool result trimming for oversized outputs. Configurable per model.

Official Links