oMLX

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar.

Cost / License

Free
Open Source (Apache-2.0)

Origin

South Korea


Features

Properties

Distraction-free
Lightweight

Features

Apple Silicon support


oMLX information

Developed by: Jun Kim
Licensing: Open Source (Apache-2.0) and free product
Written in: Python
Alternatives: 51 alternatives listed
Supported languages: English

AlternativeTo Categories

AI Tools & Services, System & Hardware

GitHub repository

9,317 Stars
786 Forks
95 Open Issues
Updated Apr 10, 2026


oMLX was added to AlternativeTo by Darlene Sonalder on Apr 10, 2026 and this page was last updated Apr 10, 2026.


What is oMLX?

Built for the way agents actually work.

Coding agents invalidate the KV cache dozens of times per session. oMLX persists every cache block to SSD, so when the agent circles back to a previous prefix, it is restored from disk in milliseconds rather than recomputed from scratch.

01 CORE: Paged SSD KV caching

Cache blocks are persisted to disk in safetensors format. Two-tier architecture: hot blocks stay in RAM, cold blocks go to SSD with LRU policy. Previously seen prefixes are restored across requests and server restarts, never recomputed.

02 THROUGHPUT: Continuous batching

Handles concurrent requests through mlx-lm's BatchGenerator. Up to 4.14× generation speedup at 8× concurrency. No more queuing behind a single request.

03 APP: Native macOS menu bar app

Start, stop, and monitor the server from your menu bar. Web dashboard for model management, chat, and real-time metrics. Signed, notarized, with in-app auto-update. Not Electron.

04 MODELS: Multi-model serving

LLM, VLM, embedding, and reranker models loaded simultaneously. LRU eviction when memory runs low. Browse and download models directly from the admin dashboard.

05 API: OpenAI + Anthropic drop-in

Compatible with Claude Code, OpenClaw, Cursor, and any OpenAI-compatible client. Native /v1/messages Anthropic endpoint. Web dashboard generates the exact config command for each tool.

06 TOOLS: Tool calling + MCP

Supports all major tool calling formats: JSON, Qwen, Gemma, GLM, MiniMax. MCP tool integration and tool result trimming for oversized outputs. Configurable per model.
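To make the two-tier idea in the paged SSD KV caching description more concrete, here is a minimal sketch of a prefix-keyed block cache with an LRU hot tier in RAM and a cold tier on disk. It is an illustration under stated assumptions, not oMLX's actual code: the names `block_keys`, `TwoTierCache`, and `cached_prefix_len` are hypothetical, raw bytes stand in for the safetensors files the project is described as using, and the write-through-on-insert policy is a guess at one simple way to survive restarts.

```python
import hashlib
import tempfile
from collections import OrderedDict
from pathlib import Path


def block_keys(tokens, block_size):
    """Return one key per full token block.

    Each key hashes the block together with every token before it, so two
    prompts share a key only if they share the whole prefix up to that block.
    """
    keys = []
    running = hashlib.sha256()
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        running.update(",".join(map(str, block)).encode() + b";")
        keys.append(running.hexdigest())
    return keys


class TwoTierCache:
    """Hypothetical hot (RAM) plus cold (disk) block cache with LRU eviction."""

    def __init__(self, cold_dir, hot_capacity):
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(parents=True, exist_ok=True)
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # key -> bytes, least recently used first

    def _path(self, key):
        return self.cold_dir / f"{key}.bin"

    def _admit(self, key, payload):
        # Place the block in RAM and evict the least recently used overflow.
        self.hot[key] = payload
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # the disk copy stays available

    def put(self, key, payload):
        # Assumption: write through to disk so blocks survive a restart.
        self._path(key).write_bytes(payload)
        self._admit(key, payload)

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self._path(key)
        if path.exists():
            payload = path.read_bytes()
            self._admit(key, payload)  # promote the cold block back to RAM
            return payload
        return None


def cached_prefix_len(cache, tokens, block_size):
    """Count how many leading tokens are already covered by cached blocks."""
    covered = 0
    for key in block_keys(tokens, block_size):
        if cache.get(key) is None:
            break
        covered += block_size
    return covered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = TwoTierCache(tmp, hot_capacity=1)
        prompt = list(range(8))
        for key in block_keys(prompt, block_size=4):
            cache.put(key, b"placeholder KV bytes")
        # A new cache object on the same directory models a server restart.
        restarted = TwoTierCache(tmp, hot_capacity=1)
        print(cached_prefix_len(restarted, prompt + [99], block_size=4))
```

Chaining the hash across blocks means a lookup can stop at the first miss, since every later block necessarily depends on the missing one. A real server would only need to recompute the tokens past the covered prefix.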
