oMLX
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar.
Features
- Distraction-free
- Lightweight
- Apple Silicon support
What is oMLX?
Built for the way agents actually work.
Coding agents invalidate the KV cache dozens of times per session. oMLX persists every cache block to SSD — so when the agent circles back to a previous prefix, it's restored from disk in milliseconds, not recomputed from scratch.

01 — CORE: Paged SSD KV caching
Cache blocks are persisted to disk in safetensors format. Two-tier architecture: hot blocks stay in RAM, cold blocks go to SSD with an LRU policy. Previously seen prefixes are restored across requests and server restarts — never recomputed. (A minimal sketch of the two-tier idea appears below.)

02 — THROUGHPUT: Continuous batching
Handles concurrent requests through mlx-lm's BatchGenerator. Up to 4.14× generation speedup at 8× concurrency. No more queuing behind a single request. (A client-side example appears below.)

03 — APP: Native macOS menu bar app
Start, stop, and monitor the server from your menu bar. Web dashboard for model management, chat, and real-time metrics. Signed, notarized, with in-app auto-update. Not Electron.

04 — MODELS: Multi-model serving
LLM, VLM, embedding, and reranker models loaded simultaneously. LRU eviction when memory runs low. Browse and download models directly from the admin dashboard.

05 — API: OpenAI + Anthropic drop-in
Compatible with Claude Code, OpenClaw, Cursor, and any OpenAI-compatible client. Native /v1/messages Anthropic endpoint. Web dashboard generates the exact config command for each tool. (Example client setup below.)

06 — TOOLS: Tool calling + MCP
Supports all major tool calling formats: JSON, Qwen, Gemma, GLM, MiniMax. MCP tool integration and tool result trimming for oversized outputs. Configurable per model. (An example request appears below.)
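To make the two-tier design in 01 concrete, here is a minimal sketch, not oMLX's actual implementation: every block is written through to SSD as a safetensors file, a bounded hot set stays in RAM in LRU order, and an SSD hit is re-promoted to RAM. The class name, block IDs, and capacity are illustrative assumptions.

```python
# Toy two-tier (RAM -> SSD) KV block cache with LRU demotion.
# Illustrative sketch only; names, sizes, and paging granularity
# are assumptions, not oMLX internals.
from collections import OrderedDict
from pathlib import Path

import numpy as np
from safetensors.numpy import load_file, save_file


class TwoTierBlockCache:
    def __init__(self, cache_dir: str, max_hot_blocks: int = 64):
        self.hot: OrderedDict[str, dict] = OrderedDict()
        self.max_hot_blocks = max_hot_blocks
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, block_id: str) -> Path:
        return self.cache_dir / f"{block_id}.safetensors"

    def put(self, block_id: str, tensors: dict) -> None:
        # Write-through: every block lands on SSD, so it survives restarts.
        save_file(tensors, str(self._path(block_id)))
        self.hot[block_id] = tensors
        self.hot.move_to_end(block_id)      # mark as most recently used
        while len(self.hot) > self.max_hot_blocks:
            self.hot.popitem(last=False)    # LRU: drop coldest from RAM only

    def get(self, block_id: str):
        if block_id in self.hot:            # RAM hit
            self.hot.move_to_end(block_id)
            return self.hot[block_id]
        path = self._path(block_id)
        if path.exists():                   # SSD hit: restore and re-promote
            tensors = load_file(str(path))
            self.hot[block_id] = tensors
            self.hot.move_to_end(block_id)
            return tensors
        return None                         # miss: caller must recompute
```

One likely reason for the format choice, though the text above only states that safetensors is used: safetensors files can be memory-mapped, which keeps restores from SSD cheap.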
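BatchGenerator is internal to mlx-lm; from a client's point of view, continuous batching (02) simply means concurrent requests are interleaved instead of queued. A sketch using the openai SDK's async client, assuming the server listens on localhost:8080 and a placeholder model id; take the real values from your dashboard.

```python
# Eight concurrent prompts against an oMLX server; with continuous
# batching they are interleaved server-side rather than run one by one.
# Base URL, API key, and model id are placeholders.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="omlx")


async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="mlx-community/Qwen2.5-7B-Instruct-4bit",  # placeholder
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return resp.choices[0].message.content


async def main() -> None:
    prompts = [f"Summarize topic {i} in one sentence." for i in range(8)]
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for a in answers:
        print(a)


asyncio.run(main())
```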
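For 05, "drop-in" means only the base URL changes relative to the hosted APIs. A sketch using the standard openai and anthropic SDKs, assuming the server at localhost:8080 and a placeholder model id; the dashboard generates the exact config for each tool.

```python
# Pointing stock SDKs at oMLX instead of the hosted APIs.
# localhost:8080, the API key, and the model id are placeholders.
from anthropic import Anthropic
from openai import OpenAI

MODEL = "mlx-community/Qwen2.5-7B-Instruct-4bit"  # placeholder

# OpenAI-compatible endpoint
oai = OpenAI(base_url="http://localhost:8080/v1", api_key="omlx")
print(oai.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Hello!"}],
).choices[0].message.content)

# Native Anthropic /v1/messages endpoint
ant = Anthropic(base_url="http://localhost:8080", api_key="omlx")
print(ant.messages.create(
    model=MODEL,
    max_tokens=128,
    messages=[{"role": "user", "content": "Hello!"}],
).content[0].text)
```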
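For 06, tool definitions travel in the standard OpenAI tools field; per the description above, the server handles translation into each model's native format (JSON, Qwen, Gemma, GLM, MiniMax). The tool, endpoint, and model id here are illustrative assumptions.

```python
# An OpenAI-format tool-calling request against oMLX.
# get_weather is a hypothetical tool; endpoint and model id are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="omlx")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mlx-community/Qwen2.5-7B-Instruct-4bit",  # placeholder
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```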



