

FastFlowLM
Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs.
Cost / License
- Free for personal use
- Open Source
Platforms
- Windows
- Online
- Self-Hosted
Properties
- Lightweight
- Privacy focused
Features
- Command line interface
- Works Offline
- AI Chatbot
- Agentic AI
- AI-Powered
- AMD
- Offline
FastFlowLM information
What is FastFlowLM?
FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs
Run large language models — now with Vision, Audio, Embedding and MoE support — on AMD Ryzen™ AI NPUs in minutes. No GPU required. Faster and over 10× more power-efficient. Supports context lengths up to 256k tokens. Ultra-Lightweight (16 MB). Installs within 20 seconds.
📦 The only out-of-the-box, NPU-first runtime built exclusively for Ryzen™ AI. 🤝 Think Ollama — but deeply optimized for NPUs. From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.
FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix, Strix Halo, and Krackan Point).
🧠 Local AI on NPU
FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:
- Fast and low power
- 🧰 Simple CLI and API (REST and OpenAI-compatible API; see the sketch after this list)
- 🔐 Fully private and offline
No model rewrites, no tuning — it just works.
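
Because the API is OpenAI-compatible, a stock OpenAI client can talk to the local server. Below is a minimal sketch, assuming the server is already running, that it listens on localhost:11434 (an Ollama-style default, not a confirmed FLM detail), and that a model tag such as llama3.2:1b has been pulled; all three specifics are assumptions for illustration.

```python
from openai import OpenAI

# Endpoint and model tag are assumptions for illustration; FLM advertises an
# OpenAI-compatible API, but check your local server's actual address and the
# model tags you have available.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="llama3.2:1b",  # hypothetical model tag
    messages=[{"role": "user", "content": "In one sentence, what is an NPU?"}],
)
print(resp.choices[0].message.content)
```

Since only the base URL changes, the same call should work from any OpenAI SDK or tool that lets you point at a custom endpoint.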
Highlights
- Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
- Lightweight runtime (16 MB) — installs within 20 seconds, easy to integrate
- Developer-first flow — like Ollama, but optimized for NPU
- Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507; a REST sketch follows this list)
- No low-level tuning required — you focus on your app, we handle the rest
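
For the long-context bullet above, here is a hedged sketch of a raw REST call, assuming the OpenAI-compatible /v1/chat/completions route. The host, port, file name, and model tag (loosely based on the Qwen3-4B-Thinking-2507 example) are placeholders, not confirmed FLM specifics.

```python
import requests

# Hypothetical local endpoint; FLM advertises an OpenAI-compatible REST API,
# but the host and port here are assumptions -- check your server's settings.
URL = "http://localhost:11434/v1/chat/completions"

# Load a large document to exercise the long context window
# (up to 256k tokens on supported models, per the project's claims).
with open("big_report.txt", encoding="utf-8") as f:
    document = f.read()

payload = {
    "model": "qwen3-4b-thinking-2507",  # model tag assumed; use one you have pulled
    "messages": [
        {"role": "user", "content": f"Summarize the key findings:\n\n{document}"},
    ],
}

resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```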
