

FastFlowLM
Run LLMs on AMD Ryzen™ AI NPUs in minutes. Like Ollama, but purpose-built and deeply optimized for AMD NPUs.
Cost / License
- Free for personal use
- Open Source
Platforms
- Windows
- Online
- Self-Hosted
FastFlowLM information
What is FastFlowLM?
FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs
Run large language models — now with Vision, Audio, Embedding and MoE support — on AMD Ryzen™ AI NPUs in minutes. No GPU required. Faster and over 10× more power-efficient. Supports context lengths up to 256k tokens. Ultra-Lightweight (16 MB). Installs within 20 seconds.
📦 The only out-of-the-box, NPU-first runtime built exclusively for Ryzen™ AI. 🤝 Think Ollama — but deeply optimized for NPUs. From idle silicon to instant power — FastFlowLM makes Ryzen™ AI shine.
FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix Point, Strix Halo, and Krackan Point).
🧠 Local AI on NPU
FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:
- Fast and low power
- 🧰 Simple CLI and API (REST and OpenAI-compatible; see the sketch after this list)
- 🔐 Fully private and offline
No model rewrites, no tuning — it just works.
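
Because the server speaks the OpenAI protocol, existing clients can talk to it by swapping the base URL. Below is a minimal Python sketch of that flow; the port (11434), the `/v1` path, and the model tag `llama3.2:1b` are assumptions for illustration, so check the FLM server output and docs for the actual values on your machine.

```python
# Minimal sketch: chat with a locally served model through FLM's OpenAI-compatible API.
# Assumptions (verify against the FLM docs): a server is already running locally,
# it listens on http://localhost:11434/v1, and the model tag "llama3.2:1b" is available.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local FLM endpoint (assumed address/port)
    api_key="not-needed",                  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama3.2:1b",  # hypothetical model tag; substitute one you have pulled
    messages=[{"role": "user", "content": "Summarize what an NPU is in one sentence."}],
)

print(response.choices[0].message.content)
```

Nothing NPU-specific appears in application code: the runtime handles scheduling on the NPU, and the app only sees a standard chat-completions endpoint.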
Highlights
- Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
- Lightweight runtime (16 MB) — installs within 20 seconds, easy to integrate
- Developer-first flow — like Ollama, but optimized for NPU
- Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507)
- No low-level tuning required — you focus on your app, we handle the rest
