
FastFlowLM

Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs.

Cost / License

  • Free Personal
  • Open Source

Platforms

  • Windows 11
  • Online  https://open-webui.testdrive-fastflowlm.com/
  • Self-Hosted

Features

Properties

  1.  Lightweight
  2.  Privacy focused

Features

  1.  Command line interface
  2.  Works Offline
  3.  AI Chatbot
  4.  Agentic AI
  5.  AI-Powered
  6.  AMD
  7.  Offline

FastFlowLM information

  • Developed by

    FastFlowLM (US)
  • Licensing

    Open Source and Free Personal product.
  • Alternatives

    26 alternatives listed
  • Supported Languages

    • English

AlternativeTo Categories

AI Tools & ServicesOS & Utilities

GitHub repository

  •  780 Stars
  •  39 Forks
  •  23 Open Issues

FastFlowLM was added to AlternativeTo by bugmenot.

What is FastFlowLM?

FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs

Run large language models — now with Vision, Audio, Embedding and MoE support — on AMD Ryzen™ AI NPUs in minutes. No GPU required. Faster and over 10× more power-efficient. Supports context lengths up to 256k tokens. Ultra-Lightweight (16 MB). Installs within 20 seconds.

📦 The only out-of-box, NPU-first runtime built exclusively for Ryzen™ AI. 🤝 Think Ollama — but deeply optimized for NPUs. From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.

FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix, Strix Halo, and Kraken).

🧠 Local AI on NPU

FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:

  • Fast and low power
  • 🧰 Simple CLI and API (REST and OpenAI API)
  • 🔐 Fully private and offline

No model rewrites, no tuning — it just works.
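Since FLM exposes an OpenAI-compatible REST API, a local chat request can be sketched from Python. This is a minimal sketch under assumptions: the base URL, the Ollama-style port 11434, and the model tag below are placeholders — check the FastFlowLM docs for the actual endpoint and model names on your machine.

```python
# Minimal sketch of calling a local FastFlowLM server through its
# OpenAI-compatible chat-completions endpoint. BASE_URL, the port,
# and MODEL are ASSUMPTIONS, not confirmed defaults.
import json
from urllib import request

BASE_URL = "http://localhost:11434/v1"  # assumed local endpoint
MODEL = "qwen3:4b"                      # assumed model tag

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str) -> str:
    """POST the payload to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (requires a running FLM server):
#   reply = chat("Summarize what an NPU is in one sentence.")
```

Because the API mirrors OpenAI's schema, existing OpenAI client libraries should also work by pointing their base URL at the local server.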

Highlights

  • Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
  • Lightweight runtime (16 MB) — installs within 20 seconds, easy to integrate
  • Developer-first flow — like Ollama, but optimized for NPU
  • Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507)
  • No low-level tuning required — You focus on your app, we handle the rest
