FastFlowLM is described as 'Run LLMs on AMD Ryzen AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs', and it is a large language model (LLM) tool in the AI Tools & Services category. There are more than 25 alternatives to FastFlowLM across a variety of platforms, including Windows, Linux, Mac, Android, and self-hosted apps. The best FastFlowLM alternative is Ollama, which is both free and open source. Other great apps like FastFlowLM are GPT4All, Jan.ai, AnythingLLM, and LM Studio.
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), and Llama models.

The Swiss Army Knife of offline AI. Chat, speak, and generate images. Privacy first, zero internet. Download an LLM and use it on your mobile device. No data ever leaves your phone.

This project aims to eliminate the barriers to using large language models by automating everything for you. All you need is a lightweight executable program of just a few megabytes. Additionally, the project provides an interface compatible with the OpenAI API, which means...
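
Because the project exposes an OpenAI-compatible interface, any standard OpenAI client should be able to talk to it. A minimal sketch using the official openai Python client; the base URL, API key, and model name below are placeholders, not this project's documented defaults:

```python
# Minimal sketch: query a local OpenAI-compatible server with the
# official openai client. Base URL, API key, and model name are
# assumptions; substitute whatever your local server actually uses.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical local endpoint
    api_key="not-needed-locally",         # most local servers ignore the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```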

This application provides a full suite of generative AI features for chat, code assistance, document search, image analysis, and image and video generation. All features run offline and are powered by your PC’s Intel® Core™ Ultra processor with built-in Intel® Arc™ GPU or an Intel® Arc™ dGPU...

Learn how to add AI to Windows apps using local models and APIs. Discover AI scenarios and models such as Phi, Mistral, Stable Diffusion, Whisper, and many more to delight your users. The AI Dev Gallery is an open-source app designed to help Windows developers integrate AI...

AI00 RWKV Server is an inference API server for the RWKV language model, built on the web-rwkv inference engine.
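
Since AI00 is an HTTP inference server, a plain request is enough to exercise it. A sketch assuming an OpenAI-style chat-completions route; the port and path below are assumptions, so check the server's own configuration for the real values:

```python
# Sketch of a chat request against an AI00 RWKV Server instance.
# The base URL and route are assumptions, not verified defaults.
import requests

resp = requests.post(
    "http://localhost:65530/api/oai/chat/completions",  # assumed endpoint
    json={
        "messages": [{"role": "user", "content": "What is RWKV?"}],
        "max_tokens": 256,
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```
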
Experience the power of RWKV models directly on your device. Completely offline, privacy-first, and efficient. No internet required.

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and enterprise deployment on OpenShift/Kubernetes.
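
The vLLM instances such a UI manages can also be driven directly from Python. A minimal offline-inference sketch using vLLM's own API; the model identifier is just an example, and any model vLLM supports would do:

```python
# Minimal offline-inference sketch with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model; substitute your own
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```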

Run LLMs on device or connect to various commercial or open source APIs. ChatterUI aims to provide a mobile-friendly interface with fine-grained control over chat structuring.

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
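
llama.cpp itself is a C/C++ project, but its Python bindings (llama-cpp-python) expose the same engine with minimal setup. A sketch assuming a GGUF model file is available locally; the path is a placeholder:

```python
# Sketch: run a local GGUF model through llama-cpp-python,
# the Python binding over llama.cpp. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder: any local GGUF file
    n_ctx=2048,                        # context window size
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=128)
print(out["choices"][0]["text"])
```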