llama.cpp
Cost / License
- Free
- Open Source (MIT)
Platforms
- Windows
- Mac
- Linux
- Docker
- Homebrew
- Nix Package Manager
- MacPorts
- Self-Hosted
Properties
- Lightweight
- Privacy focused
- Minimalistic
Features
- No registration required
- Works Offline
- Hardware Accelerated
- No Tracking
- AI-Powered
- GPU Acceleration
- AI Chatbot
- Support for NVIDIA CUDA acceleration
- Apple Metal support
Tags
- ggml
- local-ai
- llamacpp
- multimodal
- vulkan
- gguf
llama.cpp News & Activities
Recent activities
- bugmenot added llama.cpp as alternative to FastFlowLM
- bugmenot added llama.cpp as alternative to vllm-playground
- bugmenot added llama.cpp as alternative to BonzAI - Your Local AI Agent
llama.cpp information
What is llama.cpp?
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
- Plain C/C++ implementation without any dependencies
- Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
- AVX, AVX2, AVX512 and AMX support for x86 architectures
- RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V architectures
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads GPUs via MUSA)
- Vulkan and SYCL backend support
- CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity (see the sketch after this list)
The llama.cpp project is the main playground for developing new features for the ggml library.
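To illustrate the CPU+GPU hybrid inference mentioned in the list above, here is a minimal sketch that loads a quantized GGUF model through the llama.h C API and offloads only part of its layers to the GPU via n_gpu_layers, leaving the rest on the CPU. This is an outline under assumptions, not official project code: function names such as llama_load_model_from_file and llama_new_context_with_model have been renamed in recent releases, the model file name is a placeholder, and the layer count is arbitrary; check the llama.h header that ships with your build.

```c
#include "llama.h"
#include <stdio.h>

int main(int argc, char **argv) {
    // Placeholder path: any quantized GGUF model file (assumption, not a real file).
    const char *model_path = argc > 1 ? argv[1] : "model-q4_k_m.gguf";

    llama_backend_init();

    // Offload 20 transformer layers to the GPU backend (CUDA, Metal, Vulkan, ...);
    // the remaining layers are evaluated on the CPU.
    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 20;

    struct llama_model *model = llama_load_model_from_file(model_path, mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model: %s\n", model_path);
        return 1;
    }

    struct llama_context_params cparams = llama_context_default_params();
    struct llama_context *ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize a prompt and run llama_decode() in a loop here ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

Built against a CUDA-, Metal-, or Vulkan-enabled build of the library, splitting layers this way lets a model that does not fit entirely in VRAM run partly from system memory.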



