
llama.cpp

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.

Open the WebUI in your browser to start using it.
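The WebUI is served by llama-server, which also exposes an OpenAI-compatible HTTP API. A minimal Python sketch of talking to that API (the port, sampling settings, and helper names here are illustrative assumptions, not llama.cpp's own code):

```python
import json
from urllib import request

# llama-server listens on port 8080 by default; adjust to your setup.
URL = "http://localhost:8080/v1/chat/completions"

def build_payload(prompt: str) -> bytes:
    """Build an OpenAI-style chat request body."""
    return json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # illustrative sampling setting
    }).encode("utf-8")

def ask(prompt: str) -> str:
    """POST the prompt and return the reply (requires a running llama-server)."""
    req = request.Request(URL, data=build_payload(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI chat-completions shape, existing OpenAI client libraries can usually be pointed at a local llama-server as well.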

Cost / License

  • Free
  • Open Source (MIT)

Platforms

  • Windows
  • Mac
  • Linux
  • Docker
  • Homebrew: brew install llama.cpp
  • Nix Package Manager: nix profile install nixpkgs#llama-cpp
  • MacPorts: sudo port install llama.cpp
  • Self-Hosted

Features

Properties

  1.  Lightweight
  2.  Privacy focused
  3.  Minimalistic

Features

  1.  No registration required
  2.  Works Offline
  3.  Hardware Accelerated
  4.  No Tracking
  5.  AI-Powered
  6.  GPU Acceleration
  7.  AI Chatbot
  8.  Support for NVIDIA CUDA acceleration
  9.  Apple Metal support


llama.cpp information

  • Developed by

    ggml-org
  • Licensing

    Open Source (MIT) and Free product.
  • Written in

    C/C++

  • Alternatives

    27 alternatives listed
  • Supported Languages

    • English

AlternativeTo Categories

  • AI Tools & Services
  • System & Hardware

GitHub repository

  • 97,906 Stars
  • 15,493 Forks
  • 1,266 Open Issues
View on GitHub

llama.cpp was added to AlternativeTo by bugmenot.

Featured in Lists

Wake up the NPU on your device

List by bugmenot with 46 apps.

What is llama.cpp?


  • Plain C/C++ implementation without any dependencies
  • Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
  • AVX, AVX2, AVX512 and AMX support for x86 architectures
  • RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V architectures
  • 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
  • Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads GPUs via MUSA)
  • Vulkan and SYCL backend support
  • CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity
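As a rough illustration of how block-wise integer quantization trades precision for memory, here is a simplified Python sketch: each block of 32 weights stores one floating-point scale plus a small signed integer per value. This mirrors the spirit of llama.cpp's 4-bit formats but is not the actual ggml quantization scheme or file layout:

```python
import math

BLOCK = 32  # ggml-style formats also quantize in fixed-size blocks

def quantize_block(xs):
    """Map a block of floats to (scale, 4-bit signed ints in [-8, 7])."""
    scale = max(abs(x) for x in xs) / 7 or 1.0
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from the scale and quantized ints."""
    return [scale * v for v in q]

# Toy "weights": one block of smooth values.
weights = [math.sin(i / 3.0) for i in range(BLOCK)]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)

# Round-trip error is bounded by half the quantization step.
err = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing one scale plus 4 bits per weight cuts memory to roughly an eighth of fp32, which is why quantized models fit on consumer hardware; the real ggml formats refine this idea with per-block minimums and mixed precisions.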

The llama.cpp project is the main playground for developing new features for the ggml library.

Official Links