
llama.cpp

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.


Cost / License

  • Free
  • Open Source (MIT)

Platforms

  • Windows
  • Mac
  • Linux
  • Docker
  • Homebrew  brew install llama.cpp (quick-start sketch after this list)
  • Nix Package Manager  nix profile install nixpkgs#llama-cpp
  • MacPorts  sudo port install llama.cpp
  • Self-Hosted
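
After installing through one of the package managers above, the bundled command-line tools can be used directly. A minimal quick-start sketch, assuming a recent release that ships the llama-cli and llama-server binaries (older versions named them main and server) and that a GGUF model file has already been downloaded; the model path is a placeholder:

    # One-off generation from the command line
    llama-cli -m ./models/model.Q4_K_M.gguf -p "Explain GGUF in one sentence." -n 128

    # Start the local HTTP server and built-in WebUI
    llama-server -m ./models/model.Q4_K_M.gguf --port 8080
    # then open http://localhost:8080 in a browser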

Features

Properties

  1.  Lightweight
  2.  Privacy focused
  3.  Minimalistic

Features

  1.  No registration required
  2.  Works Offline
  3.  Hardware Accelerated
  4.  No Tracking
  5.  AI-Powered
  6.  GPU Acceleration
  7.  AI Chatbot
  8.  Support for NVIDIA CUDA acceleration
  9.  Apple Metal support


llama.cpp information

  • Developed by

    ggml-org
  • Licensing

    Open Source (MIT) and Free product.
  • Written in

    C/C++
  • Alternatives

    27 alternatives listed
  • Supported Languages

    • English

AlternativeTo Categories

  • AI Tools & Services
  • System & Hardware

GitHub repository

  •  95,517 Stars
  •  14,999 Forks
  • 1,120 Open Issues

llama.cpp was added to AlternativeTo by bugmenot.

What is llama.cpp?

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.

  • Plain C/C++ implementation without any dependencies
  • Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
  • AVX, AVX2, AVX512 and AMX support for x86 architectures
  • RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V architectures
  • 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use (see the quantization sketch after this list)
  • Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads GPUs via MUSA)
  • Vulkan and SYCL backend support
  • CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity (see the offload sketch after this list)
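
The CUDA, Vulkan and SYCL backends listed above are selected when building from source. A minimal sketch of a CUDA build, assuming a recent checkout that uses the GGML_* CMake options (older releases used flags such as LLAMA_CUBLAS, so the exact option names may vary by version):

    # Configure with the CUDA backend enabled and build the tools
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release

    # Use -DGGML_VULKAN=ON or -DGGML_SYCL=ON for the other GPU backends;
    # on Apple silicon the Metal backend is typically enabled by default.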
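
To illustrate the quantization and CPU+GPU hybrid inference mentioned in the list, the project ships a llama-quantize tool, and the -ngl (--n-gpu-layers) runtime flag controls how many layers are offloaded to the GPU. A sketch with placeholder file names:

    # Quantize an FP16 GGUF model down to 4-bit (Q4_K_M)
    llama-quantize ./models/model-f16.gguf ./models/model-q4_k_m.gguf Q4_K_M

    # Offload 20 layers to the GPU and keep the rest on the CPU,
    # useful when the model does not fit entirely in VRAM
    llama-cli -m ./models/model-q4_k_m.gguf -ngl 20 -p "Hello"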

The llama.cpp project is the main playground for developing new features for the ggml library.

Official Links