
llamafile


License model

  • Free • Open Source

Country of Origin

  • United States

Platforms

  • Mac  On macOS with Apple Silicon, the Xcode Command Line Tools need to be installed so that llamafile can bootstrap itself.
  • Windows  On Windows, you may need to rename a llamafile by adding .exe to the filename. Windows also caps executables at 4GB, so larger llamafiles have to keep their weights in a separate file there. A typical download-and-run sequence is sketched after this list.
  • Linux  If you have trouble running a llamafile, see the Gotchas section: https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#gotchas
  • BSD
  • Self-Hosted
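
Putting those notes together, a typical download-and-run sequence looks like this hedged sketch; model.llamafile is a placeholder for whichever llamafile you downloaded:

```sh
# Hedged sketch of the usual workflow; model.llamafile is a placeholder.
chmod +x model.llamafile       # grant execute permission (macOS/Linux/BSD)
./model.llamafile              # starts the model and a local web UI
# If your shell refuses to execute it, the Gotchas doc suggests:
sh -c ./model.llamafile
```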

Features


Properties

  1.  Privacy focused

Features

  1.  No Tracking
  2.  Ad-free
  3.  Works Offline
  4.  No registration required
  5.  Dark Mode
  6.  No Coding Required
  7.  Localhost
  8.  AI Chatbot


llamafile information

  • Developed by

    Mozilla
  • Licensing

    Open Source and Free product.
  • Alternatives

    5 alternatives listed
  • Supported Languages

    • English

AlternativeTo Categories

AI Tools & Services • Office & Productivity

GitHub repository

  •  22,723 Stars
  •  1,195 Forks
  •  165 Open Issues
  •   Updated Jun 30, 2025 
View on GitHub: https://github.com/Mozilla-Ocho/llamafile


llamafile was added to AlternativeTo by Paul on Dec 1, 2023 and this page was last updated Jul 19, 2024.

What is llamafile?

llamafile lets you distribute and run LLMs with a single file, providing an OpenAI-compatible API as well as a KoboldAI API.
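
Concretely, once a llamafile is running in server mode, that API can be exercised with plain curl. A minimal sketch, assuming the server's default port of 8080; the model field is a placeholder, since a llamafile typically serves its single embedded model:

```sh
# Hedged sketch: chat completion against a locally running llamafile,
# assuming the default port 8080. The model name is a placeholder.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```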

Our goal is to make the "build once anywhere, run anywhere" dream come true for AI developers. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that lets you build apps for LLMs as a single-file artifact that runs locally on most PCs and servers, and provides the following:

First, your llamafiles can run on multiple CPU microarchitectures. We added runtime dispatching to llama.cpp that lets new Intel systems use modern CPU features without trading away support for older computers.
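
The dispatch itself happens inside the binary via CPUID checks, but the feature flags it keys off can be inspected from the shell on Linux. This is only an illustration of runtime feature detection, not llamafile's actual mechanism:

```sh
# Illustration only: list the AVX-family flags this CPU advertises,
# which is the kind of signal runtime dispatch decisions rest on.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx'
```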

Secondly, your llamafiles can run on multiple CPU architectures. We do that by concatenating AMD64 and ARM64 builds with a shell script that launches the appropriate one. Our file format is compatible with WIN32 and most UNIX shells. It can also easily be converted, by either you or your users, to the platform-native format whenever required.
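
That shell-script launcher can be pictured as something like the following simplified sketch; the real Actually Portable Executable loader is considerably more involved, and the embedded binary paths here are hypothetical:

```sh
#!/bin/sh
# Simplified sketch of the concatenated-build launcher described above;
# the per-architecture binary paths are hypothetical.
case "$(uname -m)" in
  x86_64|amd64)  exec ./.ape/llamafile-amd64 "$@" ;;
  aarch64|arm64) exec ./.ape/llamafile-arm64 "$@" ;;
  *) echo "unsupported CPU architecture: $(uname -m)" >&2; exit 1 ;;
esac
```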

Thirdly, your llamafiles can run on six OSes (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD). You'll only need to build your code once, using a Linux-style toolchain. The GCC-based compiler we provide is itself an Actually Portable Executable, so you can build your software for all six OSes from the comfort of whichever one you prefer most for development.
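
In practice, "build once, using a Linux-style toolchain" means compiling with Cosmopolitan's cosmocc compiler. A hedged sketch, with a hypothetical source file:

```sh
# Hedged sketch: one compile produces an Actually Portable Executable
# that runs on macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD.
cosmocc -o hello hello.c
./hello
```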

Lastly, the weights for your LLM can be embedded within your llamafile. We added support for PKZIP to the GGML library. This lets uncompressed weights be mapped directly into memory, similar to a self-extracting archive. It enables quantized weights distributed online to be prefixed with a compatible version of the llama.cpp software, thereby ensuring its originally observed behaviors can be reproduced indefinitely.
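
Embedding the weights works roughly like the sketch below, using the zipalign tool from the llamafile repository; the file names are placeholders, and the exact invocation may differ across versions:

```sh
# Hedged sketch: append GGUF weights to a base llamafile as an
# uncompressed (stored) zip member, so they can be mmap()'d in place.
cp llamafile mymodel.llamafile
zipalign -j0 mymodel.llamafile mymodel.gguf
```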

Official Links