WordLlama

WordLlama is a fast, lightweight NLP toolkit that handles tasks like fuzzy deduplication, similarity, and ranking with minimal inference-time dependencies, optimized for CPU hardware.

License model

  • Free • Open Source

Platforms

  • Python
  • Mac
  • Windows
  • Linux
  • BSD
  • Self-Hosted

Properties

  1.  Lightweight

Features

  1.  Ad-free
  2.  Natural Language Processing
  3.  AI-Powered

Tags

  • natural-language-processing-tool
  • binarization
  • word-embeddings
  • nlp-machine-learning
  • llama

WordLlama information

  • Developed by

    Lee Miller
  • Licensing

    Open Source (MIT) and Free product.
  • Written in

    Python
  • Alternatives

    3 alternatives listed
  • Supported Languages

    • English

GitHub repository

  • 1,442 Stars
  • 51 Forks
  • 5 Open Issues
  • Updated Mar 28, 2025
View on GitHub

WordLlama was added to AlternativeTo by Paul on Sep 17, 2024 and this page was last updated Sep 17, 2024.

What is WordLlama?

WordLlama is an NLP utility and word embedding model that recycles components from large language models (LLMs) to create efficient and compact word representations, in the spirit of GloVe, Word2Vec, or FastText. WordLlama begins by extracting the token embedding codebook from a state-of-the-art LLM (e.g., Llama 3 70B) and training a small context-less model in a general-purpose embedding framework.

WordLlama outperforms word models such as GloVe 300d on all MTEB benchmarks while being substantially smaller (16 MB default model at 256 dimensions vs. >2 GB).
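
As a quick illustration, here is a minimal usage sketch, assuming the wordllama pip package and the WordLlama.load(), similarity, rank, and deduplicate methods described in the project README; exact signatures may vary between versions:

    # pip install wordllama   (assumed package name)
    from wordllama import WordLlama

    # Load the default 256-dim model: a small embedding lookup that runs on CPU.
    wl = WordLlama.load()

    # Sentence similarity from average-pooled token embeddings.
    print(wl.similarity("I went to the car", "I went to the pawn shop"))

    # Rank candidate documents against a query.
    docs = ["I went to the park", "I went to the shop", "I went to the truck"]
    print(wl.rank("I went to the car", docs))

    # Fuzzy deduplication by embedding-similarity threshold.
    print(wl.deduplicate(docs, threshold=0.8))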

Features of WordLlama include:

  • Matryoshka Representations: Truncate embedding dimension as needed.
  • Low Resource Requirements: A simple token lookup with average pooling enables fast operation on CPU.
  • Binarization: Models trained using the straight-through estimator can be packed into small integer arrays for even faster Hamming distance calculations (coming soon; see the sketch after this list).
  • Numpy-only inference: Lightweight and simple.
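
To make the binarization idea concrete, the sketch below sign-binarizes dense vectors, packs them into bytes, and compares them with a Hamming distance via XOR and popcount. This is a generic NumPy illustration of packed-bit comparison, not WordLlama's own API:

    import numpy as np

    # Illustration only: sign-binarize dense embeddings, pack to bytes,
    # and compare with Hamming distance (XOR + bit count).
    def binarize_pack(emb):
        bits = (emb > 0).astype(np.uint8)      # one bit per dimension
        return np.packbits(bits, axis=-1)      # e.g. 512 dims -> 64 bytes

    def hamming(a_packed, b_packed):
        return int(np.unpackbits(np.bitwise_xor(a_packed, b_packed)).sum())

    a, b = np.random.default_rng(0).standard_normal((2, 512))
    print(hamming(binarize_pack(a), binarize_pack(b)))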

For flexibility, WordLlama employs the Matryoshka representation learning training technique. The largest model (1024-dim) can be truncated to 64, 128, 256, or 512 dimensions. For binary embedding models, we implement straight-through estimators during training. For dense embeddings, 256 dimensions capture most of the performance, while for binary embeddings validation accuracy is close to saturation at 512 dimensions (64 bytes packed).
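
The truncation step itself is simple. Below is a hedged NumPy sketch of Matryoshka-style truncation (keep the leading dimensions, then re-normalize before computing cosine similarity), independent of WordLlama's API:

    import numpy as np

    # Keep the leading `dim` dimensions and re-normalize; with Matryoshka
    # training, the leading dimensions carry most of the signal.
    def truncate(emb, dim):
        head = emb[..., :dim]
        return head / np.linalg.norm(head, axis=-1, keepdims=True)

    full = np.random.default_rng(1).standard_normal((2, 1024))
    for d in (64, 128, 256, 512):
        a, b = truncate(full, d)
        print(d, float(a @ b))    # cosine similarity at each truncation level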

The final weights are saved after weighting, projecting, and truncating the entire tokenizer vocabulary. WordLlama thus becomes a single embedding matrix (nn.Embedding) that is considerably smaller than the gigabyte-sized LLM codebooks it starts from. The original tokenizer is still used to preprocess text into tokens, and the reduced-size token embeddings are average pooled. Very little computation is required, and the resulting model sizes range from 16 MB to 250 MB for the 128k Llama 3 vocabulary.
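
The inference path described above amounts to a table lookup followed by mean pooling. A hedged NumPy sketch with random weights (illustrative only, not the library's internals):

    import numpy as np

    # A single embedding matrix plus average pooling, as described above.
    vocab_size, dim = 128_256, 256   # Llama 3-sized vocabulary, default dim
    table = np.random.default_rng(2).standard_normal((vocab_size, dim), dtype=np.float32)

    def embed(token_ids):
        vectors = table[np.asarray(token_ids)]   # simple row lookup
        return vectors.mean(axis=0)              # average pooling

    print(embed([101, 2023, 318, 257, 1332]).shape)   # (256,)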

It's a good option for lightweight NLP tasks. You can train sklearn classifiers on it, and perform basic semantic matching, fuzzy deduplication, ranking, and clustering. I think it should work well for creating LLM output evaluators, or for other preparatory tasks involved in multi-hop or agentic workflows. You can perform your own LLM surgery and train your own model on consumer GPUs in a few hours. Because it is fast and small, it makes a good "Swiss Army knife" for exploratory analysis and utility applications.
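
For example, here is a hedged sketch of training a scikit-learn classifier on WordLlama embeddings, assuming an embed method that returns one vector per input text as in the README (the texts and labels below are made up):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from wordllama import WordLlama

    wl = WordLlama.load()

    texts = ["great product, works well", "terrible, broke after a day",
             "love it", "waste of money"]
    labels = [1, 0, 1, 0]

    # Embed once, then fit any sklearn estimator on the vectors.
    X = np.asarray(wl.embed(texts))
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(np.asarray(wl.embed(["really happy with this"]))))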

Official Links