WordLlama

WordLlama is a fast, lightweight NLP toolkit that handles tasks like fuzzy deduplication, similarity, and ranking with minimal inference-time dependencies, optimized for CPU hardware.

License model

  • Free • Open Source

Platforms

  • Python
  • Mac
  • Windows
  • Linux
  • BSD
  • Self-Hosted

Properties

  1.  Lightweight

Features

  1.  Ad-free
  2.  Natural Language Processing
  3.  AI-Powered

Tags

  • natural-language-processing-tool
  • binarization
  • word-embeddings
  • nlp-machine-learning
  • llama

WordLlama information

  • Developed by

    Lee Miller
  • Licensing

    Open Source (MIT) and Free product.
  • Written in

    Python
  • Alternatives

    3 alternatives listed
  • Supported Languages

    • English

GitHub repository

  • 1,442 Stars
  • 51 Forks
  • 5 Open Issues
  • Updated Mar 28, 2025
View on GitHub

WordLlama was added to AlternativeTo by Paul on Sep 17, 2024 and this page was last updated Sep 17, 2024.

What is WordLlama?

WordLlama is an NLP utility and word embedding model that recycles components from large language models (LLMs) to create efficient and compact word representations, in the spirit of GloVe, Word2Vec, or FastText. WordLlama begins by extracting the token embedding codebook from a state-of-the-art LLM (e.g., Llama 3 70B) and training a small context-less model in a general-purpose embedding framework.

WordLlama outperforms word models such as GloVe 300d on all MTEB benchmarks while being substantially smaller (16 MB default model at 256 dimensions vs. >2 GB).
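
As a quick illustration, here is a minimal usage sketch, assuming the wordllama pip package and the WordLlama.load(), similarity, rank, and deduplicate methods described in the project README; exact signatures may vary between versions:

    # pip install wordllama   (assumed package name)
    from wordllama import WordLlama

    # Load the default 256-dim model: a small embedding lookup that runs on CPU.
    wl = WordLlama.load()

    # Sentence similarity from average-pooled token embeddings.
    print(wl.similarity("I went to the car", "I went to the pawn shop"))

    # Rank candidate documents against a query.
    docs = ["I went to the park", "I went to the shop", "I went to the truck"]
    print(wl.rank("I went to the car", docs))

    # Fuzzy deduplication by embedding-similarity threshold.
    print(wl.deduplicate(docs, threshold=0.8))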

Features of WordLlama include:

  • Matryoshka Representations: Truncate embedding dimension as needed.
  • Low Resource Requirements: A simple token lookup with average pooling enables fast operation on CPU.
  • Binarization: Models trained using the straight-through estimator can be packed into small integer arrays for even faster Hamming distance calculations (coming soon; see the sketch after this list).
  • Numpy-only inference: Lightweight and simple.
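
To make the binarization idea concrete, the sketch below sign-binarizes dense vectors, packs them into bytes, and compares them with a Hamming distance via XOR and popcount. This is a generic NumPy illustration of packed-bit comparison, not WordLlama's own API:

    import numpy as np

    # Illustration only: sign-binarize dense embeddings, pack to bytes,
    # and compare with Hamming distance (XOR + bit count).
    def binarize_pack(emb):
        bits = (emb > 0).astype(np.uint8)      # one bit per dimension
        return np.packbits(bits, axis=-1)      # e.g. 512 dims -> 64 bytes

    def hamming(a_packed, b_packed):
        return int(np.unpackbits(np.bitwise_xor(a_packed, b_packed)).sum())

    a, b = np.random.default_rng(0).standard_normal((2, 512))
    print(hamming(binarize_pack(a), binarize_pack(b)))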

For flexibility, WordLlama employs the Matryoshka representation learning training technique. The largest model (1024-dim) can be truncated to 64, 128, 256, or 512 dimensions. For binary embedding models, we implement straight-through estimators during training. For dense embeddings, 256 dimensions capture most of the performance, while for binary embeddings validation accuracy is close to saturation at 512 dimensions (64 bytes packed).
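
The truncation step itself is simple. Below is a hedged NumPy sketch of Matryoshka-style truncation (keep the leading dimensions, then re-normalize before computing cosine similarity), independent of WordLlama's API:

    import numpy as np

    # Keep the leading `dim` dimensions and re-normalize; with Matryoshka
    # training, the leading dimensions carry most of the signal.
    def truncate(emb, dim):
        head = emb[..., :dim]
        return head / np.linalg.norm(head, axis=-1, keepdims=True)

    full = np.random.default_rng(1).standard_normal((2, 1024))
    for d in (64, 128, 256, 512):
        a, b = truncate(full, d)
        print(d, float(a @ b))    # cosine similarity at each truncation level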

The final weights are saved after weighting, projecting, and truncating the entire tokenizer vocabulary. WordLlama thus becomes a single embedding matrix (nn.Embedding) that is considerably smaller than the gigabyte-sized LLM codebooks it starts from. The original tokenizer is still used to preprocess text into tokens, and the reduced-size token embeddings are average pooled. Very little computation is required, and the resulting model sizes range from 16 MB to 250 MB for the 128k Llama 3 vocabulary.
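
The inference path described above amounts to a table lookup followed by mean pooling. A hedged NumPy sketch with random weights (illustrative only, not the library's internals):

    import numpy as np

    # A single embedding matrix plus average pooling, as described above.
    vocab_size, dim = 128_256, 256   # Llama 3-sized vocabulary, default dim
    table = np.random.default_rng(2).standard_normal((vocab_size, dim), dtype=np.float32)

    def embed(token_ids):
        vectors = table[np.asarray(token_ids)]   # simple row lookup
        return vectors.mean(axis=0)              # average pooling

    print(embed([101, 2023, 318, 257, 1332]).shape)   # (256,)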

It's a good option for lightweight NLP tasks. You can train sklearn classifiers on it, and perform basic semantic matching, fuzzy deduplication, ranking, and clustering. I think it should work well for creating LLM output evaluators, or for other preparatory tasks involved in multi-hop or agentic workflows. You can perform your own LLM surgery and train your own model on consumer GPUs in a few hours. Because it is fast and small, it makes a good "Swiss Army knife" for exploratory analysis and utility applications.
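
For example, here is a hedged sketch of training a scikit-learn classifier on WordLlama embeddings, assuming an embed method that returns one vector per input text as in the README (the texts and labels below are made up):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from wordllama import WordLlama

    wl = WordLlama.load()

    texts = ["great product, works well", "terrible, broke after a day",
             "love it", "waste of money"]
    labels = [1, 0, 1, 0]

    # Embed once, then fit any sklearn estimator on the vectors.
    X = np.asarray(wl.embed(texts))
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(np.asarray(wl.embed(["really happy with this"]))))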

Official Links