

Tarsier
Cost / License
- Free
- Open Source (MIT)
Platforms
- Self-Hosted
- Python
Features
- Ad-free
- OCR
- Python-based
- AI-Powered
Tarsier information
What is Tarsier?
If you've tried using an LLM to automate web interactions, you've probably run into questions like:
- How should you feed the webpage to an LLM? (e.g. HTML, Accessibility Tree, Screenshot)
- How do you map LLM responses back to web elements?
- How can you inform a text-only LLM about the page's visual structure?
At Reworkd, we iterated on all of these problems across tens of thousands of real web tasks to build a powerful perception system for web agents: Tarsier! As a demo, we've used Tarsier to provide webpage perception for a minimalistic GPT-4 LangChain web agent.
How does it work?
Tarsier visually tags interactable elements on a page with brackets and an ID, e.g. [23]. This gives the LLM a mapping between IDs and elements that it can act on (e.g. CLICK [23]). We define interactable elements as buttons, links, or input fields that are visible on the page; Tarsier can also tag all textual elements if you pass tag_text_elements=True.
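As a rough illustration of that mapping, here is a minimal sketch based on the project's README, assuming Playwright plus Tarsier's Tarsier and GoogleVisionOCRService classes; exact method signatures may differ across versions:

```python
# Minimal sketch: tag a page and resolve an LLM-chosen tag ID back to an element.
# Assumes Playwright plus the Tarsier / GoogleVisionOCRService names shown in the
# project README; exact signatures may differ across versions.
import asyncio

from playwright.async_api import async_playwright
from tarsier import GoogleVisionOCRService, Tarsier


async def main() -> None:
    # A Google Cloud service-account credentials dict is assumed here.
    ocr_service = GoogleVisionOCRService({"your": "service-account-credentials"})
    tarsier = Tarsier(ocr_service)

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://news.ycombinator.com")

        # Tag interactable elements; tag_to_xpath maps each numeric ID
        # (the 23 in "[23]") back to an XPath on the live page.
        screenshot_bytes, tag_to_xpath = await tarsier.page_to_image(page)

        # An LLM action such as "CLICK [23]" then resolves to a real element.
        chosen_id = 23  # illustrative: would normally come from the LLM
        if chosen_id in tag_to_xpath:
            await page.locator(f"xpath={tag_to_xpath[chosen_id]}").click()

        await browser.close()


asyncio.run(main())
```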
Furthermore, we've developed an OCR algorithm to convert a page screenshot into a whitespace-structured string (almost like ASCII art) that even an LLM without vision can understand. This is critical because current vision-language models still lack the fine-grained representations needed for web interaction tasks. On our internal benchmarks, unimodal GPT-4 + Tarsier-Text beats GPT-4V + Tarsier-Screenshot by 10-20%!
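A rough sketch of that text-only path, reusing the tarsier and page objects from the sketch above (the prompt wording and the CLICK-action convention are illustrative, not part of the library):

```python
# Sketch: build a prompt for a text-only LLM from Tarsier's OCR-derived,
# whitespace-structured page text. Assumes the `tarsier` and `page` objects
# from the previous sketch; prompt wording and action format are illustrative.
async def perceive_for_text_only_llm(tarsier, page) -> tuple[str, dict]:
    # tag_text_elements=True also tags purely textual elements, not just
    # buttons, links, and input fields.
    page_text, tag_to_xpath = await tarsier.page_to_text(page, tag_text_elements=True)

    prompt = (
        "You are a web agent. Interactable elements in the page below are "
        "tagged like [23]. Reply with a single action, e.g. CLICK [23].\n\n"
        + page_text
    )
    return prompt, tag_to_xpath
```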


