

@harshvz/crawler
A flexible web crawler and scraping tool built on Playwright, supporting both BFS and DFS crawling strategies with screenshot capture and structured output. It installs via npm and can be used both as a CLI and programmatically.
Cost / License
- Free
- Open Source (Apache-2.0)
Platforms
- Mac
- Windows
- Linux
Features
- Browser Automation
- Crawler
Tags
- graph-traversal
- internal-links
- npm-package
- package
- screenshot-automation
- headless-browser
- playwright
- web-crawling
- typescript
- meta-data-extraction
- bfs
- dom-parsing
- dfs
- seo-analysis
- cli-tool
@harshvz/crawler News & Activities
harshvz added @harshvz/crawler as alternative to Scrapy, Apify, Scrapfly and UI.Vision RPA
@harshvz/crawler information
What is @harshvz/crawler?
@harshvz/crawler is a Playwright-based web crawler designed to turn websites into reusable knowledge artifacts.
Unlike traditional scrapers that extract isolated data fields, it captures meaning-bearing content from real, JavaScript-rendered pages and preserves it in a form suited to documentation, internal knowledge bases, and AI/LLM workflows.
The crawler navigates websites using BFS or DFS strategies, renders each page in a real browser, and extracts core semantic elements such as metadata, headings (H1–H6), paragraphs, and inline text. The extracted content is stored as Markdown files, alongside full-page screenshots, providing both textual knowledge and visual ground truth for every crawled page.
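For illustration, the BFS variant of this crawl-and-extract loop can be sketched directly against Playwright's public API. The sketch below is not the package's actual interface: the file naming, the internal-link filter, and the `maxPages` limit are assumptions made for the example.

```typescript
import { chromium, Page } from 'playwright';
import { writeFile } from 'node:fs/promises';

// Turn the rendered DOM into a small Markdown document: title, H1-H6, paragraphs.
async function pageToMarkdown(page: Page): Promise<string> {
  const title = await page.title();
  const headings = await page.$$eval('h1, h2, h3, h4, h5, h6', els =>
    els.map(el => `${'#'.repeat(Number(el.tagName[1]))} ${el.textContent?.trim() ?? ''}`)
  );
  const paragraphs = await page.$$eval('p', els =>
    els.map(el => el.textContent?.trim() ?? '').filter(Boolean)
  );
  return `# ${title}\n\n${headings.join('\n')}\n\n${paragraphs.join('\n\n')}\n`;
}

// Breadth-first crawl: a FIFO frontier of unseen internal links, one Markdown
// file and one full-page screenshot written per visited page.
async function crawlBfs(startUrl: string, maxPages = 10): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const origin = new URL(startUrl).origin;
  const queue: string[] = [startUrl];
  const seen = new Set<string>([startUrl]);
  let visited = 0;

  while (queue.length > 0 && visited < maxPages) {
    const url = queue.shift()!; // FIFO gives breadth-first order
    await page.goto(url, { waitUntil: 'networkidle' });

    await writeFile(`page-${visited}.md`, await pageToMarkdown(page));
    await page.screenshot({ path: `page-${visited}.png`, fullPage: true });
    visited++;

    // Enqueue internal links that have not been scheduled yet.
    const links = await page.$$eval('a[href]', els =>
      els.map(a => (a as HTMLAnchorElement).href)
    );
    for (const link of links) {
      if (link.startsWith(origin) && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }

  await browser.close();
}

crawlBfs('https://example.com').catch(console.error);
```

Swapping the FIFO queue for a LIFO stack is essentially all that separates the BFS and DFS strategies described above.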
The project is intentionally opinionated and minimal:
- It prioritizes content understanding over raw scraping speed
- It captures human-readable, context-preserving text
- It produces outputs that are immediately usable by humans and machines
At its core, crawler is built as a knowledge ingestion layer — a foundation for turning websites into structured documentation, searchable knowledge bases, or LLM-ready corpora, while remaining fully local, open-source, and developer-controlled.
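Because the output is plain Markdown on disk, downstream use needs no special tooling. As an illustration (the crawler's actual output layout is assumed here, not documented), a few lines of Node can gather the per-page files into an in-memory corpus ready for chunking, embedding, or indexing:

```typescript
import { readdir, readFile } from 'node:fs/promises';
import { join } from 'node:path';

// Collect every per-page Markdown file from a crawl output directory.
// The flat "*.md" layout is an assumption made for this sketch.
async function loadCorpus(outputDir: string): Promise<{ file: string; text: string }[]> {
  const names = (await readdir(outputDir)).filter(n => n.endsWith('.md'));
  return Promise.all(
    names.map(async file => ({ file, text: await readFile(join(outputDir, file), 'utf8') }))
  );
}

// e.g. const docs = await loadCorpus('./crawl-output');
```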
As the project evolves, the focus is on making extraction more controllable and deterministic, allowing users to define what content is captured and how it is organized — without introducing black-box behavior or external dependencies.






