StormCrawler

StormCrawler is an open source SDK for building distributed web crawlers with Apache Storm. The project is licensed under the Apache License v2 and consists of a collection of reusable resources and components, written mostly in Java.

Cost / License

Free
Open Source (Apache-2.0)

Origin

United Kingdom

Platforms

Mac
Windows
Linux

StormCrawler information

Developed by
DigitalPebble Ltd
Licensing
Open Source (Apache-2.0) and Free product.
Written in
Java
Alternatives
11 alternatives listed
Supported Languages
- English

GitHub repository

982 Stars
282 Forks
21 Open Issues
Updated Jul 13, 2026

StormCrawler was added to AlternativeTo by jnioche on Sep 28, 2017 and this page was last updated Sep 28, 2017.

What is StormCrawler?

The aim of StormCrawler is to help build web crawlers that are:

- scalable
- resilient
- low latency
- easy to extend
- polite yet efficient
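
To make the "polite yet efficient" goal more concrete, here is a minimal sketch of one common way to think about it: enforce a minimum delay between requests to the same host, while letting requests to different hosts proceed immediately. This is not StormCrawler's implementation and uses none of its API; the class name PolitenessScheduler, the reserve method, and the 1000 ms delay are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative only: tracks, per host, the earliest time at which the
 * next request may be sent. Not part of StormCrawler.
 */
final class PolitenessScheduler {
    private final long minDelayMillis;
    // host -> earliest timestamp (ms) at which the next fetch may start
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PolitenessScheduler(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Reserves the next fetch slot for the given host and returns how many
     * milliseconds the caller should wait before fetching.
     */
    synchronized long reserve(String host, long nowMillis) {
        long earliest = nextAllowed.getOrDefault(host, nowMillis);
        long slot = Math.max(earliest, nowMillis);
        // The following request to this host must wait at least minDelayMillis.
        nextAllowed.put(host, slot + minDelayMillis);
        return slot - nowMillis;
    }

    public static void main(String[] args) {
        PolitenessScheduler scheduler = new PolitenessScheduler(1000);
        long now = 0;
        System.out.println(scheduler.reserve("a.example", now)); // 0: first request to this host
        System.out.println(scheduler.reserve("a.example", now)); // 1000: same host, must wait
        System.out.println(scheduler.reserve("b.example", now)); // 0: different host, no wait
    }
}
```

Keeping the delay per host rather than global is what lets a crawler stay polite to each site without throttling the crawl as a whole.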

StormCrawler is a library and collection of resources that developers can use to build their own crawlers. Doing so can be fairly straightforward: often, all you have to do is declare storm-crawler as a Maven dependency, write your own Topology class (tip: you can extend ConfigurableTopology), reuse the components provided by the project, and perhaps write a couple of custom ones for your own secret sauce. A bit of tweaking to the configuration and off you go!

Apart from the core components, we provide some external resources that you can reuse in your project, such as our spout and bolts for Elasticsearch or a ParserBolt which uses Apache Tika to parse various document formats.

StormCrawler is well suited to use cases where the URLs to fetch and parse arrive as streams, but it is also an appropriate solution for large-scale recursive crawls, particularly where low latency is required. The project is used in production by several companies and is actively developed and maintained.
