

Apache Nutch
Apache Nutch is a highly extensible and scalable open source web crawler software project.
What is Apache Nutch?
Nutch is coded entirely in the Java programming language, but data is written in language-independent formats. It has a highly modular architecture, allowing developers to create plug-ins for media-type parsing, data retrieval, querying and clustering.
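The idea of plug-ins for media-type parsing can be illustrated with a small Java sketch. It uses a registry that maps each media type to the plug-in responsible for it. Every name here (`ParserPlugin`, `ParserRegistry`, `PlainTextParser`, `NaiveHtmlParser`) is hypothetical and chosen for illustration. This is not Nutch's actual parser extension-point API, only the general plug-in pattern the description refers to.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ParserRegistryDemo {

    // Hypothetical plug-in contract: each parser declares the media type it handles.
    // Illustrative only; this is NOT Nutch's real parser interface.
    interface ParserPlugin {
        String mediaType();
        String parse(byte[] content);
    }

    // Hypothetical plug-in: decodes the bytes as UTF-8 plain text.
    static class PlainTextParser implements ParserPlugin {
        public String mediaType() { return "text/plain"; }
        public String parse(byte[] content) {
            return new String(content, StandardCharsets.UTF_8).trim();
        }
    }

    // Hypothetical plug-in: strips tags with a naive regex (not a real HTML parser).
    static class NaiveHtmlParser implements ParserPlugin {
        public String mediaType() { return "text/html"; }
        public String parse(byte[] content) {
            String html = new String(content, StandardCharsets.UTF_8);
            return html.replaceAll("<[^>]*>", "").trim();
        }
    }

    // Registry that dispatches content to whichever plug-in claims its media type.
    static class ParserRegistry {
        private final Map<String, ParserPlugin> byType = new HashMap<>();

        void register(ParserPlugin plugin) {
            byType.put(plugin.mediaType(), plugin);
        }

        String parse(String mediaType, byte[] content) {
            ParserPlugin plugin = byType.get(mediaType);
            if (plugin == null) {
                throw new IllegalArgumentException("No parser plug-in for " + mediaType);
            }
            return plugin.parse(content);
        }
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register(new PlainTextParser());
        registry.register(new NaiveHtmlParser());

        byte[] page = "<p>Hello, <b>Nutch</b></p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.parse("text/html", page)); // prints "Hello, Nutch"
    }
}
```

The core code never needs to know which formats exist. Adding support for a new media type means writing one more class and registering it, which is the kind of extensibility a modular crawler aims for.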
The fetcher ("robot" or "web crawler") has been written from scratch specifically for this project.



