Heritrix

Open-source, extensible web crawler designed for large-scale, archival-quality web archiving. It preserves digital artifacts, supports modular plugins, distributed crawling, detailed monitoring, and scheduling, and exports data in standardized formats for preservation.

License model

  • Free, Open Source

Country of Origin

  • United States

Platforms

  • Mac
  • Windows
  • Linux

Features

  1.  WARC Output

Heritrix information

  • Developed by

    Internet Archive
  • Licensing

    Open Source and Free product.
  • Written in

  • Alternatives

    16 alternatives listed
  • Supported Languages

    • English

GitHub repository

  • 2,987 Stars
  • 762 Forks
  • 36 Open Issues
  • Updated Jun 12, 2025

Our users have written 0 comments and reviews about Heritrix, and it has received 5 likes.

Heritrix was added to AlternativeTo by sanalbilgikosesi on Dec 28, 2015 and this page was last updated May 21, 2025.

What is Heritrix?

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Heritrix (sometimes spelled heretrix, or misspelled or mispronounced as heratrix, heritix, heretix, or heratix) is an archaic word for heiress (a woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.

Official Links