Rapidgzip: Parallelized Decompression of Gzip Files with Support for Fast Random Access.
This repository contains the command line tool rapidgzip, which can be used for parallel decompression of almost any gzip file. Other tools, such as bgzip, can only parallelize decompression of gzip files that they produced themselves. rapidgzip works with almost all gzip files, including those produced by GNU gzip, which is installed on most systems. How this works is described in the pugz paper and in the rapidgzip paper, which builds upon the former.
The Python module provides a RapidgzipFile class, which can be used to seek inside gzip files without having to decompress them first. Alternatively, you can simply use it as a parallelized gzip decoder in place of Python's built-in gzip module in order to fully utilize all your cores.
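As a minimal sketch of both uses, the snippet below opens a gzip file with `rapidgzip.open` and reads a slice at a given decompressed offset. The helper names `open_gzip` and `read_at` are illustrative and not part of the module, and the fallback to the built-in gzip module is an assumption added here so that the example also runs where rapidgzip is not installed.

```python
import gzip
import os

try:
    import rapidgzip  # third-party package, e.g. installed via pip
except ImportError:
    rapidgzip = None  # assumption for this sketch: fall back to the stdlib


def open_gzip(path):
    """Open a gzip file for reading, in parallel when rapidgzip is available."""
    if rapidgzip is not None:
        # Use one decompression thread per available core.
        return rapidgzip.open(path, parallelization=os.cpu_count() or 1)
    return gzip.open(path, "rb")  # single-threaded fallback


def read_at(path, offset, size):
    """Return `size` decompressed bytes starting at decompressed `offset`."""
    with open_gzip(path) as file:
        file.seek(offset)
        return file.read(size)
```

For example, `read_at("example.gz", 123, 100)` would return 100 bytes of decompressed data starting at offset 123, where `example.gz` stands for any gzip file on disk.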
The random seeking support is the same as that provided by indexed_gzip, but a least-recently-used cache combined with a parallelized prefetcher yields further speedups at the cost of higher memory usage.
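To illustrate the general idea of this trade-off, here is a toy sketch of a least-recently-used cache with a thread-based prefetcher. It is not rapidgzip's actual implementation: the class name, the `load_chunk` callback, and all parameters are hypothetical. Each cached chunk costs memory, and the prefetcher requests the chunks following the one just read so that sequential access rarely has to wait.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class PrefetchingChunkCache:
    """Toy LRU cache over numbered chunks with sequential prefetching."""

    def __init__(self, load_chunk, capacity=4, prefetch=2, workers=2):
        self._load_chunk = load_chunk  # hypothetical callback: index -> bytes
        self._capacity = capacity      # more cached chunks means more memory
        self._prefetch = prefetch      # number of following chunks to request
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._cache = OrderedDict()    # index -> Future, least recent first

    def _submit(self, index):
        if index in self._cache:
            self._cache.move_to_end(index)  # mark as most recently used
        else:
            self._cache[index] = self._pool.submit(self._load_chunk, index)
            while len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # evict least recently used
        return self._cache[index]

    def get(self, index):
        """Return chunk `index` and start loading the chunks after it."""
        future = self._submit(index)
        for ahead in range(index + 1, index + 1 + self._prefetch):
            self._submit(ahead)
        return future.result()
```

In this sketch, raising `capacity` or `prefetch` makes repeated and sequential reads faster but keeps more chunks in memory at once.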
This repository is a lightweight fork of the indexed_bzip2 repository, in which the main development takes place. This repository was created for visibility reasons and in order to keep indexed_bzip2 and rapidgzip releases separate. It will be updated at least with each release. Issues regarding rapidgzip should be opened here.