We present the first algorithms that perform the LZ78 compression of a text of length n over alphabet [1..σ], whose output is z integers, using only O(z lg σ) bits of main memory. The algorithms read the input text from disk in a single pass, and write the compressed output to disk. The text can also be decompressed within the same main memory usage, which is unprecedented too. The algorithms are based on hashing and, under some simplifying assumptions, run in O(n) expected time. We experimentally verify that our algorithms use 2–9 times less time and/or space than previously implemented LZ78 compressors.
Funding
This collaboration started during the Dagstuhl Seminar 16431, “Computation over Compressed Structured Data”. We also acknowledge the funding from Millennium Nucleus Information and Coordination in Networks ICM/FIC RC130003 (G.N.).
History
Citation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, 10508 LNCS, pp. 38-50
Author affiliation
/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Informatics
Source
International Symposium on String Processing and Information Retrieval SPIRE 2017: String Processing and Information Retrieval
Version
AM (Accepted Manuscript)
Published in
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)