LZ78 compression in low main memory space

Arroyuelo, Diego; Cánovas, Rodrigo; Navarro, Gonzalo; Raman, Rajeev

LZ78 compression in low main memory space

conference contribution

posted on 2018-05-02, 16:31 authored by Diego Arroyuelo, Rodrigo Cánovas, Gonzalo Navarro, Rajeev Raman

We present the first algorithms that perform the LZ78 compression of a text of length n over alphabet [1..σ], whose output is z integers, using only O(z lg σ) bits of main memory. The algorithms read the input text from disk in a single pass, and write the compressed output to disk. The text can also be decompressed within the same main memory usage, which is unprecedented too. The algorithms are based on hashing and, under some simplifying assumptions, run in O(n) expected time. We experimentally verify that our algorithms use 2–9 times less time and/or space than previously implemented LZ78 compressors.

Funding

This collaboration started during the Dagstuhl Seminar 16431, “Computation over Compressed Structured Data”. We also acknowledge the funding from Millennium Nucleus Information and Coordination in Networks ICM/FIC RC130003 (G.N.).

History

Citation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, 10508 LNCS, pp. 38-50

Author affiliation

/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Informatics

Source

International Symposium on String Processing and Information Retrieval SPIRE 2017: String Processing and Information Retrieval

Version

AM (Accepted Manuscript)

Published in

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Publisher

Springer Verlag (Germany)

issn

0302-9743

eissn

1611-3349

isbn

9783319674278

Copyright date

2017

Available date

2018-05-02

Publisher DOI

https://doi.org/10.1007/978-3-319-67428-5_4

Publisher version

https://link.springer.com/chapter/10.1007/978-3-319-67428-5_4

Language

en

Administrator link

https://leicester.figshare.com/account/articles/10219076

Usage metrics

Keywords

IR content

LZ78 compression in low main memory space

Funding

This collaboration started during the Dagstuhl Seminar 16431, “Computation over Compressed Structured Data”. We also acknowledge the funding from Millennium Nucleus Information and Coordination in Networks ICM/FIC RC130003 (G.N.).

History

Citation

Author affiliation

Source

Version

Published in

Publisher

issn

eissn

isbn

Copyright date

Available date

Publisher DOI

Publisher version

Language

Administrator link

Usage metrics

Categories

Keywords

Licence

Exports