University of Leicester
Browse

Adaptive Succinctness

Download (396.8 kB)
conference contribution
posted on 2020-05-15, 13:13 authored by D Arroyuelo, R Raman

Representing a static set of integers S, |𝑆|=𝑛 from a finite universe π‘ˆ=[1..𝑒] is a fundamental task in computer science. Our concern is to represent S in small space while supporting the operations of 𝗋𝖺𝗇𝗄 and π—Œπ–Ύπ—…π–Ύπ–Όπ— on S; if S is viewed as its characteristic vector, the problem becomes that of representing a bit-vector, which is arguably the most fundamental building block of succinct data structures.

Although there is an information-theoretic lower bound of B(𝑛,𝑒)=lg(𝑒𝑛) bits on the space needed to represent S, this applies to worst-case (random) sets S, and sets found in practical applications are compressible. We focus on the case where elements of S contain non-trivial runs of consecutive elements, one that occurs in many practical situations.

Let C𝑛 denote the class of (𝑒𝑛) distinct sets of 𝑛 elements over the universe [1..𝑒] . Let also Cπ‘›π‘”βŠ‚C𝑛 contain the sets whose 𝑛 elements are arranged in 𝑔≀𝑛 runs of ℓ𝑖β‰₯1 consecutive elements from U for 𝑖=1,…,𝑔 , and let C𝑛𝑔,π‘ŸβŠ‚C𝑛𝑔 contain all sets that consist of g runs, such that π‘Ÿβ‰€π‘” of them have at least 2 elements.

We introduce new compressibility measures for sets, including:

L1=lg|C𝑛𝑔|=lg(π‘’βˆ’π‘›+1𝑔)+lg(π‘›βˆ’1π‘”βˆ’1) and

L2=lg|C𝑛𝑔,π‘Ÿ|=lg(π‘’βˆ’π‘›+1𝑔)+lg(π‘›βˆ’π‘”βˆ’1π‘Ÿβˆ’1)+lg(π‘”π‘Ÿ)

We show that L2≀L1≀B(𝑛,𝑒) .

We give data structures that use space close to bounds L1 and L2 and support 𝗋𝖺𝗇𝗄 and π—Œπ–Ύπ—…π–Ύπ–Όπ— in O(1) time.

We provide additional measures involving entropy-coding run lengths and gaps between items, data structures to support these measures, and show experimentally that these approaches are promising for real-world datasets.

Funding

Funded by the Millennium Institute for Foundational Research on Data (IMFD).

History

Citation

Arroyuelo D., Raman R. (2019) Adaptive Succinctness. In: Brisaboa N., Puglisi S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham

Author affiliation

/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Informatics

Source

26th International Symposium on String Processing and Information Retrieval

Version

  • AM (Accepted Manuscript)

Published in

Lecture Notes in Computer Science

Volume

11811

Publisher

Springer Verlag

issn

0302-9743

isbn

978-3-030-32686-9

Acceptance date

2019-07-20

Copyright date

2019

Available date

2019-10-03

Publisher version

https://link.springer.com/chapter/10.1007/978-3-030-32686-9_33

Book series

Lecture Notes in Computer Science

Spatial coverage

Segovia

Temporal coverage: start date

2019-10-07

Temporal coverage: end date

2019-10-09

Language

en

Usage metrics

    University of Leicester Publications

    Categories

    No categories selected

    Keywords

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC