Adaptive Succinctness
Representing a static set of integers S, |π|=π from a finite universe π=[1..π’] is a fundamental task in computer science. Our concern is to represent S in small space while supporting the operations of ππΊππ and ππΎπ πΎπΌπ on S; if S is viewed as its characteristic vector, the problem becomes that of representing a bit-vector, which is arguably the most fundamental building block of succinct data structures.
Although there is an information-theoretic lower bound of B(π,π’)=lg(π’π) bits on the space needed to represent S, this applies to worst-case (random) sets S, and sets found in practical applications are compressible. We focus on the case where elements of S contain non-trivial runs of consecutive elements, one that occurs in many practical situations.
Let Cπ denote the class of (π’π) distinct sets of π elements over the universe [1..π’] . Let also CππβCπ contain the sets whose π elements are arranged in πβ€π runs of βπβ₯1 consecutive elements from U for π=1,β¦,π , and let Cππ,πβCππ contain all sets that consist of g runs, such that πβ€π of them have at least 2 elements.
We introduce new compressibility measures for sets, including:
L1=lg|Cππ|=lg(π’βπ+1π)+lg(πβ1πβ1) and
L2=lg|Cππ,π|=lg(π’βπ+1π)+lg(πβπβ1πβ1)+lg(ππ)
We show that L2β€L1β€B(π,π’) .
We give data structures that use space close to bounds L1 and L2 and support ππΊππ and ππΎπ πΎπΌπ in O(1) time.
We provide additional measures involving entropy-coding run lengths and gaps between items, data structures to support these measures, and show experimentally that these approaches are promising for real-world datasets.
Funding
Funded by the Millennium Institute for Foundational Research on Data (IMFD).
History
Citation
Arroyuelo D., Raman R. (2019) Adaptive Succinctness. In: Brisaboa N., Puglisi S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, ChamAuthor affiliation
/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of InformaticsSource
26th International Symposium on String Processing and Information RetrievalVersion
- AM (Accepted Manuscript)