Representing a static set of integers S, |𝑆|=𝑛 from a finite universe 𝑈=[1..𝑢] is a fundamental task in computer science. Our concern is to represent S in small space while supporting the operations of 𝗋𝖺𝗇𝗄 and 𝗌𝖾𝗅𝖾𝖼𝗍 on S; if S is viewed as its characteristic vector, the problem becomes that of representing a bit-vector, which is arguably the most fundamental building block of succinct data structures.
Although there is an information-theoretic lower bound of B(𝑛,𝑢)=lg(𝑢𝑛) bits on the space needed to represent S, this applies to worst-case (random) sets S, and sets found in practical applications are compressible. We focus on the case where elements of S contain non-trivial runs of consecutive elements, one that occurs in many practical situations.
Let C𝑛 denote the class of (𝑢𝑛) distinct sets of 𝑛 elements over the universe [1..𝑢] . Let also C𝑛𝑔⊂C𝑛 contain the sets whose 𝑛 elements are arranged in 𝑔≤𝑛 runs of ℓ𝑖≥1 consecutive elements from U for 𝑖=1,…,𝑔 , and let C𝑛𝑔,𝑟⊂C𝑛𝑔 contain all sets that consist of g runs, such that 𝑟≤𝑔 of them have at least 2 elements.
We introduce new compressibility measures for sets, including:
We show that L2≤L1≤B(𝑛,𝑢) .
We give data structures that use space close to bounds L1 and L2 and support 𝗋𝖺𝗇𝗄 and 𝗌𝖾𝗅𝖾𝖼𝗍 in O(1) time.
We provide additional measures involving entropy-coding run lengths and gaps between items, data structures to support these measures, and show experimentally that these approaches are promising for real-world datasets.
Funded by the Millennium Institute for Foundational Research on Data (IMFD).
CitationArroyuelo D., Raman R. (2019) Adaptive Succinctness. In: Brisaboa N., Puglisi S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham
Author affiliation/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Informatics
Source26th International Symposium on String Processing and Information Retrieval
- AM (Accepted Manuscript)