Adaptive Succinctness
Representing a static set of integers S, |S|=n from a finite universe U=[1..u] is a fundamental task in computer science. Our concern is to represent S in small space while supporting the operations of rank and select on S; if S is viewed as its characteristic vector, the problem becomes that of representing a bit-vector, which is arguably the most fundamental building block of succinct data structures. Although there is an information-theoretic lower bound of B(n,u)=lg(un) bits on the space needed to represent S, this applies to worst-case (random) sets S, and sets found in practical applications are compressible. We focus on the case where elements of S contain runs of| ℓ>1 consecutive elements, one that occurs in many practical situations. Let C(n) denote the class of (un) distinct sets of n elements over the universe [1..u]. Let also C(n)g⊂C(n) contain the sets whose n elements are arranged in g≤n runs of ℓi≥1 consecutive elements from U for i=1,…,g, and let C(n)g,r⊂C(n)g contain all sets that consist of g runs, such that r≤g of them have at least 2 elements. This paper yields the following insights and contributions related to rank/select succinct data structures:
We introduce new compressibility measures for sets, including:
B1(g,n,u)=lg|C(n)g|=lg(u−n+1g)+lg(n−1g−1), and
B2(r,g,n,u)=lg|C(n)g,r|=lg(u−n+1g)+lg(n−g−1r−1)+lg(gr),
such that B2(r,g,n,u)≤B1(g,n,u)≤B(n,u).
We give data structures that use space close to bounds B1(g,n,u) and B2(r,g,n,u) and support rank and select in O(1) time.
We provide additional measures involving entropy-coding run lengths and gaps between items, and data structures to support rank and select using space close to these measures.
History
Author affiliation
School of InformaticsVersion
- AM (Accepted Manuscript)