University of Leicester
Browse

Adaptive Succinctness

Download (443.97 kB)
journal contribution
posted on 2021-09-06, 14:40 authored by Diego Arroyuelo, Rajeev Raman
<p>Representing a static set of integers S, |S|=n from a finite universe U=[1..u] is a fundamental task in computer science. Our concern is to represent S in small space while supporting the operations of rank and select on S; if S is viewed as its characteristic vector, the problem becomes that of representing a bit-vector, which is arguably the most fundamental building block of succinct data structures. Although there is an information-theoretic lower bound of B(n,u)=lg(un) bits on the space needed to represent S, this applies to worst-case (random) sets S, and sets found in practical applications are compressible. We focus on the case where elements of S contain runs of| ℓ>1 consecutive elements, one that occurs in many practical situations. Let C(n) denote the class of (un) distinct sets of n elements over the universe [1..u]. Let also C(n)g⊂C(n) contain the sets whose n elements are arranged in g≤n runs of ℓi≥1 consecutive elements from U for i=1,…,g, and let C(n)g,r⊂C(n)g contain all sets that consist of g runs, such that r≤g of them have at least 2 elements. This paper yields the following insights and contributions related to rank/select succinct data structures:</p> <p><br></p> <p>We introduce new compressibility measures for sets, including:</p> <p><br></p> <p>B1(g,n,u)=lg|C(n)g|=lg(u−n+1g)+lg(n−1g−1), and</p> <p><br></p> <p>B2(r,g,n,u)=lg|C(n)g,r|=lg(u−n+1g)+lg(n−g−1r−1)+lg(gr),</p> <p><br></p> <p>such that B2(r,g,n,u)≤B1(g,n,u)≤B(n,u).</p> <p><br></p> <p>We give data structures that use space close to bounds B1(g,n,u) and B2(r,g,n,u) and support rank and select in O(1) time.</p> <p><br></p> <p>We provide additional measures involving entropy-coding run lengths and gaps between items, and data structures to support rank and select using space close to these measures.</p>

History

Author affiliation

School of Informatics

Version

  • AM (Accepted Manuscript)

Published in

Algorithmica

Volume

84

Pagination

694-718

Publisher

Springer

issn

0178-4617

Acceptance date

2021-09-01

Copyright date

2021

Available date

2022-10-04

Language

en

Usage metrics

    University of Leicester Publications

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC