University of Leicester
Browse

Adaptive Succinctness

Download (443.97 kB)
journal contribution
posted on 2021-09-06, 14:40 authored by Diego Arroyuelo, Rajeev Raman

Representing a static set of integers S, |S|=n from a finite universe U=[1..u] is a fundamental task in computer science. Our concern is to represent S in small space while supporting the operations of rank and select on S; if S is viewed as its characteristic vector, the problem becomes that of representing a bit-vector, which is arguably the most fundamental building block of succinct data structures. Although there is an information-theoretic lower bound of B(n,u)=lg(un) bits on the space needed to represent S, this applies to worst-case (random) sets S, and sets found in practical applications are compressible. We focus on the case where elements of S contain runs of| ℓ>1 consecutive elements, one that occurs in many practical situations. Let C(n) denote the class of (un) distinct sets of n elements over the universe [1..u]. Let also C(n)g⊂C(n) contain the sets whose n elements are arranged in g≤n runs of ℓi≥1 consecutive elements from U for i=1,…,g, and let C(n)g,r⊂C(n)g contain all sets that consist of g runs, such that r≤g of them have at least 2 elements. This paper yields the following insights and contributions related to rank/select succinct data structures:


We introduce new compressibility measures for sets, including:


B1(g,n,u)=lg|C(n)g|=lg(u−n+1g)+lg(n−1g−1), and


B2(r,g,n,u)=lg|C(n)g,r|=lg(u−n+1g)+lg(n−g−1r−1)+lg(gr),


such that B2(r,g,n,u)≤B1(g,n,u)≤B(n,u).


We give data structures that use space close to bounds B1(g,n,u) and B2(r,g,n,u) and support rank and select in O(1) time.


We provide additional measures involving entropy-coding run lengths and gaps between items, and data structures to support rank and select using space close to these measures.

History

Author affiliation

School of Informatics

Version

  • AM (Accepted Manuscript)

Published in

Algorithmica

Volume

84

Pagination

694-718

Publisher

Springer

issn

0178-4617

Acceptance date

2021-09-01

Copyright date

2021

Available date

2022-10-04

Language

en

Usage metrics

    University of Leicester Publications

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC