University of Leicester
Browse
2021KtistakisCR_PhD.pdf (4.81 MB)

Engineering Lossless Sequence Prediction with Compact Data Structures

Download (4.81 MB)
thesis
posted on 2022-01-31, 14:19 authored by Rafael Ktistakis
Sequences of symbols can be used to represent data in many domains such as text documents, activity logs, customer transactions and website clickstreams. Sequence prediction is a popular task, which consists of predicting the next symbol from a sequence of symbols, given a set of training sequences (under identical symbol set). Although numerous prediction models have been proposed, many have a low accuracy because they are lossy models (they discard information from training sequences to build the model), while lossless models are often more accurate but typically consume a large amount of memory. This thesis addresses the challenges of lossless sequence prediction approaches by engineering an existing state-of-the-art lossless sequence prediction algorithm through space efficient data structures. Moreover, during this thesis, we propose a novel and lossless sequence prediction model that overcomes most of the challenges that usually make a lossless approach suffer. We utilise succinct data structures to compactly represent and efficiently access training sequences for prediction. Based on our experimental evaluations, our lossless SCPT and SUBSEQ prediction algorithms, achieve a very low and consistent memory consumption while maintaining a competitive execution performance. Moreover, with SUBSEQ, we demonstrate an excellent accuracy when compared to eight state-of-the-art prediction algorithms on seven real-life datasets. Finally, we further examine the significance of lossless approaches in the sequence prediction domain, and we present a new ensemble approach that blends lossy and lossless sequence prediction models for a much more improved accuracy performance.

History

Supervisor(s)

Rajeev Raman

Date of award

2021-11-10

Author affiliation

Faculty of Informatics

Awarding institution

University of Leicester

Qualification level

  • Doctoral

Qualification name

  • PhD

Language

en

Usage metrics

    University of Leicester Theses

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC