Posted on 2020-05-18, 15:27. Authored by R. Ktistakis, P. Fournier-Viger, S. Puglisi, R. Raman.
Sequences of symbols can be used to represent data in many domains, such as text documents, activity logs, customer transactions and website click-streams. Sequence prediction is a popular task, which consists of predicting the next symbol of a sequence given a set of training sequences. Although numerous prediction models have been proposed, many have low accuracy because they are lossy (they discard information from the training sequences when building the model), while lossless models are often more accurate but typically consume a large amount of memory. This paper addresses these issues by proposing a novel sequence prediction model named SuBSeq that is lossless and utilizes the succinct Wavelet Tree data structure and the Burrows-Wheeler Transform to compactly store and efficiently access training sequences for prediction. An experimental evaluation shows that SuBSeq has very low memory consumption and excellent accuracy when compared to eight state-of-the-art predictors on seven real datasets.
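To give a concrete sense of the general idea behind BWT-based prediction, the following minimal Python sketch builds the Burrows-Wheeler Transform of a single training sequence via a naive suffix array, locates a query context with backward search, and predicts the symbol that most often follows that context. This is only an illustrative sketch of the underlying technique, not the authors' SuBSeq implementation: all function names are hypothetical, and a plain Python scan stands in for the succinct wavelet-tree rank structure used in the paper.

```python
# Illustrative sketch only (not SuBSeq): naive BWT + backward search
# for next-symbol prediction over one training sequence.
from collections import Counter

def suffix_array(text):
    """Naive O(n^2 log n) suffix array; fine for a small example."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt_from_sa(text, sa):
    """BWT: the character preceding each suffix in sorted order."""
    return "".join(text[i - 1] if i > 0 else text[-1] for i in sa)

def backward_search(bwt, pattern):
    """Return the suffix-array interval [lo, hi) of suffixes starting with pattern."""
    counts = Counter(bwt)                  # BWT is a permutation of the text
    C, total = {}, 0
    for c in sorted(counts):               # C[c] = # characters smaller than c
        C[c] = total
        total += counts[c]
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        # rank(c, i) = occurrences of c in bwt[:i]; a wavelet tree answers this
        # in O(log sigma) time, here we simply scan.
        lo = C[c] + bwt[:lo].count(c)
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0, 0
    return lo, hi

def predict_next(text, context):
    """Predict the symbol most frequently following `context` in `text`."""
    text = text + "$"                      # unique terminator for the BWT
    sa = suffix_array(text)
    bwt = bwt_from_sa(text, sa)
    lo, hi = backward_search(bwt, context)
    followers = Counter()
    for rank in range(lo, hi):
        pos = sa[rank] + len(context)      # position just after this occurrence
        if pos < len(text) - 1:            # skip the terminator
            followers[text[pos]] += 1
    return followers.most_common(1)[0][0] if followers else None

if __name__ == "__main__":
    training = "abacabadabacaba"
    print(predict_next(training, "aba"))   # 'c' follows "aba" most often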
Citation
Ktistakis R., Fournier-Viger P., Puglisi S.J., Raman R. (2019) Succinct BWT-Based Sequence Prediction. In: Hartmann S., Küng J., Chakravarthy S., Anderst-Kotsis G., Tjoa A., Khalil I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11707. Springer, Cham
Author affiliation
College of Science and Engineering, Department of Informatics
Source
30th International Conference on Database and Expert Systems Applications - DEXA 2019