posted on 2014-06-26, 13:15authored byMuhammad Muzammal, Rajeev Raman
This paper considers the problem of sequential pattern mining (SPM) in
probabilistic databases. Specifically, we consider SPM in situations where there is uncertainty
in associating an event with a source, model this kind of uncertainty in the probabilistic
database framework and consider the problem of enumerating all sequences
whose expected support is sufficiently large. We give an algorithm based on dynamic
programming to compute the expected support of a sequential pattern. Next, we propose
three algorithms for mining sequential patterns from probabilistic databases. The
first two algorithms are based on the candidate generation framework – one each based
on a breadth-first (similar to GSP) and a depth-first (similar to SPAM) exploration
of the search space. The third one is based on the pattern growth framework (similar
to PrefixSpan). We propose optimizations that mitigate the effects of the expensive
dynamic programming computation step. We give an empirical evaluation of the probabilistic
SPM algorithms and the optimizations, and demonstrate the scalability of the
algorithms in terms of CPU time and the memory usage. We also demonstrate the
effectiveness of the probabilistic SPM framework in extracting meaningful sequences in
the presence of noise.
History
Citation
Knowledge and Information Systems, 2014.
Author affiliation
/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Computer Science
The file associated with this record is embargoed until 12 months after the date of publication. The final published version may be available through the links above.