University of Leicester

Pattern Mining from Probabilistic Databases with Dependencies

thesis
Posted on 2020-07-21, 09:44. Authored by Yasemin Asan Kalaz.
In recent years, emerging technologies such as radio-frequency identification (RFID) networks and wireless sensor networks have produced large amounts of uncertain data, drawing considerable attention to such data. Pattern mining has been studied extensively for certain data, and it is an equally important problem for uncertain data. Probabilistic databases are a commonly used framework for modelling uncertain data. Many studies address uncertain databases, but most of them rely on the independence assumption. In this thesis, we first propose a correlated tuple model that allows dependencies to be defined between tuples in tuple-level uncertain databases. As an improvement to this model, we define a general model that can capture the dependencies present in uncertain dependent databases. However, finding the support of an itemset in such a model is an NP-complete problem. We therefore propose a restricted version of this model and define a dynamic program to find frequent itemsets efficiently. Finally, we propose a pattern matching problem on transcription factor binding profiles. We generate uncertain dependent sequence data, to which we apply a mining algorithm to find frequent sub-sequences. Once frequent sub-sequences have been found for each motif whose family is already known, we use the Jaccard index to compare them with each other. We then apply a distance measure to the Jaccard similarity values to identify the correct family for each motif. We validate our solutions through extensive experiments and discuss potential future research directions for mining patterns over dependent uncertain databases.
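The family-identification step described in the abstract can be illustrated with a minimal sketch. It assumes that each motif is represented by the set of its frequent sub-sequences, that the distance is the Jaccard distance (1 minus the Jaccard index), and that a motif is assigned to the family with the smallest mean distance. The abstract does not specify the distance measure or the assignment rule, so these choices, the function names `jaccard` and `nearest_family`, and the example sequences are all hypothetical and are not taken from the thesis.

```python
def jaccard(a, b):
    """Jaccard index |A n B| / |A u B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)


def nearest_family(query, families):
    """Return (family name, mean Jaccard distance) of the closest family.

    `families` maps a family name to a list of motifs, each given as a
    set of frequent sub-sequences. This assignment rule is an assumption
    made for illustration, not the thesis's method.
    """
    best_name, best_dist = None, float("inf")
    for name, members in families.items():
        if not members:
            continue  # skip families with no known motifs
        dists = [1.0 - jaccard(query, m) for m in members]
        mean_dist = sum(dists) / len(dists)
        if mean_dist < best_dist:
            best_name, best_dist = name, mean_dist
    return best_name, best_dist


# Hypothetical frequent sub-sequences for motifs of two known families.
families = {
    "famA": [{"ACG", "CGT", "GTA"}, {"ACG", "CGT"}],
    "famB": [{"TTA", "TAG"}],
}
query = {"ACG", "CGT", "TTT"}
family, distance = nearest_family(query, families)
print(family, round(distance, 4))
```

Here the query shares two sub-sequences with each motif in the hypothetical family `famA` and none with `famB`, so `famA` is selected.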

History

Supervisor(s)

Rajeev Raman

Date of award

2020-06-04

Author affiliation

Department of Informatics

Awarding institution

University of Leicester

Qualification level

  • Doctoral

Qualification name

  • PhD

Language

en
