University of Leicester

A Graph-Based Architecture For Efficient Genome Data Representation And Variant Transformation

thesis
Posted on 2024-11-22, 11:30; authored by Sanna Aizad

Processing massive and complex genomic data sets is increasingly time-consuming. The scale of this data, together with its diverse file types and heterogeneous data structures, makes storage, processing, and retrieval difficult, and traditional models struggle with the high dimensionality and iterative nature of genomic data analysis. This thesis introduces a graph-based data model and architecture for representing and processing human genome variations efficiently, addressing the heterogeneity, volume, and structural complexity of current genomic data. By mapping both the reference genome and genome variant data (from VCF files) into a unified graph model, the proposed architecture improves data accessibility, scalability, and speed in genome variant analysis. The research formalises a property graph model in which genomic variations, such as substitutions, insertions, and deletions, are mapped as nodes connected to the reference genome. A graph-based variant normalisation algorithm ensures consistent variant representation across different VCF data sources. A graph database is employed for fast data retrieval, with response times reduced from minutes to milliseconds. This approach provides a scalable and adaptable solution to genomic data processing, facilitating more efficient research and enabling new opportunities in personalised and precision medicine.
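To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the thesis's actual model or algorithm. It assumes one reference node per base, hypothetical edge types `NEXT` and `AT`, and the widely used trim-and-left-align approach to variant normalisation. All class names, function names, and labels here are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                     # "Reference" or "Variant" (hypothetical labels)
    props: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (source_id, edge_type, target_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source_id, edge_type, target_id):
        self.edges.append((source_id, edge_type, target_id))


def classify(ref, alt):
    """Label a variant by comparing allele lengths."""
    if len(ref) == len(alt):
        return "substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


def normalise(reference, pos, ref, alt):
    """Trim and left-align a variant; pos is 1-based, as in VCF."""
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def build_graph(reference, variants, chrom="chr1"):
    """Map a reference sequence and (pos, ref, alt) records into one graph."""
    graph = PropertyGraph()
    # Reference backbone: one node per base, chained by NEXT edges.
    for i, base in enumerate(reference, start=1):
        graph.add_node(Node(f"{chrom}:{i}", "Reference", {"pos": i, "base": base}))
        if i > 1:
            graph.add_edge(f"{chrom}:{i - 1}", "NEXT", f"{chrom}:{i}")
    # Variants: normalise first so equivalent records share one node.
    for pos, ref, alt in variants:
        pos, ref, alt = normalise(reference, pos, ref, alt)
        var_id = f"{chrom}:{pos}:{ref}>{alt}"
        if var_id not in graph.nodes:
            props = {"pos": pos, "ref": ref, "alt": alt, "type": classify(ref, alt)}
            graph.add_node(Node(var_id, "Variant", props))
            graph.add_edge(var_id, "AT", f"{chrom}:{pos}")
    return graph


reference = "GGGCACACACAGGG"
# Two spellings of the same CA deletion, plus one substitution.
records = [(8, "CAC", "C"), (3, "GCACA", "GCA"), (5, "A", "T")]
graph = build_graph(reference, records)
variant_ids = sorted(n.node_id for n in graph.nodes.values() if n.label == "Variant")
print(variant_ids)
```

In this sketch the two differently written deletion records collapse to a single variant node once normalised, which is the kind of consistency across VCF sources that the abstract describes. A production system would store such nodes and edges in a graph database rather than in in-memory Python structures.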

History

Supervisor(s)

Ashiq Anjum; Lu Liu

Date of award

2024-11-04

Author affiliation

School of Computing and Mathematical Sciences

Awarding institution

University of Leicester

Qualification level

  • Doctoral

Qualification name

  • PhD

Language

en
