
Text mining and integration of genetic association information

thesis
posted on 2024-07-11, 09:56 authored by Thomas Rowlands

The volume of available biomedical data is always increasing, yet there remain significant barriers to accessing this data. While research studies are published online, they are often submitted without machine-readable versions, limiting how quickly and efficiently key information can be extracted and used by automated systems.

With an emphasis on the association between genotype and phenotype, this thesis details the approaches undertaken to extract information from genome-wide association study (GWAS) publications. Using natural language processing techniques, I developed “GWAS Miner”, which utilises ontology terms to annotate and extract data from the full text and tables of GWAS publications. This enables scalable data curation for database resources such as GWAS Central. Additionally, I developed “GWAS Tagger” for the automated annotation of a GWAS corpus, which can be used for training and testing text mining machine learning models.

GWAS Central is one of the largest sources of summary-level GWAS data, providing users with tools for both comparing and visualising GWAS data, along with phenotype ontologies. This thesis also describes how I extended the GWAS Central resource to integrate GWAS summary-level data with mouse disease model data from the International Mouse Phenotyping Consortium (IMPC). Combining model organism data with human GWAS, accessible via novel web interfaces, enables researchers to compare mouse gene knockout experiment data alongside human GWAS data to identify genes of interest for follow-up research and to corroborate existing findings.

History

Supervisor(s)

Tim Beck

Date of award

2024-06-17

Author affiliation

Department of Genetics & Genome Biology

Awarding institution

University of Leicester

Qualification level

  • Doctoral

Qualification name

  • PhD

Language

en
