posted on 2025-07-08, 08:58, authored by Gregory S. Warren
<p dir="ltr">Technological solutions that facilitate the generation and analysis of biomedical data have generally not been matched by advances that optimise the sharing and interoperability of those data. The problem is compounded in fields that handle sensitive data, such as medicine and genetics, by the continuing emergence of data-security and privacy legislation worldwide. Against this backdrop, this PhD project examined how graph technology and semantic similarity could be used to connect distinct data types in powerful and novel ways. Specifically, graph models encompassing phenotypic and genetic data were generated, and these resources were then augmented with a layer of concept similarity information to improve the utility of the data. An ecosystem of supporting infrastructure was then developed to enable the creation of such graph models and to allow complex querying of them. Efficacy testing and optimisation were carried out to ensure that the graph model and its supporting query components performed as well as possible. Finally, to validate these capabilities, the solutions were integrated into several real-world data discovery activities, not least the Cafe Variome technology program and the large-scale Solve-RD project. Overall, this project has advanced the field of data discovery by providing new methodology, tooling and approaches for handling big data and improving data connectivity.</p>