Statistical and Machine Learning Approaches to Risk Prediction

Smith, Hayley

doi:10.25392/leicester.data.24648300.v1

2023SmithHPHD.pdf (9.07 MB)

thesis

posted on 2023-12-13, 09:59 authored by Hayley Smith

In medical research, it is essential that models accurately predict the probabilities of future events so that we can implement preventative measures, predict patient prognosis, and decide on effective treatment plans. Currently, conclusions differ about the comparative performance of statistical and machine learning approaches. In this thesis, I compared and evaluated the discrimination and calibration of these approaches, specifically: the Cox model; the Flexible Parametric model (FP); the Multivariable Fractional Polynomial model (MFP); the Random Survival Forest (RSF); and two neural networks. Firstly, I methodologically reviewed simulation studies comparing statistical and machine learning methods for risk prediction. Multiple articles reported only discrimination measures, had poor reporting standards, and simulated from data-generating mechanisms that were biased toward machine learning. This review informed the simulation study design. I then developed a novel approach to simulating survival data, in which data are generated from each risk prediction method.

The MFP and RSF models were the most accurate, especially with complex data. Because simulation studies use simulated data and require methods to be automated, the methods were then compared using a dataset from VICORI and an iterative model-fitting workflow of the kind used in prognostic research. RSF had the best performance, though including covariate relationships identified in the literature improved the statistical models. Both the simulation study and the VICORI analysis highlighted that good discrimination does not necessarily imply good calibration. Lastly, methods must be implemented in software for researchers to use them. Python is a popular programming language, but many survival methods are not available in it. I developed a Python package (asurvivalpackage) that implements key survival methods, increasing their accessibility. This thesis shows that statistical models can perform equivalently to machine learning models, such as RSF, with careful consideration of model implementation. It emphasises that rigorous evaluation of risk prediction models is vital in prognostic research: evaluating both discrimination and calibration and improving reporting standards are essential.
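The point that good discrimination need not imply good calibration can be illustrated with a minimal sketch. This is not code from the thesis or from its Python package. It assumes a simplified binary outcome with no censoring, and the helper names (`concordance`, `calibration_in_the_large`), the simulated risks, and the distorting transform are all hypothetical choices made for illustration. Applying a strictly increasing transform to predicted risks leaves the ranking of patients, and so the concordance, unchanged, while the predicted risks no longer match the observed event rate.

```python
import random


def concordance(risks, events):
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; ties count as 0.5. Binary outcome, no censoring."""
    pos = [r for r, e in zip(risks, events) if e == 1]
    neg = [r for r, e in zip(risks, events) if e == 0]
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


def calibration_in_the_large(risks, events):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(risks) / len(risks) - sum(events) / len(events)


# Hypothetical simulated cohort: each outcome is drawn from its true risk.
rng = random.Random(0)
true_risk = [rng.uniform(0.05, 0.95) for _ in range(2000)]
events = [1 if rng.random() < p else 0 for p in true_risk]

# Assumed distortion: a strictly increasing transform that inflates every
# risk but keeps the ordering of individuals exactly the same.
inflated_risk = [p ** 0.25 for p in true_risk]

c_true = concordance(true_risk, events)
c_inflated = concordance(inflated_risk, events)
cal_true = calibration_in_the_large(true_risk, events)
cal_inflated = calibration_in_the_large(inflated_risk, events)

print(f"concordance: true={c_true:.3f}, inflated={c_inflated:.3f}")
print(f"calibration-in-the-large: true={cal_true:.3f}, inflated={cal_inflated:.3f}")
```

In this sketch both sets of predictions have the same concordance, but only the untransformed risks have a mean close to the observed event rate. A survival-analysis version would additionally need to handle censoring and time-specific risks, which this sketch deliberately leaves out.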

History

Supervisor(s)

Paul Lambert; Tim Lucas; Michael Sweeting; Michael Crowther

Date of award

2023-09-21

Author affiliation

Department of Health Sciences

Awarding institution

University of Leicester

Qualification level

Doctoral

Qualification name

PhD

Language

en

Keywords

Statistical Learning Approach; Machine Learning approach; Risk Prediction; Medical Research; Prognostic Research; thesis; Health Sciences
