Over the past decade, there has been an ever-growing interest in genome-wide association studies (GWAS). The role of GWAS is to discover associations between genetic variants, commonly Single Nucleotide Polymorphisms (SNPs), and complex diseases. Because of the ever-increasing number of SNPs in GWAS, the commonly used association analyses tend to be univariate rather than multivariate models. These methods are therefore unable to account for the correlation between SNPs, known as Linkage Disequilibrium (LD).
Penalised regression methods have been suggested as an alternative in GWAS, specifically the Least Absolute Shrinkage and Selection Operator (LASSO), which both shrinks regression coefficients and performs variable selection. In this thesis, the use of the LASSO in both single and multi-cohort GWAS is examined. In the single-cohort setting, the LASSO is applied to the GRAPHIC study in an attempt to discover novel associations with low-density lipoprotein. This thesis also addresses some of the problems with the LASSO, such as which tuning parameter selection method should be used for SNP selection and the need for pruning to reduce the dimensionality of the data before LASSO models can be fitted. The literature suggests that a pruning or pre-screening method is required to fit LASSO models in GWAS because of the high computational burden of fitting such models, yet little work addresses how the dataset should be pruned. An R package for SNP pruning, called prune, is developed and used in a simulation study to determine which pruning method should be used. The role of the LASSO in multi-cohort studies is also considered, specifically in integrative analyses. A new penalised regression method, the Integrative LASSO, is proposed and developed; it combines LASSO, ridge regression and fused LASSO penalties and is tested against some of the current methods in the literature in a simulation study.