A review of regression and classification techniques for analysis of common and rare variants and gene-environmental factors
Statistical techniques combined with machine-learning algorithms, applied together with gene-environment interaction analysis, are providing unprecedented insight into complex diseases. Accurately capturing common, rare, and low-MAF (minor allele frequency) variants alongside gene-environment interactions is pivotal to reliable and accurate classification of complex diseases. Several complex diseases, including Type 1 and Type 2 diabetes and the vastly under-researched LADA (Latent Autoimmune Diabetes in Adults), require further investigation, together with significant machine-learning research, to gain a deeper understanding of their complexities. Despite existing efforts, an ideal combination of statistical techniques and machine-learning algorithms that can accurately capture and model gene-environment interaction is lacking. This paper investigates both the present and the prospective future interaction of statistical analysis techniques, machine-learning algorithms and Ensembles with gene-environmental factors, drawing on modern-day computational methods in genomic analysis. In this context, the paper firstly presents a conceptual understanding of genomic conventions; secondly, discusses potential future machine-learning algorithms alongside an extensive analysis of a range of classification, regression and Ensemble techniques, showing their roles in investigating and classifying common and rare variants and a wide array of gene-environmental factors; and thirdly, scrutinises the utilisation of statistical techniques in Genome-Wide Association Studies for analysing common, rare and low-MAF variants.
As an important contribution, this paper identifies efficient machine-learning algorithms, Ensemble models and potential future analysis techniques, and describes the inherent characteristics that can enhance the reliability and accuracy of gene-environment classification analysis.
History
Author affiliation
College of Science & Engineering/Comp' & Math' Sciences
Version
- AM (Accepted Manuscript)