University of Leicester

A review of regression and classification techniques for analysis of common and rare variants and gene-environmental factors

journal contribution
posted on 2024-04-19, 10:41 authored by A Miller, J Panneerselvam, Lu Liu

Statistical techniques combined with machine-learning algorithms, applied to gene-environment interaction, are giving an unparalleled understanding of complex diseases. Accurately capturing and analysing common, rare, and low-MAF (Minor Allele Frequency) variants alongside gene-environment interaction is pivotal to reliable and accurate classification of complex diseases. Various complex diseases, including Type 1 and Type 2 diabetes and the vastly under-researched LADA (Latent Autoimmune Diabetes in Adults), require further investigation alongside significant machine-learning research to gain a deeper understanding of the disease complexities. Despite existing efforts, an ideal combination of statistical techniques with optimal machine-learning algorithms that can accurately capture and model the gene-environment interaction is lacking. Exploring future computational methods in genomic analysis while exploiting modern-day ones, this paper investigates both the present and future interaction of statistical analysis techniques, machine-learning algorithms and Ensembles with gene-environmental factors. In this context, this paper firstly presents a conceptual understanding of genomic conventions; secondly, examines potential future machine-learning algorithms alongside an extensive analysis of a range of classification, regression and Ensemble techniques, exhibiting their imperative relationship and roles in investigating and classifying common and rare variants and a wide array of gene-environmental factors; and thirdly, scrutinises the utilisation of statistical techniques in Genome Wide Association Studies when analysing common, rare and low-MAF variants.
As an important contribution, this paper identifies efficient machine-learning algorithms alongside Ensemble models and potential future analysis techniques, and exhibits the inherent characteristics that can enhance the reliability and accuracy of gene-environment classification analysis.

History

Author affiliation

College of Science & Engineering/Computing & Mathematical Sciences

Version

  • AM (Accepted Manuscript)

Published in

Neurocomputing

Volume

489

Pagination

466 - 485

Publisher

Elsevier BV

issn

0925-2312

eissn

1872-8286

Copyright date

2022

Available date

2024-04-19

Language

en

Deposited by

Professor Lu Liu

Deposit date

2024-04-18
