University of Leicester

Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity

journal contribution
Posted on 2022-07-13, 08:49. Authored by Mathieu Quinodoz, Virginie G Peter, Katarina Cisarova, Beryl Royer-Bertrand, Peter D Stenson, David N Cooper, Sheila Unger, Andrea Superti-Furga, Carlo Rivolta
We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance.
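To make the idea of within-gene clustering concrete, the following is a minimal sketch of one simple way to test whether pathogenic missense positions sit closer together than chance would predict. It is not the method used in the paper (the abstract states only that a machine learning approach was used). It swaps in a plain label-permutation test on the mean pairwise distance between residue positions. The function names, the test statistic, and the toy positions are all assumptions made for illustration.

```python
import random
from itertools import combinations


def mean_pairwise_distance(positions):
    """Mean absolute distance between all pairs of residue positions."""
    pairs = list(combinations(positions, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def clustering_p_value(pathogenic, benign, n_perm=2000, seed=0):
    """One-sided permutation p-value (hypothetical helper, not from the paper).

    Null hypothesis: pathogenic labels are assigned at random among the
    observed variant positions of the gene. A small p-value means the
    pathogenic positions are closer together than expected under the null.
    """
    rng = random.Random(seed)
    pooled = list(pathogenic) + list(benign)
    k = len(pathogenic)
    observed = mean_pairwise_distance(pathogenic)
    hits = 0
    for _ in range(n_perm):
        # Draw k positions without replacement to play the "pathogenic" role.
        sample = rng.sample(pooled, k)
        if mean_pairwise_distance(sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy amino-acid positions, invented for illustration only.
pathogenic = [101, 103, 104, 108, 110, 112]
benign = [5, 40, 77, 150, 210, 260, 300, 345, 390, 420]
p = clustering_p_value(pathogenic, benign)
print(f"permutation p-value: {p:.4f}")
```

In this toy gene the pathogenic positions fall within a 12-residue window while the benign ones are spread across the protein, so the test reports a small p-value. A per-gene signal of this kind is the sort of positional information that, according to the abstract, MutScore combines with qualitative features of the substitution in a random forest.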

Funding

Swiss National Science Foundation (grants #176097 and #204285)

QIAGEN Inc

Author affiliation

Department of Genetics and Genome Biology, University of Leicester

Version

  • VoR (Version of Record)

Published in

American Journal of Human Genetics

Volume

109

Issue

3

Pagination

457 - 470

Publisher

Cell Press

ISSN

0002-9297

eISSN

1537-6605

Acceptance date

2022-01-11

Copyright date

2022

Available date

2022-07-13

Spatial coverage

United States

Language

English
