University of Leicester
DOCUMENT
AJHG-D-17-00361_R3 (1).pdf (3.57 MB)
DATASET
Table_S7.xlsx (15.6 kB)
DATASET
Table_S6.xlsx (22.73 kB)
DATASET
Table_S5.xlsx (12.13 kB)
DATASET
Table_S4.xlsx (13.9 kB)
DATASET
Table_S3.xlsx (10.3 kB)
DATASET
Table_S2.xlsx (19.53 kB)
DATASET
Table_S1.xlsx (42.89 kB)
DOCUMENT
Rivolta_Supplementary_Figures.pdf (1.04 MB)
DATASET
Table_S8.xlsx (11.06 kB)
10 files

DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders.

journal contribution
posted on 2017-11-21, 11:55 authored by Mathieu Quinodoz, Beryl Royer-Bertrand, Katarina Cisarova, Silvio Alessandro Di Gioia, Andrea Superti-Furga, Carlo Rivolta
In contrast to recessive conditions with biallelic inheritance, identification of dominant (monoallelic) mutations for Mendelian disorders is more difficult, because of the abundance of benign heterozygous variants that act as massive background noise (typically, in a 400:1 excess ratio). To reduce this overflow of false positives in next-generation sequencing (NGS) screens, we developed DOMINO, a tool assessing the likelihood for a gene to harbor dominant changes. Unlike commonly used predictors of pathogenicity, DOMINO takes into consideration features that are properties of genes, rather than of variants. It uses a machine-learning approach to extract discriminant information from a broad array of features (N = 432), including genomic data, intra- and interspecies conservation, gene expression, protein-protein interactions, and protein structure. DOMINO's iterative architecture includes a training process on 985 genes with well-established inheritance patterns for Mendelian conditions, and repeated cross-validation that optimizes its discriminant power. When validated on 99 newly discovered genes with pathogenic mutations, the algorithm displays an excellent final performance, with an area under the curve (AUC) of 0.92. Furthermore, unsupervised analysis by DOMINO of real sets of NGS data from individuals with intellectual disability or epilepsy correctly recognizes known genes and predicts 9 new candidates, with very high confidence. In summary, DOMINO is a robust and reliable tool that can infer dominance of candidate genes with high sensitivity and specificity, making it a useful complement to any NGS pipeline dealing with the analysis of the morbid human genome.
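
As a reading aid for the AUC figure quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from per-gene scores. It is not DOMINO's code: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration. The sketch uses the rank interpretation of AUC, that is, the probability that a randomly chosen positive (here, a dominant gene) receives a higher score than a randomly chosen negative.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = dominant gene, 0 = recessive gene.
# These values are illustrative only and do not come from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this reading, an AUC of 0.92 would mean that in roughly 92% of positive/negative gene pairs the positive gene is ranked higher; a value of 0.5 corresponds to random ranking.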

Funding

This work was supported by the Swiss National Science Foundation (grant # 156260, to C.R.) and by the PhD Fellowships in Life Science of the University of Lausanne (to M.Q.).

History

Citation

American Journal of Human Genetics, 2017, 101 (4), pp. 623-629

Author affiliation

/Organisation/COLLEGE OF LIFE SCIENCES/MBSP Non-Medical Departments/Department of Genetics

Version

  • AM (Accepted Manuscript)

Published in

American Journal of Human Genetics

Publisher

Elsevier (Cell Press)

issn

0002-9297

eissn

1537-6605

Acceptance date

2017-09-01

Copyright date

2017

Available date

2018-04-05

Publisher version

http://www.sciencedirect.com/science/article/pii/S0002929717303683?via=ihub

Notes

Supplemental Information includes two figures and eight tables and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2017.09.001. The file associated with this record is under embargo until 6 months after publication, in accordance with the publisher's self-archiving policy. The full text may be available through the publisher links provided above.

Language

en
