University of Leicester
entropy-22-00296-v2.pdf (4.99 MB)

Robust and Scalable Learning of Complex Intrinsic Dataset Geometry via ElPiGraph

Version 2 2020-05-20, 14:56
Version 1 2020-05-20, 14:54
journal contribution
posted on 2020-05-20, 14:56 authored by Luca Albergante, Evgeny Mirkes, Jonathan Bac, Huidong Chen, Alexis Martin, Louis Faure, Emmanuel Barillot, Luca Pinello, Alexander Gorban, Andrei Zinovyev
Multidimensional data point clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology that can be recovered by unsupervised machine learning approaches, in particular by principal graphs. Principal graphs approximate multivariate data by a graph injected into the data space, with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method is able to withstand high levels of noise and is capable of approximating data point clouds via principal graph ensembles. This ensemble strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large datasets in various fields such as biology, where it can be used, for example, with single-cell transcriptomic or epigenomic datasets to infer gene expression dynamics and recover differentiation landscapes.

Funding

This work has been partially supported by the Ministry of Science and Higher Education of the Russian Federation (project No. 14.Y26.31.0022), by the Agence Nationale de la Recherche in the program Investissements d’Avenir (project No. ANR-19-P3IA-0001; PRAIRIE 3IA Institute), by the European Union’s Horizon 2020 program (grant No. 826121, iPC project), by grant number 2018-182734 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation, by the ITMO Cancer SysBio program (MOSAIC) and the INCa PLBIO program (CALYS, INCA_11692), by the Association Science et Technologie, the Institut de Recherches Internationales Servier and the doctoral school Frontières de l’Innovation en Recherche et Education Programme Bettencourt.

History

Citation

Entropy 2020, 22(3), 296; https://doi.org/10.3390/e22030296

Version

  • VoR (Version of Record)

Published in

Entropy

Volume

22

Issue

3

Pagination

296 - 296

Publisher

MDPI AG

eissn

1099-4300

Acceptance date

2020-03-02

Copyright date

2020

Language

en
