University of Leicester

Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport (PILOT)

journal contribution
posted on 2024-04-09, 12:28 authored by Mehdi Joodaki, Mina Shaigan, Victor Parra, Roman D Bülow, Christoph Kuppe, David L Hölscher, Mingbo Cheng, James S Nagai, Michaël Goedertier, Nassim Bouteldja, Vladimir Tesar, Jonathan Barratt, Ian SD Roberts, Rosanna Coppo, Rafael Kramann, Peter Boor, Ivan G Costa

Although clinical applications represent the next challenge in single-cell genomics and digital pathology, we still lack computational methods to analyze single-cell or pathomics data to find sample-level trajectories or clusters associated with diseases. This remains challenging as single-cell/pathomics data are multi-scale, i.e., a sample is represented by clusters of cells/structures, and samples cannot be easily compared with each other. Here we propose PatIent Level analysis with Optimal Transport (PILOT). PILOT uses optimal transport to compute the Wasserstein distance between two individual single-cell samples. This allows us to perform unsupervised analysis at the sample level and uncover trajectories or cellular clusters associated with disease progression. We evaluate PILOT and competing approaches in single-cell genomics or pathomics studies involving various human diseases with up to 600 samples/patients and millions of cells or tissue structures. Our results demonstrate that PILOT detects disease-associated samples from large and complex single-cell or pathomics data. Moreover, PILOT provides a statistical approach to find changes in cell populations, gene expression, and tissue structures related to the trajectories or clusters supporting interpretation of predictions.
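The sample-to-sample distance described above can be illustrated with a small sketch. This is not the PILOT API. It assumes each sample is summarised as a vector of cluster proportions and that a ground cost between clusters is given; the function name `wasserstein_distance`, the three-cluster toy data, and the cost matrix are hypothetical choices made only for illustration. The optimal transport problem is solved here as a plain linear program with SciPy.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_distance(p, q, cost):
    """Discrete optimal transport cost between two histograms p and q.

    p, q : cluster proportions of two samples (each sums to 1).
    cost : (k, k) matrix, cost[i, j] = cost of moving mass from
           cluster i to cluster j (an assumed ground cost).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cost = np.asarray(cost, dtype=float)
    k = p.size

    # The transport plan T is flattened row-major: T[i, j] -> index i * k + j.
    a_eq = np.zeros((2 * k, k * k))
    for i in range(k):
        a_eq[i, i * k:(i + 1) * k] = 1.0  # row i of T sums to p[i]
        a_eq[k + i, i::k] = 1.0           # column i of T sums to q[i]
    b_eq = np.concatenate([p, q])

    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun


# Hypothetical toy data: two samples described by proportions of 3 clusters.
sample_a = [0.5, 0.3, 0.2]
sample_b = [0.2, 0.3, 0.5]
# Assumed ground cost: clusters lie on a line, cost is |i - j|.
idx = np.arange(3)
ground_cost = np.abs(idx[:, None] - idx[None, :])

distance_ab = wasserstein_distance(sample_a, sample_b, ground_cost)
distance_aa = wasserstein_distance(sample_a, sample_a, ground_cost)
```

Computing this distance for every pair of samples yields a sample-by-sample distance matrix, which is the kind of input that unsupervised methods such as clustering or trajectory inference can work from. For the actual method and its parameters, see the code and documentation referenced in the Data Access Statement.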

History

Author affiliation

College of Life Sciences/Cardiovascular Sciences

Version

  • VoR (Version of Record)

Published in

Molecular Systems Biology

Volume

20

Issue

2

Pagination

57 - 74

Publisher

Springer Science and Business Media LLC

issn

1744-4292

eissn

1744-4292

Copyright date

2023

Available date

2024-04-09

Spatial coverage

England

Language

en

Deposited by

Professor Jonathan Barratt

Deposit date

2024-03-28

Data Access Statement

The datasets and computer code produced in this study are available as follows. Pre-processed R and H5ad objects used as input for benchmarking and case studies are deposited in Zenodo (part 1 and part 2). PILOT code, including documentation, tutorials, and scripts for replicating experiments, is available at https://github.com/CostaLab/PILOT and https://pilot.readthedocs.io.

Rights Retention Statement

  • No
