Forced vital capacity trajectories in patients with idiopathic pulmonary fibrosis: a secondary analysis of a multicentre, prospective, observational cohort
Background: Idiopathic pulmonary fibrosis is a progressive fibrotic lung disease with a variable clinical trajectory. Decline in forced vital capacity (FVC) is the main indicator of progression; however, missing data limit long-term analysis of lung function patterns. We aimed to identify distinct clusters of lung function trajectory among patients with idiopathic pulmonary fibrosis using machine learning techniques.
Methods: We did a secondary analysis of longitudinal FVC data from patients with idiopathic pulmonary fibrosis enrolled in the PROFILE study, a multicentre, prospective, observational cohort study. We compared conventional and machine learning techniques for imputing missing data, and then analysed the fully imputed dataset by unsupervised clustering using self-organising maps. We compared anthropometric features, genomic associations, serum biomarkers, and clinical outcomes between clusters. We also replicated the analysis in an independent cohort of patients with idiopathic pulmonary fibrosis from the Chicago Consortium.
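A common way to compare imputation methods is a masking experiment: hide some observed FVC values, impute them, and score the imputed values against the hidden truth. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the study's protocol. The patient-by-visit layout with `None` for missed visits, the two conventional baselines (per-visit mean imputation and within-patient linear interpolation), RMSE as the error metric, and all function names are hypothetical. The machine learning imputer evaluated in the study is not reproduced; any function that maps a masked dataset to a filled one could be passed as `imputer`.

```python
import math
import random

# Hypothetical layout: one row per patient, one column per scheduled visit,
# holding FVC values (e.g. % predicted); None marks a missed visit.


def mean_impute(rows):
    """Conventional baseline: fill each gap with the mean of that visit's observed values."""
    n_visits = len(rows[0])
    means = []
    for j in range(n_visits):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def interpolate_impute(rows):
    """Conventional baseline: linear interpolation within each patient's series,
    carrying the nearest observed value forward or backward at the ends."""
    filled_rows = []
    for r in rows:
        obs = [j for j, v in enumerate(r) if v is not None]
        filled = list(r)
        for j, v in enumerate(r):
            if v is not None:
                continue
            left = [k for k in obs if k < j]
            right = [k for k in obs if k > j]
            if left and right:
                a, b = left[-1], right[0]
                filled[j] = r[a] + (j - a) / (b - a) * (r[b] - r[a])
            elif left:
                filled[j] = r[left[-1]]
            else:
                filled[j] = r[right[0]]
        filled_rows.append(filled)
    return filled_rows


def sample_cells(rows, n, seed=0):
    """Pick n distinct observed (patient, visit) cells to hide."""
    rng = random.Random(seed)
    observed = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is not None]
    return rng.sample(observed, n)


def masked_rmse(rows, imputer, hidden):
    """Hide the given cells, impute them, and return RMSE against the true values."""
    masked = [list(r) for r in rows]
    for i, j in hidden:
        masked[i][j] = None
    imputed = imputer(masked)
    sq_errors = [(imputed[i][j] - rows[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

The sketch assumes masking leaves every patient and every visit with at least one observed value; otherwise these baselines have nothing to work from. On perfectly linear toy trajectories with only interior visits hidden, interpolation recovers the hidden values exactly, whereas mean imputation does not, which is the kind of contrast the error comparison is meant to expose.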
Findings: 415 (71%) of 581 participants recruited into the PROFILE study were eligible for further analysis. An unsupervised machine learning algorithm had the lowest imputation error among the tested methods, and self-organising maps identified four distinct clusters (1–4), a result confirmed by sensitivity analysis. Cluster 1 comprised 140 (34%) participants and was associated with a trajectory showing a linear decline in FVC over 3 years. Cluster 2 comprised 100 (24%) participants and was associated with a trajectory showing an initial improvement in FVC followed by a decline. Cluster 3 comprised 113 (27%) participants and was associated with a trajectory showing an initial decline in FVC followed by stabilisation. Cluster 4 comprised 62 (15%) participants and was associated with a trajectory showing stable lung function. Median survival was shortest in cluster 1 (2·87 years [IQR 2·29–3·40]) and cluster 3 (2·23 years [1·75–3·84]), followed by cluster 2 (4·74 years [3·96–5·73]), and was longest in cluster 4 (5·56 years [5·18–6·62]). Baseline FEV1 to FVC ratio and concentrations of the biomarker SP-D were significantly higher in clusters 1 and 3. Similar lung function clusters, with some shared anthropometric features, were identified in the replication cohort.
Interpretation: Using a data-driven unsupervised approach, we identified four clusters of lung function trajectory with distinct clinical and biochemical features. Enriching or stratifying longitudinal spirometric data into clusters might optimise the evaluation of intervention efficacy in clinical trials, as well as patient management.
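To illustrate how a self-organising map can group fully imputed FVC trajectories into a small number of clusters, here is a minimal pure-Python sketch. It is not the study's implementation: the one-dimensional 1 × 4 grid, the linearly decaying learning rate and Gaussian neighbourhood width, the initialisation from trajectories spread along total FVC change, and the names `train_som` and `best_matching_unit` are all assumptions chosen for illustration.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to trajectory x."""
    return min(range(len(weights)), key=lambda k: _sq_dist(weights[k], x))


def train_som(data, n_nodes=4, epochs=100, lr0=0.3, sigma0=0.5, seed=0):
    """Train a one-dimensional (1 x n_nodes) self-organising map.

    data: list of equal-length, fully imputed FVC trajectories.
    Assumptions for illustration: n_nodes >= 2, linearly decaying learning
    rate and neighbourhood width, and nodes initialised from trajectories
    spread evenly along the ranking by total change (last minus first visit).
    """
    rng = random.Random(seed)
    ranked = sorted(data, key=lambda t: t[-1] - t[0])
    last = len(ranked) - 1
    weights = [list(ranked[k * last // (n_nodes - 1)]) for k in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 1e-3)
        order = list(data)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                # Gaussian neighbourhood measured along the map grid, so nodes
                # next to the winner are also nudged towards x.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
    return weights
```

Each patient's cluster is then the node index that `best_matching_unit` returns for their trajectory. With four nodes the map can yield at most four clusters, so in this sketch the number of clusters is a modelling choice fixed in advance rather than something the code discovers; checking its stability is what a sensitivity analysis would address.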
Funding: National Institute for Health and Care Research, Medical Research Council, and GlaxoSmithKline.
Citation: Lancet Digit Health 2022; 4: e862–72
Author affiliation: Department of Health Sciences
Version: Version of Record (VoR)