University of Leicester

FilterK: a new outlier detection method for k-means clustering of physical activity

journal contribution
posted on 2020-03-24, 14:39 authored by Petra J. Jones, Matthew K. James, Melanie J. Davies, Kamlesh Khunti, Mike Catt, Tom Yates, Alex V. Rowlands, Evgeny M. Mirkes
In this paper, a new algorithm, FilterK, is proposed for improving the purity of k-means-derived physical activity clusters by reducing the influence of outliers. We applied it to physical activity data obtained with body-worn accelerometers and clustered using k-means. We compared its performance with three existing outlier detection methods (Local Outlier Factor, Isolation Forests and KNN) using the ground truth (class labels) and the average cluster and event purity (ACEP). FilterK provided comparable gains in ACEP (0.581 → 0.596, compared with 0.580–0.617) whilst removing fewer outliers than the other methods (4% of the total dataset vs 10% to achieve this ACEP). The main focus of our new outlier detection method is to improve the cluster purity of physical activity accelerometer data, but we also suggest it could potentially be applied to other types of data clustered with k-means. We demonstrate our method using a k-means model trained on two independent accelerometer datasets (training n = 90) and re-applied to an independent dataset (test n = 41). Labelled physical activities include lying down, sitting, standing, household chores, walking (laboratory and non-laboratory based), stairs and running. This type of clustering algorithm could be used to help identify optimal physical activity patterns for health.
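The abstract describes a general pipeline: cluster with k-means, remove points flagged as outliers, then score the clusters against ground-truth activity labels. The sketch below illustrates that pipeline only. It is NOT FilterK, whose details are not given here. The outlier step is a hypothetical distance-to-centroid quantile filter (`distance_filter`, parameter `q`), and the score is plain majority-class cluster purity (`cluster_purity`), a simpler stand-in rather than the paper's ACEP metric. All function names are illustrative assumptions.

```python
import numpy as np


def assign(X, centroids):
    """Label each point with the index of its nearest centroid (squared Euclidean)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centroids; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute each centroid as its cluster mean; keep the old one if the cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment so labels are consistent with the returned centroids.
    return centroids, assign(X, centroids)


def distance_filter(X, centroids, labels, q=0.95):
    """Hypothetical outlier filter (not FilterK): within each cluster, keep points whose
    distance to the centroid is at or below the q-quantile of that cluster's distances."""
    dist = np.linalg.norm(X - centroids[labels], axis=1)
    keep = np.ones(len(X), dtype=bool)
    for j in np.unique(labels):
        members = labels == j
        keep[members] = dist[members] <= np.quantile(dist[members], q)
    return keep


def cluster_purity(labels, classes):
    """Fraction of points whose class label is the majority class of their cluster."""
    total = 0
    for j in np.unique(labels):
        _, counts = np.unique(classes[labels == j], return_counts=True)
        total += counts.max()
    return total / len(labels)
```

In use, one would cluster feature vectors `X` with known activity labels `y`, compute `cluster_purity(labels, y)`, then recompute it on `X[keep]` and `y[keep]` after filtering, while also reporting the fraction removed (`1 - keep.mean()`). This trade-off between purity gained and data discarded is the one the abstract reports. The quantile threshold `q` is an arbitrary choice for illustration.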


Citation

Journal of Biomedical Informatics, 104 (2020), 103397

Author affiliation

Leicester Diabetes Centre; Diabetes Research Centre; School of Mathematics and Actuarial Science

Version

  • AM (Accepted Manuscript)

Published in

Journal of Biomedical Informatics

Volume

104

Publisher

Elsevier BV

ISSN

1532-0464

Acceptance date

2020-02-24

Copyright date

2020

Available date

2020-02-26

Publisher version

https://www.sciencedirect.com/science/article/pii/S1532046420300241

Language

en
