University of Leicester
Component-based feature saliency for clustering

journal contribution
posted on 2019-09-19, 10:40 authored by X Hong, H Li, P Miller, J Zhou, L Li, D Crookes, Y Lu, X Li, H Zhou
Simultaneous feature selection and clustering is a major challenge in unsupervised learning. In particular, there has been significant research into saliency measures for features that result in good clustering. However, as datasets become larger and more complex, there is a need to adopt a finer-grained approach to saliency by measuring it in relation to a part of a model. Another issue is learning the feature saliency and advanced model parameters. We address the first issue by presenting a novel Gaussian mixture model, which explicitly models the dependency of individual mixture components on each feature, giving a new component-based feature saliency measure. For the second, we use Markov Chain Monte Carlo sampling to estimate the model and hidden variables. Using a synthetic dataset, we demonstrate the superiority of our approach, in terms of clustering accuracy and model parameter estimation, over an approach using a model-based feature saliency with expectation maximisation. We evaluated our approach with six synthetic trajectory datasets. To demonstrate the generality of our approach, we applied it to a network traffic flow dataset for intrusion detection. Finally, we compared it with state-of-the-art clustering techniques using three real-world trajectory datasets of vehicle traffic.
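As a rough illustration of what a component-based feature saliency could look like, the sketch below evaluates the log-likelihood of a diagonal Gaussian mixture in which each component k has its own saliency rho[k, j] for feature j. The specific form is an assumption made for illustration and is not taken from the paper: with probability rho[k, j] the feature is drawn from the component's own Gaussian, and otherwise from a shared background Gaussian. All names (`component_saliency_loglik`, `rho`, `bg_mu`, `bg_sigma`) are hypothetical, and the sketch only evaluates a likelihood; it does not implement the Markov Chain Monte Carlo estimation described in the abstract.

```python
import numpy as np


def log_normal(x, mean, std):
    """Elementwise log density of a univariate Gaussian."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def component_saliency_loglik(X, pi, mu, sigma, rho, bg_mu, bg_sigma):
    """Log-likelihood of X under an assumed component-wise saliency mixture.

    X: (N, D) data; pi: (K,) mixture weights;
    mu, sigma, rho: (K, D) per-component means, std devs, saliencies;
    bg_mu, bg_sigma: (D,) shared background Gaussian (an assumption).
    """
    X = np.asarray(X, dtype=float)
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    bg_mu = np.asarray(bg_mu, dtype=float)
    bg_sigma = np.asarray(bg_sigma, dtype=float)
    # Clip saliencies away from 0 and 1 so the logs below stay finite.
    rho = np.clip(np.asarray(rho, dtype=float), 1e-12, 1.0 - 1e-12)

    Xe = X[:, None, :]  # (N, 1, D), broadcasts against (K, D)
    # Feature j is "salient" for component k: use the component's Gaussian.
    log_rel = np.log(rho) + log_normal(Xe, mu, sigma)
    # Feature j is "non-salient" for component k: use the background Gaussian.
    log_irr = np.log1p(-rho) + log_normal(Xe, bg_mu, bg_sigma)
    log_feat = np.logaddexp(log_rel, log_irr)  # (N, K, D)

    # Sum over features, add log mixture weights, then log-sum-exp over k.
    log_comp = np.log(pi) + log_feat.sum(axis=2)  # (N, K)
    m = log_comp.max(axis=1, keepdims=True)
    log_px = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float(log_px.sum())
```

In this form, setting every rho[k, j] close to 1 recovers an ordinary diagonal Gaussian mixture, while a small rho[k, j] means that feature j contributes little to distinguishing component k from the background.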

Funding

This work has been in part supported by UK EPSRC under Grants EP/G034303/1 and EP/N508664/1.

History

Citation

IEEE Transactions on Knowledge and Data Engineering, 2019

Author affiliation

College of Science and Engineering, Department of Informatics

Version

  • AM (Accepted Manuscript)

Published in

IEEE Transactions on Knowledge and Data Engineering

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

issn

1041-4347

Acceptance date

2019-09-01

Copyright date

2019

Available date

2019-09-19

Publisher version

https://ieeexplore.ieee.org/abstract/document/8809812

Language

en
