Disease detection in high-dimensional low sample size medical data
Deep learning has brought a new age of advancements for Artificial Intelligence. Data-oriented modelling paired with computational power delivers strong performance on targeted tasks in computer vision. However, contrary to the hopes of practitioners, real-world data often fails to meet the sample size (n) or dimensionality (d) ideal for deep learning: high dimensionality is common, and sample sizes are often small. This high-dimensional low-sample-size (HDLS) scenario is prevalent and more extreme in medical datasets, where the dimensionality is not merely larger than the available sample size, d > n (the usual HDLS setting), but much larger, d ≫ n. This scenario presents a major obstacle between research and application. This thesis presents our research on disease detection in HDLS medical data through three approaches: (1) combining multiple learners, (2) using less data and fewer annotations, and (3) learning with an existing basis.
The first component explores combining information from a set of supervised models. To this end, a novel committee learning method is proposed that reformulates ensemble learning as a multiple-instance learning problem, which can be solved with attention-pooling mechanisms. The method offers performance benefits on HDLS datasets and applies broadly to committee learning. The second component explores the use of weakly annotated data. An empirical framework is proposed for localising disease regions and generating pseudo data for enhancement. Its performance is demonstrated for anomaly localisation and for enhancing a range of segmentation models. The third component explores a theoretical framework for understanding learning from HDLS data. Under this framework, theoretical properties are verified empirically, leading to developments in simple linear and complex neuromorphic methods for semi-supervised, continual and few-shot learning. Overall, we tackle the extreme HDLS scenario in multiple medical datasets from three perspectives: committee learning, weakly supervised learning and continual/few-shot learning.
History
Supervisor(s)
Ivan Tyukin
Date of award
2023-11-24
Author affiliation
School of Computing and Mathematical Sciences
Awarding institution
University of Leicester
Qualification level
- Doctoral
Qualification name
- PhD