Deep Learning for Unbiased Representation in Cellular Microscopy Imaging
Cellular microscopy image analysis has emerged as a critical tool in drug discovery, offering insights that complement the traditional genetic and transcriptomic analyses used to uncover biological activity. Deep learning has significantly advanced this analysis, making it a cornerstone in evaluating cell line viability and proliferation, with notable applications in cell line authentication and high-content screening (HCS). Nonetheless, existing methods often fail to handle the complexity of cellular images and deliver consistent performance, owing to several challenges:
- Cells within an imaging view are fine-grained, exhibiting only subtle morphological distinctions, which complicates the accurate differentiation of cell lines. In the data distribution, this results in small inter-class distances and large intra-class distances.
- Biological batch (bio-batch) effects can alter image attributes such as contrast, brightness, and cell characteristics, potentially skewing experimental outcomes if they are correlated with variables of interest.
- In HCS, cells are typically marked with multiple fluorophores, each contributing a distinct image channel. While these fluorophores accentuate various cellular components, they also introduce challenges in data distribution, especially in the presence of bio-batch effects.
These challenges can lead models to learn biased representations that capture patterns specific to certain conditions rather than those genuinely reflective of the biological or structural features of interest. This thesis aimed to develop structured deep learning pipelines to extract unbiased representations from cellular images, ensuring that model outputs emphasized the data characteristics relevant to specific tasks while minimizing the influence of extraneous variables such as bio-batch effects or noise.
The goal was to achieve consistent classification performance in downstream tasks, including cell line authentication and siRNA perturbation classification. To address the identified challenges, several novel approaches were developed:
- For single-batch brightfield image analysis, a multi-task framework was proposed to address fine-grained distinctions between closely related cell lines. By simultaneously predicting incubation periods, this framework improved differentiation and resolved issues related to data distribution.
- For multi-batch brightfield image analysis, CLANet was introduced to mitigate bio-batch effects across experimental batches. This approach targeted three key forms of bio-batch effects and provided a tailored solution for each, ensuring consistent model performance.
- For multi-batch, multi-channel analysis, a domain generalization method called Adversarial Batch Representation Augmentation (ABRA) was developed. This method leveraged adversarial learning and uncertainty modeling to augment the representation space, thereby enhancing model robustness and enabling unbiased representation learning in HCS.
Comprehensive experiments demonstrated the efficacy of these methods in cell line authentication and siRNA perturbation classification based on cellular microscopy images.
History
Supervisor(s)
Huiyu Zhou; Yinhai Wang; Adam Corrigan; Hongji Yang
Date of award
2024-12-12
Author affiliation
School of Computing and Mathematical Sciences
Awarding institution
University of Leicester
Qualification level
- Doctoral
Qualification name
- PhD