Class imbalance is a pervasive challenge in real-world machine learning (ML) applications, where the minority class, often the class of interest, is significantly underrepresented. This imbalance can degrade model performance, produce misleading evaluation metrics, and complicate validation. Two prominent data-augmentation techniques for addressing class imbalance are the Synthetic Minority Oversampling Technique (SMOTE) and Generative Adversarial Networks (GANs). However, both techniques have inherent limitations, which have motivated novel variants designed to overcome them. Whereas previous reviews have focused primarily on specific domains, traditional methodologies, or broad overviews of strategies, this review presents a unified taxonomy that captures the causes, types, and implications of class imbalance across diverse ML tasks. It further explores emerging trends in SMOTE and GAN applications, their limitations, and hybrid adaptations. By categorising imbalance types and examining models, metrics, datasets, and comparative approaches, this review provides actionable insights and future research directions for practitioners and researchers addressing class imbalance in real-world ML tasks.
Funding
Iraqi Prime Minister’s Office, the Higher Committee of Education and Development in Iraq (HCED), the Petroleum Technology Development Fund, Nigeria
NISCO U.K. Research Centre, School of Engineering, University of Leicester (Funder ID: 10.13039/501100004227)
History
Author affiliation
College of Science & Engineering
Engineering
Version
VoR (Version of Record)
Published in
IEEE Access
Volume
13
Pagination
113838
Publisher
Institute of Electrical and Electronics Engineers (IEEE)