An Empirical Comparison of Resampling Techniques for Software Defect Prediction
Abstract
In machine learning, class imbalance is a major problem, especially in areas such as software defect prediction. To address it, resampling techniques are frequently used to rebalance the class distribution before training the model. In this empirical study, we evaluate five machine learning algorithms (random forest, k-nearest neighbors (KNN), neural network, gradient boosting, and support vector machine) in combination with ten resampling techniques, comprising five undersampling and five oversampling methods. Using software defect prediction datasets, we evaluate performance with precision, recall, and F1-score. Our results demonstrate how different resampling strategies can enhance the performance of machine learning algorithms on imbalanced datasets. We discuss the implications of our results and provide insights into selecting suitable strategies for addressing class imbalance in machine learning tasks. This research contributes to a better understanding of resampling techniques and their practical application in real-world scenarios, particularly in software engineering domains.
Keywords - class imbalance, machine learning, resampling techniques, software defect prediction
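As a minimal sketch of the kind of pipeline the study describes, the example below pairs one oversampling technique (SMOTE) with one of the listed classifiers (random forest) and reports precision, recall, and F1-score. The library choices (scikit-learn, imbalanced-learn) and the synthetic imbalanced dataset are assumptions for illustration, not the paper's actual experimental setup.

```python
# Illustrative sketch only: one resampling technique (SMOTE) combined with one
# classifier (random forest), scored with precision, recall, and F1 on a
# held-out split. Libraries and synthetic data are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Synthetic imbalanced dataset standing in for a defect-prediction dataset
# (roughly 10% minority "defective" class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# The imblearn Pipeline applies resampling only when fitting, so the test set
# keeps its original (imbalanced) class distribution.
model = Pipeline([
    ("resample", SMOTE(random_state=42)),
    ("clf", RandomForestClassifier(random_state=42)),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
print("f1-score: ", f1_score(y_test, y_pred))
```

Swapping the "resample" and "clf" steps for other resampling methods and classifiers reproduces the algorithm-by-technique grid that the study evaluates.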