A Classification Algorithm for Imbalanced Datasets Based on Sample Density
Abstract
To reduce classifier overfitting and improve classification performance, a new algorithm based on sample density is proposed for imbalanced data classification. First, it computes the density of each sample and the density of each class. It then determines the number of clusters from the relationship between the sample densities of the classes. Next, it clusters the majority-class samples with the K-means algorithm, using that number of clusters. The cluster centers are treated as new samples, and a new training dataset is constructed from these centers together with the minority-class samples. The decision function is then obtained by training on the new dataset. The method can alleviate the class imbalance problem and improve the classification performance of SVM. Experiments on an artificial dataset and six UCI datasets show that the algorithm is effective for imbalanced datasets, especially for the minority-class samples.
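The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the density definition or the rule that maps class densities to a cluster count, so the sketch assumes a simple neighbor-count density (`sample_density`, with a hypothetical `radius` parameter) and takes the number of clusters `k` as an input. The names `kmeans` and `rebalance` are also illustrative, and the final SVM training step is left out so that the example depends only on the Python standard library.

```python
import math
import random


def sample_density(points, radius):
    """Assumed density: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns k cluster centers as tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k distinct samples
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # recompute each center as the mean of its cluster;
        # an empty cluster keeps its previous center
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def rebalance(majority, minority, k, seed=0):
    """Replace the majority class by k cluster centers and keep the minority class.

    Returns (X, y) where label 0 marks majority centers and 1 marks minority samples.
    """
    centers = kmeans(list(majority), k, seed=seed)
    X = centers + list(minority)
    y = [0] * len(centers) + [1] * len(minority)
    return X, y
```

A call such as `rebalance(majority, minority, k=len(minority))` yields a balanced training set; choosing `k` equal to the minority-class size is only a stand-in for the density-based rule the paper proposes. The resulting `X` and `y` could then be passed to any SVM implementation to obtain the decision function.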