Effect of Recursive Cluster Elimination with Different Clustering Algorithms Applied to Gene Expression Data
Abstract
Feature selection (FS) is an effective tool for dealing with high dimensionality and reducing computational cost. Support Vector Machines – Recursive Cluster Elimination (SVM-RCE) is one of several algorithms developed for FS in high-dimensional data. SVM-RCE includes a clustering step that originally uses k-means. In this study, three alternative clustering algorithms are evaluated in that step using various performance metrics: k-medoids, Hierarchical Clustering (HC), and Gaussian Mixture Model (GMM). Comparisons are carried out on five publicly available gene expression datasets. The results show that k-means in SVM-RCE achieves higher classification performance than the other tested algorithms, while HC performs similarly to k-means. Overall, our findings indicate the superiority of k-means. This study can contribute to the development of SVM-RCE variants that select fewer genes and achieve higher prediction performance.
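To make the role of the interchangeable clustering step concrete, the following is a minimal sketch of an SVM-RCE-style loop, not the authors' implementation. It assumes the commonly described SVM-RCE procedure: genes are clustered by their expression profiles, each cluster is scored by the cross-validated accuracy of a linear SVM trained only on that cluster's genes, and the lowest-scoring clusters are removed before re-clustering. The function names `cluster_genes` and `svm_rce` and the parameters `n_clusters`, `keep_fraction`, `min_genes`, and `cv` are hypothetical. It uses scikit-learn's `KMeans`, `AgglomerativeClustering` (for HC), and `GaussianMixture` (for GMM); k-medoids is omitted because it is not part of core scikit-learn.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def cluster_genes(X, n_clusters, method="kmeans", seed=0):
    """Assign each gene (column of X) to a cluster based on its expression profile.

    Hypothetical helper: the clustering algorithm is the swappable component.
    """
    profiles = X.T  # one row per gene, one column per sample
    if method == "kmeans":
        return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(profiles)
    if method == "hc":
        return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(profiles)
    if method == "gmm":
        # Diagonal covariances keep the model stable when genes are few relative to samples.
        gmm = GaussianMixture(n_components=n_clusters, covariance_type="diag", random_state=seed)
        return gmm.fit_predict(profiles)
    raise ValueError(f"unknown clustering method: {method}")


def svm_rce(X, y, method="kmeans", n_clusters=10, keep_fraction=0.8, min_genes=10, cv=3, seed=0):
    """Sketch of recursive cluster elimination; returns indices of the surviving genes."""
    genes = np.arange(X.shape[1])
    while len(genes) > min_genes:
        k = min(n_clusters, len(genes))
        labels = cluster_genes(X[:, genes], k, method, seed)
        # Group original gene indices by cluster label (empty clusters are skipped).
        clusters = [genes[labels == c] for c in np.unique(labels)]
        # Score each cluster by the mean cross-validated accuracy of a linear SVM
        # trained only on the genes in that cluster (an assumed scoring rule).
        scores = [
            cross_val_score(SVC(kernel="linear"), X[:, idx], y, cv=cv).mean()
            for idx in clusters
        ]
        # Keep the best-scoring fraction of clusters and drop the rest.
        order = np.argsort(scores)[::-1]
        n_keep = max(1, int(round(keep_fraction * len(clusters))))
        kept = np.sort(np.concatenate([clusters[i] for i in order[:n_keep]]))
        if len(kept) == len(genes):
            break  # nothing was eliminated; stop to avoid looping forever
        genes = kept
    return genes
```

On a synthetic matrix `X` of shape (samples, genes) with labels `y`, calling `svm_rce(X, y, method="hc")` instead of `method="kmeans"` changes only the clustering step, which mirrors the kind of comparison described above. Real studies typically use repeated cross-validation and a stopping rule tuned to the dataset.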