Ranked selection of nearest discriminating features
1 School of Computer Science, Indian Institute of Information Technology and Management (IIITM) - Trivandrum, Kerala, India
2 Griffith School of Engineering, Griffith University, Brisbane, Australia
Human-centric Computing and Information Sciences 2012, 2:12 doi:10.1186/2192-1962-2-12Published: 24 June 2012
Feature selection techniques use a search-criteria driven approach for ranked feature subset selection. Often, selecting an optimal subset of ranked features using the existing methods is intractable for high dimensional gene data classification problems.
In this paper, an approach based on the individual ability of the features to discriminate between different classes is proposed. The area of overlap measure between feature to feature inter-class and intra-class distance distributions is used to measure the discriminatory ability of each feature. Features with area of overlap below a specified threshold is selected to form the subset.
The reported method achieves higher classification accuracies with fewer numbers of features for high-dimensional micro-array gene classification problems. Experiments done on CLL-SUB-111, SMK-CAN-187, GLI-85, GLA-BRA-180 and TOX-171 databases resulted in an accuracy of 74.9±2.6, 71.2±1.7, 88.3±2.9, 68.4±5.1, and 69.6±4.4, with the corresponding selected number of features being 1, 1, 3, 37, and 89 respectively.
The area of overlap between the inter-class and intra-class distances is demonstrated as a useful technique for selection of most discriminative ranked features. Improved classification accuracy is obtained by relevant selection of most discriminative features using the proposed method.