A new feature selection algorithm for two-class classification problems and application to endometrial cancer

M. Eren Ahsen, Nitin K. Singh, Todd Boren, M. Vidyasagar, Michael A. White

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Scopus citations

Abstract

In this paper, we introduce a new algorithm for feature selection for two-class classification problems, called ℓ1-StaR. The algorithm consists of first extracting the statistically relevant features using the Student t-test, and then passing the reduced feature set to an ℓ1-norm support vector machine (SVM) with recursive feature elimination (RFE). The final number of features chosen by the ℓ1-StaR algorithm can be smaller than the number of samples, unlike with ℓ1-norm regression where the final number of features is bounded below by the number of samples. The algorithm is illustrated by applying it to the problem of determining which endometrial cancer patients are at risk of having the cancer spreading to their lymph nodes. The data consisted of 1,428 micro-RNAs measured on a data set of 94 patient samples (divided evenly between those with lymph node metastasis and those without). Using the algorithm, we identified a subset of just 15 micro-RNAs and a linear classifier based on these, that achieved two-fold cross validation accuracies in excess of 80%, and combined accuracy, sensitivity and specificity in excess of 93%.
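
The two-stage idea described in the abstract (a t-test filter followed by recursive elimination around an ℓ1-penalized linear classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the |t| cutoff `t_cut`, the target size `n_keep`, and the penalty weight `lam` are all hypothetical, a fixed cutoff on the t statistic stands in for a significance test, and a small proximal-gradient solver for a squared-hinge loss with an ℓ1 penalty stands in for the ℓ1-norm SVM.

```python
import numpy as np

def t_statistics(X, y):
    """Pooled-variance (Student) two-sample t statistic for each column of X.
    Labels are assumed to be +1 / -1."""
    pos, neg = X[y == 1], X[y == -1]
    n_pos, n_neg = len(pos), len(neg)
    pooled = ((n_pos - 1) * pos.var(axis=0, ddof=1)
              + (n_neg - 1) * neg.var(axis=0, ddof=1)) / (n_pos + n_neg - 2)
    return (pos.mean(axis=0) - neg.mean(axis=0)) / np.sqrt(
        pooled * (1.0 / n_pos + 1.0 / n_neg))

def fit_l1_linear(X, y, lam=0.01, iters=500):
    """Stand-in for an l1-norm SVM (assumption): minimize
    mean(max(0, 1 - y*(Xw + b))^2) + lam * ||w||_1 by proximal gradient."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * (np.linalg.norm(X, 2) ** 2 / n + 1.0))
    for _ in range(iters):
        slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
        grad_w = -2.0 * (X.T @ (y * slack)) / n
        grad_b = -2.0 * np.sum(y * slack) / n
        w = w - step * grad_w
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold
        b = b - step * grad_b
    return w, b

def l1_star_sketch(X, y, t_cut=2.0, n_keep=15, lam=0.01):
    """Stage 1: keep features with |t| > t_cut.
    Stage 2: refit and drop the smallest-|weight| feature until n_keep remain."""
    keep = np.flatnonzero(np.abs(t_statistics(X, y)) > t_cut)
    if keep.size == 0:
        raise ValueError("no feature passed the t-statistic filter")
    while True:
        w, b = fit_l1_linear(X[:, keep], y, lam)
        if keep.size <= n_keep:
            return keep, w, b
        keep = np.delete(keep, np.argmin(np.abs(w)))
```

Here `X` is assumed to be a samples-by-features array (for instance 94 rows by 1,428 columns for data shaped like that in the abstract) and `y` a vector of +1/-1 labels. The returned `keep` holds the column indices of the surviving features, and `np.sign(X[:, keep] @ w + b)` gives the predicted labels of the resulting linear classifier. Features are assumed to be on comparable scales, since the elimination step ranks them by weight magnitude.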

Original language: English (US)
Title of host publication: Proceedings of the IEEE Conference on Decision and Control
Pages: 2976-2982
Number of pages: 7
State: Published - 2012
Event: 51st IEEE Conference on Decision and Control, CDC 2012 - Maui, HI, United States
Duration: Dec 10 2012 to Dec 13 2012

Other

Other: 51st IEEE Conference on Decision and Control, CDC 2012
Country/Territory: United States
City: Maui, HI
Period: 12/10/12 to 12/13/12

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
