TY - GEN
T1 - Machine Learning for Cancer Subtype Prediction with FSA Method
AU - Liu, Yan
AU - Wang, Xu Dong
AU - Qiu, Meikang
AU - Zhao, Hui
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Recent research demonstrates that gene-expression-based cancer subtype classification has advantages over traditional classification methods. However, because such data usually has thousands of features, classification is infeasible for humans without efficient and accurate algorithms. This paper reports an empirical study that explores the problem of finding a highly efficient and accurate machine learning method for human cancer subtype classification based on gene expression data from cancer cells. Several well-developed machine learning algorithms can address this kind of problem, including Naive Bayes classifiers, Support Vector Machines (SVM), Random Forests, and Neural Networks. Here we build two prediction models, using the SVM and Random Forest algorithms together with a feature selection approach (FSA), to predict the subtype of lung cell lines. The two models achieve similar accuracy, both above 90%. However, the running time of SVM is much shorter than that of Random Forest.
AB - Recent research demonstrates that gene-expression-based cancer subtype classification has advantages over traditional classification methods. However, because such data usually has thousands of features, classification is infeasible for humans without efficient and accurate algorithms. This paper reports an empirical study that explores the problem of finding a highly efficient and accurate machine learning method for human cancer subtype classification based on gene expression data from cancer cells. Several well-developed machine learning algorithms can address this kind of problem, including Naive Bayes classifiers, Support Vector Machines (SVM), Random Forests, and Neural Networks. Here we build two prediction models, using the SVM and Random Forest algorithms together with a feature selection approach (FSA), to predict the subtype of lung cell lines. The two models achieve similar accuracy, both above 90%. However, the running time of SVM is much shorter than that of Random Forest.
KW - Cancer subtype
KW - Feature selection
KW - Machine learning
KW - Random Forest
KW - Support Vector Machine
UR - http://www.scopus.com/inward/record.url?scp=85076163584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076163584&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-34139-8_39
DO - 10.1007/978-3-030-34139-8_39
M3 - Conference contribution
AN - SCOPUS:85076163584
SN - 9783030341381
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 387
EP - 397
BT - Smart Computing and Communication - 4th International Conference, SmartCom 2019, Proceedings
A2 - Qiu, Meikang
PB - Springer
T2 - 4th International Conference on Smart Computing and Communications, SmartCom 2019
Y2 - 11 October 2019 through 13 October 2019
ER -