TY - JOUR
T1 - Challenges of large-class-number classification (LCNC)
T2 - A novel ensemble strategy (ES) and its application to discriminating the geographical origins of 25 green teas
AU - Fu, Hai Yan
AU - Yin, Qiao Bo
AU - Xu, Lu
AU - Goodarzi, Mohammad
AU - Yang, Tian Ming
AU - Li, Gang Feng
AU - FengQiao,
AU - She, Yuan Bin
N1 - Funding Information:
Hai-Yan Fu and Yuan-Bin She are grateful for the financial support from the General and Youth Projects of the National Natural Science Foundation of China (Grant nos. 21576297, 21205145, 21476270 and 21276006). Lu Xu is financially supported by the Open Research Program (no. GCTKF2014007) of the State Key Laboratory Breeding Base of Green Chemistry Synthesis Technology (Zhejiang University of Technology), the Research Fund for the Doctoral Program of Tongren University (no. trxyDH1501), the Open Research Program (nos. 2015ZY006, 2015ZD001, 2015ZD002) of the Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province (South-Central University for Nationalities), and research funds from the Education Department of Guizhou Province (no. GZKY[2015]498).
Publisher Copyright:
© 2016 Elsevier B.V.
PY - 2016/10/15
Y1 - 2016/10/15
N2 - Large-class-number classification (LCNC) brings new challenges to pattern recognition because of increased data complexity and class overlapping. In this study, a novel ensemble strategy (ES) was proposed to tackle LCNC problems. By combining the One-Versus-Rest (OVR) and One-Versus-One (OVO) strategies to design a set of classifiers with reduced class numbers, ES assigns a new object to the class receiving the most votes. When two or more classes tie for the most votes, an additional OVR model is developed to discriminate among them. ES, OVR, OVO and the softmax function were investigated for discriminating the geographical origins of 25 green tea samples using near-infrared (NIR) spectroscopy and Partial Least Squares Discriminant Analysis (PLSDA). With the Standard Normal Variate (SNV) as the spectral scatter correction technique, the total accuracy was 0.6468 for OVR-PLSDA, 0.8494 for OVO-PLSDA, 0.9299 for PLSDA-softmax, and 0.9377 for ES-PLSDA. Other preprocessing methods and multiple random splits of the data set yielded similar results. The poor performance of OVR can be attributed to the increased possibility of class overlapping and high sub-model complexity. OVO was less affected by LCNC because it is based on a set of relatively simple two-class classifiers. PLSDA-softmax could overcome class overlapping through nonlinear transformations. ES was shown to extract more useful information from the sub-models and achieved improved performance in LCNC.
KW - Ensemble strategy (ES)
KW - Geographical origins of green teas
KW - Large-class-number classification (LCNC)
KW - Near-infrared (NIR) spectroscopy
KW - One-Versus-One (OVO)
KW - One-Versus-Rest (OVR)
UR - http://www.scopus.com/inward/record.url?scp=84976639670&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84976639670&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2016.06.018
DO - 10.1016/j.chemolab.2016.06.018
M3 - Article
AN - SCOPUS:84976639670
SN - 0169-7439
VL - 157
SP - 43
EP - 49
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
ER -