Stochastic cross validation

Lu Xu, Hai Yan Fu, Mohammad Goodarzi, Chen Bo Cai, Qiao Bo Yin, Ya Wu, Bang Cheng Tang, Yuan Bin She

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Cross validation (CV) is one of the most commonly used methods for estimating model complexity in partial least squares (PLS). In this study, stochastic cross validation (SCV) is proposed as a novel CV strategy in which the percentage of left-out objects (PLOO) is treated as a random variable. Two SCV strategies are proposed: SCV with uniformly distributed PLOO (SCV-U) and SCV with normally distributed PLOO (SCV-N). SCV-U is in effect a hybrid of leave-one-out CV (LOOCV), k-fold CV and Monte Carlo CV (MCCV). The rationale behind SCV-N is that large perturbations of the original training set should occur with small probability. SCV is expected to offer more flexibility in data splitting, both for exploring and learning from the data set and for internally evaluating a built model. SCV-U and SCV-N were used for PLS calibration of three real data sets and one simulated data set, and were compared with LOOCV, k-fold CV and MCCV. Given a training set and an external validation set, the different CV techniques were applied repeatedly to estimate the optimal model complexity, and the prediction results were compared. The results indicate that SCV-U and SCV-N could provide useful alternatives to the traditional CV methods and that SCV is less sensitive to the values of PLOO.
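The splitting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `scv_splits`, the PLOO bounds `low` and `high`, and the normal parameters `mean` and `sd` are all hypothetical choices made for illustration, since the abstract does not state the values used in the paper. The normal variant here simply redraws until the PLOO falls within the bounds, which is one of several possible ways to keep large perturbations rare.

```python
import random


def scv_splits(n_objects, n_iter=100, dist="uniform",
               low=0.05, high=0.5, mean=0.2, sd=0.1, seed=0):
    """Yield (train_idx, test_idx) pairs, drawing a new PLOO for every split.

    dist="uniform": PLOO ~ U(low, high)              (SCV-U-like)
    dist="normal":  PLOO ~ N(mean, sd), redrawn until it lies in [low, high]
                    (SCV-N-like; assumes low <= mean <= high)
    All default values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(n_objects))
    for _ in range(n_iter):
        if dist == "uniform":
            ploo = rng.uniform(low, high)
        elif dist == "normal":
            ploo = rng.gauss(mean, sd)
            while not (low <= ploo <= high):
                ploo = rng.gauss(mean, sd)
        else:
            raise ValueError("dist must be 'uniform' or 'normal'")
        # Convert the fraction to a count, keeping both subsets non-empty.
        n_out = min(max(1, round(ploo * n_objects)), n_objects - 1)
        test_idx = sorted(rng.sample(indices, n_out))
        left_out = set(test_idx)
        train_idx = [i for i in indices if i not in left_out]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Show how the left-out size varies across a few SCV-U-like splits.
    for train_idx, test_idx in scv_splits(40, n_iter=5, dist="uniform"):
        print(len(train_idx), len(test_idx))
```

To estimate model complexity with such splits, one would typically fit a PLS model for each candidate number of latent variables on `train_idx`, accumulate the squared prediction errors on `test_idx` over all splits, and select the complexity with the lowest pooled error. That selection step is standard CV practice and is left out of the sketch.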

Original language: English (US)
Pages (from-to): 74-81
Number of pages: 8
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 175
DOIs: https://doi.org/10.1016/j.chemolab.2018.02.008
State: Published - Apr 15 2018

Keywords

  • Cross validation (CV)
  • Model complexity
  • Multivariate calibration
  • Partial least squares (PLS)
  • Stochastic cross validation (SCV)

ASJC Scopus subject areas

  • Analytical Chemistry
  • Software
  • Process Chemistry and Technology
  • Spectroscopy
  • Computer Science Applications

Cite this

Xu, L., Fu, H. Y., Goodarzi, M., Cai, C. B., Yin, Q. B., Wu, Y., ... She, Y. B. (2018). Stochastic cross validation. Chemometrics and Intelligent Laboratory Systems, 175, 74-81. https://doi.org/10.1016/j.chemolab.2018.02.008

@article{ffeef6800b864f2ca18bb3edee5bc43a,
title = "Stochastic cross validation",
keywords = "Cross validation (CV), Model complexity, Multivariate calibration, Partial least squares (PLS), Stochastic cross validation (SCV)",
author = "Lu Xu and Fu, {Hai Yan} and Mohammad Goodarzi and Cai, {Chen Bo} and Yin, {Qiao Bo} and Ya Wu and Tang, {Bang Cheng} and She, {Yuan Bin}",
year = "2018",
month = "4",
day = "15",
doi = "10.1016/j.chemolab.2018.02.008",
language = "English (US)",
volume = "175",
pages = "74--81",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier",

}

TY - JOUR

T1 - Stochastic cross validation

AU - Xu, Lu

AU - Fu, Hai Yan

AU - Goodarzi, Mohammad

AU - Cai, Chen Bo

AU - Yin, Qiao Bo

AU - Wu, Ya

AU - Tang, Bang Cheng

AU - She, Yuan Bin

PY - 2018/4/15

Y1 - 2018/4/15

AB - Cross validation (CV) is by far one of the most commonly used methods to estimate model complexity for partial least squares (PLS). In this study, stochastic cross validation (SCV) was proposed as a novel CV strategy, where the percent of left-out objects (PLOO) was defined as a changeable random number. We proposed two SCV strategies, namely, SCV with uniformly distributed PLOO (SCV-U) and SCV with normally distributed PLOO (SCV-N). SCV-U is actually a hybrid of leave-one-out CV (LOOCV), k-fold CV and Monte Carlo CV (MCCV). The rationale behind SCV-N is that the probability of large perturbations of the original training set will be small. SCV is expected to provide more flexibility for data splitting to explore and learn from the data set and evaluate internally a built model. SCV-U and SCV-N were used for PLS calibrations of three real data sets as well as a simulated data set and they were compared with LOOCV, k-fold CV and MCCV. Given a training and external validation set, different CV techniques were repeatedly used to evaluate the optimal model complexity and the prediction results were compared. The results indicate that SCV-U and SCV-N could provide useful alternatives to the traditional CV methods and SCV is less sensitive to the values of PLOO.

KW - Cross validation (CV)

KW - Model complexity

KW - Multivariate calibration

KW - Partial least squares (PLS)

KW - Stochastic cross validation (SCV)

UR - http://www.scopus.com/inward/record.url?scp=85042380942&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85042380942&partnerID=8YFLogxK

U2 - 10.1016/j.chemolab.2018.02.008

DO - 10.1016/j.chemolab.2018.02.008

M3 - Article

AN - SCOPUS:85042380942

VL - 175

SP - 74

EP - 81

JO - Chemometrics and Intelligent Laboratory Systems

JF - Chemometrics and Intelligent Laboratory Systems

SN - 0169-7439

ER -