Far casting cross-validation

Patrick S. Carmack, William R. Schucany, Jeffrey S. Spence, Richard F. Gunst, Qihua Lin, Robert W. Haley

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Cross-validation has long been used for choosing tuning parameters and other model selection tasks. It generally performs well provided the data are independent, or nearly so. Improvements have been suggested which address ordinary cross-validation's (OCV) shortcomings in correlated data. Whereas these techniques have merit, they can still lead to poor model selection in correlated data or are not readily generalizable to high-dimensional data. The proposed solution, far casting cross-validation (FCCV), addresses these problems. FCCV withholds correlated neighbors in every aspect of the cross-validation procedure. The result is a technique that stresses a fitted model's ability to extrapolate rather than interpolate. This generally leads to better model selection in correlated datasets. Whereas FCCV is less than optimal in the independence case, our improvement of OCV applies more generally to higher dimensional error processes and to both parametric and nonparametric model selection problems. To facilitate introduction, we consider only one application, namely estimating global bandwidths for curve estimation with local linear regression. We provide theoretical motivation and report some comparative results from a simulation experiment and on a time series of annual global temperature deviations. For such data, FCCV generally has lower average squared error when disturbances are correlated. Supplementary materials are available online.
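The abstract describes the idea only in words. As a rough illustration of withholding correlated neighbors when choosing a global bandwidth for local linear regression, the Python sketch below minimizes a leave-neighborhood-out criterion: for each observation i, every observation whose index lies within `ell` of i is deleted before y_i is predicted. With `ell = 0` this reduces to ordinary leave-one-out cross-validation (OCV). This is a simplified neighborhood-deletion criterion, not the authors' FCCV algorithm; FCCV, as the abstract describes it, withholds correlated neighbors in every aspect of the procedure, and those details are in the article. The Gaussian kernel, the equally spaced design, the function names `local_linear`, `cv_score` and `select_bandwidth`, and the AR(1) simulation settings are all hypothetical choices made for illustration.

```python
import math
import random


def local_linear(x0, xs, ys, h):
    """Local linear estimate of the regression curve at x0 (Gaussian kernel, bandwidth h).

    Solves the 2x2 weighted least-squares normal equations and returns the intercept.
    Returns NaN when no observation carries any weight.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / h) ** 2)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    if s0 == 0.0:
        return math.nan
    denom = s0 * s2 - s1 * s1
    if denom <= 1e-10 * s0 * s2:
        # Weights (numerically) concentrated at a single x: fall back to a local constant fit.
        return t0 / s0
    return (s2 * t0 - s1 * t1) / denom


def cv_score(xs, ys, h, ell):
    """Mean squared prediction error when, for each i, observations j with
    |j - i| <= ell are withheld before predicting y_i.

    ell = 0 gives ordinary leave-one-out cross-validation (OCV).
    Assumes xs are equally spaced time points, so index distance measures closeness.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        keep = [j for j in range(n) if abs(j - i) > ell]
        pred = local_linear(xs[i], [xs[j] for j in keep], [ys[j] for j in keep], h)
        if math.isnan(pred):
            return math.inf  # this bandwidth/neighborhood combination cannot predict y_i
        total += (ys[i] - pred) ** 2
    return total / n


def select_bandwidth(xs, ys, grid, ell):
    """Global bandwidth from `grid` that minimizes cv_score."""
    return min(grid, key=lambda h: cv_score(xs, ys, h, ell))


if __name__ == "__main__":
    # Hypothetical simulation: sine curve plus AR(1) disturbances.
    rng = random.Random(1)
    n, phi = 150, 0.6
    xs = [i / (n - 1) for i in range(n)]
    ys, e = [], 0.0
    for x in xs:
        e = phi * e + rng.gauss(0.0, 0.3)
        ys.append(math.sin(2 * math.pi * x) + e)
    grid = [0.01 * k for k in range(2, 31)]
    print("OCV bandwidth:", select_bandwidth(xs, ys, grid, ell=0))
    print("neighbor-deletion bandwidth:", select_bandwidth(xs, ys, grid, ell=5))
```

Comparing the two printed bandwidths shows how withholding neighbors changes the criterion. With positively correlated disturbances, leave-one-out CV tends to favor undersmoothing, because the retained neighbors of y_i carry information about its error; that shortcoming of OCV in correlated data is what motivates FCCV.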

Original language: English (US)
Pages (from-to): 879-893
Number of pages: 15
Journal: Journal of Computational and Graphical Statistics
Volume: 18
Issue number: 4
DOIs: https://doi.org/10.1198/jcgs.2009.07034
State: Published - 2009

Fingerprint

Casting
Cross-validation
Model Selection
Correlated Data
Curve Estimation
Local Linear Regression
Extrapolate
Nonparametric Model
Parameter Tuning
High-dimensional Data
Parametric Model
Simulation Experiment
Annual
High-dimensional
Deviation
Time series
Disturbance
Interpolate
Bandwidth

Keywords

  • Dependent data
  • Optimistic error rates
  • Prediction
  • Temporal correlation
  • Tuning parameter

ASJC Scopus subject areas

  • Discrete Mathematics and Combinatorics
  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Carmack, P. S., Schucany, W. R., Spence, J. S., Gunst, R. F., Lin, Q., & Haley, R. W. (2009). Far casting cross-validation. Journal of Computational and Graphical Statistics, 18(4), 879-893. https://doi.org/10.1198/jcgs.2009.07034

Far casting cross-validation. / Carmack, Patrick S.; Schucany, William R.; Spence, Jeffrey S.; Gunst, Richard F.; Lin, Qihua; Haley, Robert W.

In: Journal of Computational and Graphical Statistics, Vol. 18, No. 4, 2009, p. 879-893.


Carmack, PS, Schucany, WR, Spence, JS, Gunst, RF, Lin, Q & Haley, RW 2009, 'Far casting cross-validation', Journal of Computational and Graphical Statistics, vol. 18, no. 4, pp. 879-893. https://doi.org/10.1198/jcgs.2009.07034
Carmack, Patrick S. ; Schucany, William R. ; Spence, Jeffrey S. ; Gunst, Richard F. ; Lin, Qihua ; Haley, Robert W. / Far casting cross-validation. In: Journal of Computational and Graphical Statistics. 2009 ; Vol. 18, No. 4. pp. 879-893.
@article{e64417d14d5546dcb910efafb2859adb,
title = "Far casting cross-validation",
abstract = "Cross-validation has long been used for choosing tuning parameters and other model selection tasks. It generally performs well provided the data are independent, or nearly so. Improvements have been suggested which address ordinary cross-validation's (OCV) shortcomings in correlated data. Whereas these techniques have merit, they can still lead to poor model selection in correlated data or are not readily generalizable to high-dimensional data. The proposed solution, far casting cross-validation (FCCV), addresses these problems. FCCV withholds correlated neighbors in every aspect of the cross-validation procedure. The result is a technique that stresses a fitted model's ability to extrapolate rather than interpolate. This generally leads to better model selection in correlated datasets. Whereas FCCV is less than optimal in the independence case, our improvement of OCV applies more generally to higher dimensional error processes and to both parametric and nonparametric model selection problems. To facilitate introduction, we consider only one application, namely estimating global bandwidths for curve estimation with local linear regression. We provide theoretical motivation and report some comparative results from a simulation experiment and on a time series of annual global temperature deviations. For such data, FCCV generally has lower average squared error when disturbances are correlated. Supplementary materials are available online.",
keywords = "Dependent data, Optimistic error rates, Prediction, Temporal correlation, Tuning parameter",
author = "Carmack, {Patrick S.} and Schucany, {William R.} and Spence, {Jeffrey S.} and Gunst, {Richard F.} and Qihua Lin and Haley, {Robert W.}",
year = "2009",
doi = "10.1198/jcgs.2009.07034",
language = "English (US)",
volume = "18",
pages = "879--893",
journal = "Journal of Computational and Graphical Statistics",
issn = "1061-8600",
publisher = "American Statistical Association",
number = "4",

}

TY - JOUR

T1 - Far casting cross-validation

AU - Carmack, Patrick S.

AU - Schucany, William R.

AU - Spence, Jeffrey S.

AU - Gunst, Richard F.

AU - Lin, Qihua

AU - Haley, Robert W.

PY - 2009

Y1 - 2009

AB - Cross-validation has long been used for choosing tuning parameters and other model selection tasks. It generally performs well provided the data are independent, or nearly so. Improvements have been suggested which address ordinary cross-validation's (OCV) shortcomings in correlated data. Whereas these techniques have merit, they can still lead to poor model selection in correlated data or are not readily generalizable to high-dimensional data. The proposed solution, far casting cross-validation (FCCV), addresses these problems. FCCV withholds correlated neighbors in every aspect of the cross-validation procedure. The result is a technique that stresses a fitted model's ability to extrapolate rather than interpolate. This generally leads to better model selection in correlated datasets. Whereas FCCV is less than optimal in the independence case, our improvement of OCV applies more generally to higher dimensional error processes and to both parametric and nonparametric model selection problems. To facilitate introduction, we consider only one application, namely estimating global bandwidths for curve estimation with local linear regression. We provide theoretical motivation and report some comparative results from a simulation experiment and on a time series of annual global temperature deviations. For such data, FCCV generally has lower average squared error when disturbances are correlated. Supplementary materials are available online.

KW - Dependent data

KW - Optimistic error rates

KW - Prediction

KW - Temporal correlation

KW - Tuning parameter

UR - http://www.scopus.com/inward/record.url?scp=72449182153&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72449182153&partnerID=8YFLogxK

U2 - 10.1198/jcgs.2009.07034

DO - 10.1198/jcgs.2009.07034

M3 - Article

AN - SCOPUS:72449182153

VL - 18

SP - 879

EP - 893

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

SN - 1061-8600

IS - 4

ER -