Using input dependent weights for model combination and model selection with multiple sources of data

Wei Pan, Guanghua Xiao, Xiaohong Huang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

With various sources and large amounts of genomic and proteomic data accumulating, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge, however, is to account for the different local information content of the different sources of data: the choice of the weight on each candidate model (and thus on each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme in which the weight on each model is the probability that the model gives a correct prediction for the given input. The weights can be estimated by regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. It is demonstrated that IDW may perform better than some standard approaches. Input-dependent weights can also be adopted as a criterion for model selection.
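The weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic-regression form used for the weights, the normalization of the weights to sum to one for each input, the function names (`fit_correctness_model`, `correctness_prob`, `idw_combine`), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_correctness_model(X, correct, lr=0.5, n_iter=3000):
    """Regress a 0/1 'model was correct' indicator on the input X.

    Logistic regression fitted by plain gradient descent; an assumed
    stand-in for the regression step mentioned in the abstract.
    """
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (_sigmoid(Xb @ beta) - correct) / len(correct)
        beta -= lr * grad
    return beta

def correctness_prob(beta, X):
    """Estimated probability that the model is correct at each input in X."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return _sigmoid(Xb @ beta)

def idw_combine(base_probs, correct_probs):
    """Combine per-model class-1 probabilities with input-dependent weights.

    base_probs:    (n_models, n_samples) predicted P(y = 1 | x) per model
    correct_probs: (n_models, n_samples) estimated P(model correct | x)
    Weights are normalized per input (an assumption of this sketch).
    """
    w = correct_probs / correct_probs.sum(axis=0, keepdims=True)
    return (w * base_probs).sum(axis=0)

# Toy data (hypothetical): model A tends to be right for x > 0, model B for x < 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
correct_a = (rng.random(200) < _sigmoid(3 * X[:, 0])).astype(float)
correct_b = (rng.random(200) < _sigmoid(-3 * X[:, 0])).astype(float)
beta_a = fit_correctness_model(X, correct_a)
beta_b = fit_correctness_model(X, correct_b)

# Two new inputs; the base predictions below are made-up values for illustration.
x_new = np.array([[2.0], [-2.0]])
correct_probs = np.vstack([correctness_prob(beta_a, x_new),
                           correctness_prob(beta_b, x_new)])
base_probs = np.array([[0.9, 0.9],   # model A's P(y = 1 | x) at each new input
                       [0.2, 0.2]])  # model B's P(y = 1 | x) at each new input
combined = idw_combine(base_probs, correct_probs)
```

In this toy setup the combined prediction leans toward model A at x = 2 and toward model B at x = -2, whereas a constant-weight average would return the same value at both inputs. For model selection, one could instead pick, for each input, the model with the largest estimated correctness probability; that reading of the abstract's last sentence is likewise an assumption.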

Original language: English (US)
Pages (from-to): 523-540
Number of pages: 18
Journal: Statistica Sinica
Volume: 16
Issue number: 2
State: Published - Apr 2006

Keywords

  • Classification
  • Microarray data
  • Model mixing
  • Partial least squares
  • Prediction

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
