Using input dependent weights for model combination and model selection with multiple sources of data

Wei Pan, Guanghua Xiao, Xiaohong Huang

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

With various sources and large amounts of genomic and proteomic data accumulating, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge, however, is to account for the different local information content of the different sources: the appropriate weight on each candidate model (and thus on each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme in which the weight on each model is the probability that the model gives a correct prediction for the given input. The weights can be estimated by regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. The results demonstrate that IDW may perform better than some standard approaches. Input-dependent weights can also be adopted as a criterion for model selection.
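The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a scalar input, two hypothetical toy classifiers (`model_a`, `model_b`), and plain logistic regression as the "regression" used to estimate each model's probability of being correct. All function names and the toy `truth` rule are invented for illustration. The weights are the estimated correctness probabilities normalized to sum to one, which is one plausible reading of the abstract.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, steps=2000):
    # Fit P(y = 1 | x) = sigmoid(a + b * x) for scalar x by gradient ascent
    # on the mean log-likelihood. Logistic regression is an assumption here.
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(a + b * x)
            grad_a += resid
            grad_b += resid * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def idw_weights(x, fits):
    # Estimated probability that each model is correct at x,
    # normalized so the weights sum to 1.
    probs = [sigmoid(a + b * x) for a, b in fits]
    total = sum(probs)
    return [p / total for p in probs]


def idw_predict(x, models, fits):
    # Weighted vote over the models' 0/1 predictions, thresholded at 0.5.
    weights = idw_weights(x, fits)
    score = sum(w * m(x) for w, m in zip(weights, models))
    return 1 if score >= 0.5 else 0


def idw_select(x, fits):
    # Model selection: index of the model with the largest weight at x.
    weights = idw_weights(x, fits)
    return max(range(len(weights)), key=weights.__getitem__)


# Hypothetical toy setup: model_a is right only for x < 0,
# model_b is right only for x >= 0.
def truth(x):
    return 1 if x * x > 1.0 else 0


def model_a(x):
    return truth(x) if x < 0 else 1 - truth(x)


def model_b(x):
    return truth(x) if x >= 0 else 1 - truth(x)


models = [model_a, model_b]
train_x = [-2.0 + 0.1 * i for i in range(41)]
train_y = [truth(x) for x in train_x]

# For each model, regress its 0/1 correctness on the training inputs.
fits = [
    fit_logistic(train_x, [1 if m(x) == y else 0 for x, y in zip(train_x, train_y)])
    for m in models
]

for x in (-1.5, -0.5, 1.5):
    print(x, idw_weights(x, fits), idw_predict(x, models, fits), truth(x))
```

In this toy setup a constant weight of 0.5 on each model cannot separate the two regions, whereas the input-dependent weights shift almost entirely to whichever model is reliable near the query point. `idw_select` shows the model-selection use: pick the model with the largest weight at the given input.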

Original language: English (US)
Pages (from-to): 523-540
Number of pages: 18
Journal: Statistica Sinica
Volume: 16
Issue number: 2
State: Published - Apr 2006

Fingerprint

Model Selection
Dependent
Weighting
Prediction
Model
Heart Failure
Proteomics
Information Content
Protein-protein Interaction
Multiple Models
Gene Expression Data
Gene Expression
Genomics
Regression
Gene
Protein

Keywords

  • Classification
  • Microarray data
  • Model mixing
  • Partial least squares
  • Prediction

ASJC Scopus subject areas

  • Mathematics (all)
  • Statistics and Probability

Cite this

Using input dependent weights for model combination and model selection with multiple sources of data. / Pan, Wei; Xiao, Guanghua; Huang, Xiaohong.

In: Statistica Sinica, Vol. 16, No. 2, 04.2006, p. 523-540.

@article{fb1ea579a955480f821c76df59912cf9,
title = "Using input dependent weights for model combination and model selection with multiple sources of data",
keywords = "Classification, Microarray data, Model mixing, Partial least squares, Prediction",
author = "Wei Pan and Guanghua Xiao and Xiaohong Huang",
year = "2006",
month = "4",
language = "English (US)",
volume = "16",
pages = "523--540",
journal = "Statistica Sinica",
issn = "1017-0405",
publisher = "Institute of Statistical Science",
number = "2",

}

TY - JOUR

T1 - Using input dependent weights for model combination and model selection with multiple sources of data

AU - Pan, Wei

AU - Xiao, Guanghua

AU - Huang, Xiaohong

PY - 2006/4

Y1 - 2006/4

KW - Classification

KW - Microarray data

KW - Model mixing

KW - Partial least squares

KW - Prediction

UR - http://www.scopus.com/inward/record.url?scp=33746146485&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746146485&partnerID=8YFLogxK

M3 - Article

VL - 16

SP - 523

EP - 540

JO - Statistica Sinica

JF - Statistica Sinica

SN - 1017-0405

IS - 2

ER -