Bayesian optimal discovery procedure for simultaneous significance testing

Jing Cao; Xian Jin Xie; Song Zhang; Angelique Whitehurst; Michael A. White

doi:10.1186/1471-2105-10-5

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Background: In high-throughput screening, such as differential gene expression screening, drug sensitivity screening, and genome-wide RNAi screening, tens of thousands of tests need to be conducted simultaneously. However, the number of replicate measurements per test is extremely small, rarely exceeding 3. Several current approaches demonstrate that test statistics with shrunken variance estimates have more power than the traditional t statistic. Results: We propose a Bayesian hierarchical model to incorporate the shrinkage concept by introducing a mixture structure on variance components. The estimates from the Bayesian model are utilized in the optimal discovery procedure (ODP) proposed by Storey in 2007, which was shown to have optimal performance in multiple significance tests. We compared the performance of the Bayesian ODP with several competing test statistics. Conclusion: We have conducted simulation studies with 2 to 6 replicates per gene. We have also included test results from two real datasets. The Bayesian ODP outperforms the other methods in our study, including the original ODP. The advantage of the Bayesian ODP becomes more pronounced when there are few replicates per test. The improvement over the original ODP arises because the Bayesian model borrows strength across genes in estimating unknown parameters. The proposed approach is computationally efficient due to the conjugate structure of the Bayesian model. The R code to calculate the Bayesian ODP is provided (see Additional file 1).

Original language	English (US)
Article number	5
Journal	BMC Bioinformatics
Volume	10
DOIs	https://doi.org/10.1186/1471-2105-10-5
State	Published - Jan 6 2009

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Cite this

@article{1e8bdb9a634840b79c22f3de143c05a5,

title = "Bayesian optimal discovery procedure for simultaneous significance testing",

abstract = "Background: In high-throughput screening, such as differential gene expression screening, drug sensitivity screening, and genome-wide RNAi screening, tens of thousands of tests need to be conducted simultaneously. However, the number of replicate measurements per test is extremely small, rarely exceeding 3. Several current approaches demonstrate that test statistics with shrunken variance estimates have more power than the traditional t statistic. Results: We propose a Bayesian hierarchical model to incorporate the shrinkage concept by introducing a mixture structure on variance components. The estimates from the Bayesian model are utilized in the optimal discovery procedure (ODP) proposed by Storey in 2007, which was shown to have optimal performance in multiple significance tests. We compared the performance of the Bayesian ODP with several competing test statistics. Conclusion: We have conducted simulation studies with 2 to 6 replicates per gene. We have also included test results from two real datasets. The Bayesian ODP outperforms the other methods in our study, including the original ODP. The advantage of the Bayesian ODP becomes more pronounced when there are few replicates per test. The improvement over the original ODP arises because the Bayesian model borrows strength across genes in estimating unknown parameters. The proposed approach is computationally efficient due to the conjugate structure of the Bayesian model. The R code to calculate the Bayesian ODP is provided (see Additional file 1).",

author = "Jing Cao and Xie, {Xian Jin} and Song Zhang and Angelique Whitehurst and White, {Michael A.}",

note = "Funding Information: The authors thank the associate editor and the reviewers for their constructive comments and suggestions, which led to substantial improvement of the manuscript. This work was partly supported by NIH grant UL1 RR024982.",

year = "2009",

month = jan,

day = "6",

doi = "10.1186/1471-2105-10-5",

language = "English (US)",

volume = "10",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Bayesian optimal discovery procedure for simultaneous significance testing

AU - Cao, Jing

AU - Xie, Xian Jin

AU - Zhang, Song

AU - Whitehurst, Angelique

AU - White, Michael A.

N1 - Funding Information: The authors thank the associate editor and the reviewers for their constructive comments and suggestions, which led to substantial improvement of the manuscript. This work was partly supported by NIH grant UL1 RR024982.

PY - 2009/1/6

Y1 - 2009/1/6

AB - Background: In high-throughput screening, such as differential gene expression screening, drug sensitivity screening, and genome-wide RNAi screening, tens of thousands of tests need to be conducted simultaneously. However, the number of replicate measurements per test is extremely small, rarely exceeding 3. Several current approaches demonstrate that test statistics with shrunken variance estimates have more power than the traditional t statistic. Results: We propose a Bayesian hierarchical model to incorporate the shrinkage concept by introducing a mixture structure on variance components. The estimates from the Bayesian model are utilized in the optimal discovery procedure (ODP) proposed by Storey in 2007, which was shown to have optimal performance in multiple significance tests. We compared the performance of the Bayesian ODP with several competing test statistics. Conclusion: We have conducted simulation studies with 2 to 6 replicates per gene. We have also included test results from two real datasets. The Bayesian ODP outperforms the other methods in our study, including the original ODP. The advantage of the Bayesian ODP becomes more pronounced when there are few replicates per test. The improvement over the original ODP arises because the Bayesian model borrows strength across genes in estimating unknown parameters. The proposed approach is computationally efficient due to the conjugate structure of the Bayesian model. The R code to calculate the Bayesian ODP is provided (see Additional file 1).

UR - http://www.scopus.com/inward/record.url?scp=60849125094&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=60849125094&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-10-5

DO - 10.1186/1471-2105-10-5

M3 - Article

C2 - 19126217

AN - SCOPUS:60849125094

SN - 1471-2105

VL - 10

JO - BMC Bioinformatics

JF - BMC Bioinformatics

M1 - 5

ER -
