A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data

Yang Xie; Wei Pan; Arkady B. Khodursky

doi:10.1093/bioinformatics/bti685

Research output: Contribution to journal › Article › peer-review

88 Scopus citations

Abstract

Motivation: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor any particular method. Permutation methods have become popular to estimate FDR in genomic studies. The purpose of this paper is 2-fold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a more fair criterion to evaluate various statistical methods.

Results: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, SAM statistic and Student's t-statistic, are considered. The results show that the standard permutation method overestimates FDR. The overestimation is the most severe for the sample mean statistic while the least for the t-statistic with the SAM-statistic lying between the two extremes, suggesting that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.
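For readers unfamiliar with the idea, the following is a minimal sketch of a generic permutation-based FDR estimate of the kind the abstract refers to as "standard". It is an illustration under stated assumptions, not the authors' exact procedure: it assumes a two-group design with genes in rows, estimates the number of false positives by recomputing the statistic on column-shuffled data, and divides by the number of genes called significant on the observed data. The function names (`permutation_fdr`, `t_stat`, `mean_diff`), the threshold, and the simulated data are all hypothetical. The paper's proposed modification is not shown.

```python
import numpy as np


def mean_diff(x, y):
    """Per-gene difference of group means (rows = genes, columns = samples)."""
    return x.mean(axis=1) - y.mean(axis=1)


def t_stat(x, y):
    """Per-gene two-sample t-statistic with unpooled variances (an assumption)."""
    nx, ny = x.shape[1], y.shape[1]
    se = np.sqrt(x.var(axis=1, ddof=1) / nx + y.var(axis=1, ddof=1) / ny)
    return mean_diff(x, y) / se


def permutation_fdr(data, n_x, stat, threshold, n_perm=200, seed=0):
    """Generic permutation FDR estimate at a fixed threshold.

    data: genes x samples array; the first n_x columns form group 1.
    Returns (estimated FDR, number of genes called significant).
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(stat(data[:, :n_x], data[:, n_x:]))
    total_pos = int((observed > threshold).sum())
    if total_pos == 0:
        return 0.0, 0  # nothing was called significant

    false_pos = 0.0
    for _ in range(n_perm):
        # Shuffle sample labels by permuting columns, then recompute the statistic.
        shuffled = data[:, rng.permutation(data.shape[1])]
        null = np.abs(stat(shuffled[:, :n_x], shuffled[:, n_x:]))
        false_pos += (null > threshold).sum()

    # Average null exceedances per permutation, relative to observed calls.
    return min(1.0, false_pos / n_perm / total_pos), total_pos


# Hypothetical simulated data: 1000 genes, 5 + 5 samples, first 100 genes shifted.
rng = np.random.default_rng(1)
sim = rng.normal(size=(1000, 10))
sim[:100, :5] += 3.0

fdr_t, calls_t = permutation_fdr(sim, 5, t_stat, threshold=3.0)
fdr_m, calls_m = permutation_fdr(sim, 5, mean_diff, threshold=1.5)
print(f"t-statistic: {calls_t} calls, estimated FDR {fdr_t:.3f}")
print(f"mean difference: {calls_m} calls, estimated FDR {fdr_m:.3f}")
```

Because the shuffled data still contain the truly changed genes, the null distribution built this way is not purely null. That is one reason to treat such estimates with the caution the abstract recommends when they are used to rank different test statistics.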

Original language	English (US)
Pages (from-to)	4280-4288
Number of pages	9
Journal	Bioinformatics
Volume	21
Issue number	23
DOIs	https://doi.org/10.1093/bioinformatics/bti685
State	Published - Dec 2005

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{f797b68f0d12402180db54debb617708,

title = "A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data",

abstract = "Motivation: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor any particular method. Permutation methods have become popular to estimate FDR in genomic studies. The purpose of this paper is 2-fold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a more fair criterion to evaluate various statistical methods. Results: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, SAM statistic and Student's t-statistic, are considered. The results show that the standard permutation method overestimates FDR. The overestimation is the most severe for the sample mean statistic while the least for the t-statistic with the SAM-statistic lying between the two extremes, suggesting that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.",

author = "Yang Xie and Wei Pan and Khodursky, {Arkady B.}",

note = "Funding Information: This work was supported by NIH grants HL65462 and GM066098 and a UM AHC Development grant.",

year = "2005",

month = dec,

doi = "10.1093/bioinformatics/bti685",

language = "English (US)",

volume = "21",

pages = "4280--4288",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "23",

}

TY - JOUR

T1 - A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data

AU - Xie, Yang

AU - Pan, Wei

AU - Khodursky, Arkady B.

N1 - Funding Information: This work was supported by NIH grants HL65462 and GM066098 and a UM AHC Development grant.

PY - 2005/12

Y1 - 2005/12

AB - Motivation: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor any particular method. Permutation methods have become popular to estimate FDR in genomic studies. The purpose of this paper is 2-fold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a more fair criterion to evaluate various statistical methods. Results: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, SAM statistic and Student's t-statistic, are considered. The results show that the standard permutation method overestimates FDR. The overestimation is the most severe for the sample mean statistic while the least for the t-statistic with the SAM-statistic lying between the two extremes, suggesting that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.

UR - http://www.scopus.com/inward/record.url?scp=28444438861&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=28444438861&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bti685

DO - 10.1093/bioinformatics/bti685

M3 - Article

C2 - 16188930

AN - SCOPUS:28444438861

SN - 1367-4803

VL - 21

SP - 4280

EP - 4288

JO - Bioinformatics

JF - Bioinformatics

IS - 23

ER -
