The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

Leming Shi; Wendell D. Jones; Roderick V. Jensen; Stephen C. Harris; Roger G. Perkins; Federico M. Goodsaid; Lei Guo; Lisa J. Croner; Cecilie Boysen; Hong Fang; Feng Qian; Shashi Amur; Wenjun Bao; Catalin C. Barbacioru; Vincent Bertholet; Xiaoxi Megan Cao; Tzu Ming Chu; Patrick J. Collins; Xiao Hui Fan; Felix W. Frueh; James C. Fuscoe; Xu Guo; Jing Han; Damir Herman; Huixiao Hong; Ernest S. Kawasaki; Quan Zhen Li; Yuling Luo; Yunqing Ma; Nan Mei; Ron L. Peterson; Raj K. Puri; Richard Shippy; Zhenqiang Su; Yongming Andrew Sun; Hongmei Sun; Brett Thorn; Yaron Turpaz; Charles Wang; Sue Jane Wang; Janet A. Warrington; James C. Willey; Jie Wu; Qian Xie; Liang Zhang; Lu Zhang; Sheng Zhong; Russell D. Wolfinger; Weida Tong

doi:10.1186/1471-2105-9-S9-S10

Research output: Contribution to journal › Article › peer-review

199 Scopus citations

Abstract

Background: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.

Results: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan - the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.

Conclusion: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
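
The selection rule recommended in the abstract (filter by a non-stringent P cutoff, then rank by fold change) can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that a log2 fold change and a P value have already been computed for each gene. The function names `select_degs` and `overlap_fraction`, the 0.05 default cutoff, the toy values, and the overlap definition (shared genes divided by the longer list's length) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: gene -> (log2 fold change, P value).
GeneStats = Dict[str, Tuple[float, float]]


def select_degs(stats: GeneStats, p_cutoff: float = 0.05, n_top: int = 100) -> List[str]:
    """Keep genes with P below a non-stringent cutoff, then rank by |log2 FC|."""
    passing = [(gene, abs(lfc)) for gene, (lfc, p) in stats.items() if p < p_cutoff]
    # Largest absolute fold change first; ties broken by gene name for determinism.
    passing.sort(key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in passing[:n_top]]


def overlap_fraction(list_a: List[str], list_b: List[str]) -> float:
    """Shared genes divided by the length of the longer list (an assumed definition)."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


# Toy values for two hypothetical test sites (not data from the study).
site_a = {"G1": (3.0, 0.01), "G2": (-2.5, 0.04), "G3": (0.2, 0.0001),
          "G4": (1.8, 0.20), "G5": (1.5, 0.03)}
site_b = {"G1": (2.8, 0.02), "G2": (-2.2, 0.03), "G3": (0.1, 0.30),
          "G4": (1.9, 0.01), "G5": (1.4, 0.045)}

degs_a = select_degs(site_a, p_cutoff=0.05, n_top=2)
degs_b = select_degs(site_b, p_cutoff=0.05, n_top=2)
print(degs_a, degs_b, overlap_fraction(degs_a, degs_b))
```

In the toy data, gene G3 has the smallest P value at site A but a very small fold change, so a pure P ranking would place it first at one site and exclude it at the other. Ranking by fold change after the P filter keeps it out of both short lists.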

Original language	English (US)
Article number	S10
Journal	BMC Bioinformatics
Volume	9
Issue number	SUPPL. 9
DOIs	https://doi.org/10.1186/1471-2105-9-S9-S10
State	Published - Aug 12 2008

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Cite this

Shi, L., Jones, W. D., Jensen, R. V., Harris, S. C., Perkins, R. G., Goodsaid, F. M., Guo, L., Croner, L. J., Boysen, C., Fang, H., Qian, F., Amur, S., Bao, W., Barbacioru, C. C., Bertholet, V., Cao, X. M., Chu, T. M., Collins, P. J., Fan, X. H., ... Tong, W. (2008). The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies. BMC Bioinformatics, 9(SUPPL. 9), Article S10. https://doi.org/10.1186/1471-2105-9-S9-S10

Shi, L, Jones, WD, Jensen, RV, Harris, SC, Perkins, RG, Goodsaid, FM, Guo, L, Croner, LJ, Boysen, C, Fang, H, Qian, F, Amur, S, Bao, W, Barbacioru, CC, Bertholet, V, Cao, XM, Chu, TM, Collins, PJ, Fan, XH, Frueh, FW, Fuscoe, JC, Guo, X, Han, J, Herman, D, Hong, H, Kawasaki, ES, Li, QZ, Luo, Y, Ma, Y, Mei, N, Peterson, RL, Puri, RK, Shippy, R, Su, Z, Sun, YA, Sun, H, Thorn, B, Turpaz, Y, Wang, C, Wang, SJ, Warrington, JA, Willey, JC, Wu, J, Xie, Q, Zhang, L, Zhang, L, Zhong, S, Wolfinger, RD & Tong, W 2008, 'The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies', BMC Bioinformatics, vol. 9, no. SUPPL. 9, S10. https://doi.org/10.1186/1471-2105-9-S9-S10

@article{1552013a607b4bdabf5dfb6120a8d681,

title = "The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies",

author = "Leming Shi and Jones, {Wendell D.} and Jensen, {Roderick V.} and Harris, {Stephen C.} and Perkins, {Roger G.} and Goodsaid, {Federico M.} and Lei Guo and Croner, {Lisa J.} and Cecilie Boysen and Hong Fang and Feng Qian and Shashi Amur and Wenjun Bao and Barbacioru, {Catalin C.} and Vincent Bertholet and Cao, {Xiaoxi Megan} and Chu, {Tzu Ming} and Collins, {Patrick J.} and Fan, {Xiao Hui} and Frueh, {Felix W.} and Fuscoe, {James C.} and Xu Guo and Jing Han and Damir Herman and Huixiao Hong and Kawasaki, {Ernest S.} and Li, {Quan Zhen} and Yuling Luo and Yunqing Ma and Nan Mei and Peterson, {Ron L.} and Puri, {Raj K.} and Richard Shippy and Zhenqiang Su and Sun, {Yongming Andrew} and Hongmei Sun and Brett Thorn and Yaron Turpaz and Charles Wang and Wang, {Sue Jane} and Warrington, {Janet A.} and Willey, {James C.} and Jie Wu and Qian Xie and Liang Zhang and Lu Zhang and Sheng Zhong and Wolfinger, {Russell D.} and Weida Tong",

year = "2008",

month = aug,

day = "12",

doi = "10.1186/1471-2105-9-S9-S10",

language = "English (US)",

volume = "9",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

number = "SUPPL. 9",

}

TY - JOUR

T1 - The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

AU - Shi, Leming

AU - Jones, Wendell D.

AU - Jensen, Roderick V.

AU - Harris, Stephen C.

AU - Perkins, Roger G.

AU - Goodsaid, Federico M.

AU - Guo, Lei

AU - Croner, Lisa J.

AU - Boysen, Cecilie

AU - Fang, Hong

AU - Qian, Feng

AU - Amur, Shashi

AU - Bao, Wenjun

AU - Barbacioru, Catalin C.

AU - Bertholet, Vincent

AU - Cao, Xiaoxi Megan

AU - Chu, Tzu Ming

AU - Collins, Patrick J.

AU - Fan, Xiao Hui

AU - Frueh, Felix W.

AU - Fuscoe, James C.

AU - Guo, Xu

AU - Han, Jing

AU - Herman, Damir

AU - Hong, Huixiao

AU - Kawasaki, Ernest S.

AU - Li, Quan Zhen

AU - Luo, Yuling

AU - Ma, Yunqing

AU - Mei, Nan

AU - Peterson, Ron L.

AU - Puri, Raj K.

AU - Shippy, Richard

AU - Su, Zhenqiang

AU - Sun, Yongming Andrew

AU - Sun, Hongmei

AU - Thorn, Brett

AU - Turpaz, Yaron

AU - Wang, Charles

AU - Wang, Sue Jane

AU - Warrington, Janet A.

AU - Willey, James C.

AU - Wu, Jie

AU - Xie, Qian

AU - Zhang, Liang

AU - Zhang, Lu

AU - Zhong, Sheng

AU - Wolfinger, Russell D.

AU - Tong, Weida

PY - 2008/8/12

Y1 - 2008/8/12

UR - http://www.scopus.com/inward/record.url?scp=49649083648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=49649083648&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-S9-S10

DO - 10.1186/1471-2105-9-S9-S10

M3 - Article

C2 - 18793455

AN - SCOPUS:49649083648

SN - 1471-2105

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

IS - SUPPL. 9

M1 - S10

ER -
