TY - JOUR
T1 - Extensive up-regulation of gene expression in cancer
T2 - The normalised use of microarray data
AU - Wang, Dong
AU - Cheng, Lixin
AU - Zhang, Yuannv
AU - Wu, Ruihong
AU - Wang, Mingyue
AU - Gu, Yunyan
AU - Zhao, Wenyuan
AU - Li, Pengfei
AU - Li, Bin
AU - Zhang, Yujing
AU - Wang, Hongwei
AU - Huang, Yan
AU - Wang, Chenguang
AU - Guo, Zheng
PY - 2012/3
Y1 - 2012/3
N2 - Based on the assumption that only a few genes are differentially expressed in a disease and have balanced upward and downward expression level changes, researchers usually normalise microarray data by forcing all of the arrays to have the same probe intensity distributions to remove technical variations in the data. However, accumulated evidence suggests that gene expression could be widely altered in cancer, so we need to evaluate the sensitivity of biological discoveries to violation of the normalisation assumption. Here, we show that the medians of the original probe intensities increase in most of the ten cancer types analysed in this paper, indicating that genes may be widely up-regulated in many cancer types. Thus, at least for cancer studies, normalising all arrays to have the same distribution of probe intensities regardless of the state (diseased vs. normal) tends to falsely produce many down-regulated differentially expressed (DE) genes while missing many truly up-regulated DE genes. We also show that the DE genes solely detected in the non-normalised data for cancers are highly reproducible across different datasets for the same cancers, indicating that effective biological signals naturally exist in the non-normalised data. Because the power of current statistical analyses using the non-normalised data tends to be low, we suggest selecting DE genes in both normalised and non-normalised data and then filtering out the false DE genes extracted from the normalised data that show opposite deregulation directions in the non-normalised data.
AB - Based on the assumption that only a few genes are differentially expressed in a disease and have balanced upward and downward expression level changes, researchers usually normalise microarray data by forcing all of the arrays to have the same probe intensity distributions to remove technical variations in the data. However, accumulated evidence suggests that gene expression could be widely altered in cancer, so we need to evaluate the sensitivity of biological discoveries to violation of the normalisation assumption. Here, we show that the medians of the original probe intensities increase in most of the ten cancer types analysed in this paper, indicating that genes may be widely up-regulated in many cancer types. Thus, at least for cancer studies, normalising all arrays to have the same distribution of probe intensities regardless of the state (diseased vs. normal) tends to falsely produce many down-regulated differentially expressed (DE) genes while missing many truly up-regulated DE genes. We also show that the DE genes solely detected in the non-normalised data for cancers are highly reproducible across different datasets for the same cancers, indicating that effective biological signals naturally exist in the non-normalised data. Because the power of current statistical analyses using the non-normalised data tends to be low, we suggest selecting DE genes in both normalised and non-normalised data and then filtering out the false DE genes extracted from the normalised data that show opposite deregulation directions in the non-normalised data.
UR - http://www.scopus.com/inward/record.url?scp=84863167529&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863167529&partnerID=8YFLogxK
U2 - 10.1039/c2mb05466c
DO - 10.1039/c2mb05466c
M3 - Article
C2 - 22234555
AN - SCOPUS:84863167529
SN - 1742-206X
VL - 8
SP - 818
EP - 827
JO - Molecular BioSystems
JF - Molecular BioSystems
IS - 3
ER -