TY - JOUR
T1 - Statistical completion of a partially identified graph with applications for the estimation of gene regulatory networks
AU - Yu, Donghyeon
AU - Son, Won
AU - Lim, Johan
AU - Xiao, Guanghua
N1 - Publisher Copyright:
© The Author 2015. Published by Oxford University Press.
PY - 2015/10
Y1 - 2015/10
N2 - We study the estimation of a Gaussian graphical model whose dependent structures are partially identified. In a Gaussian graphical model, an off-diagonal zero entry in the concentration matrix (the inverse covariance matrix) implies the conditional independence of two corresponding variables, given all other variables. A number of methods have been proposed to estimate a sparse large-scale Gaussian graphical model or, equivalently, a sparse large-scale concentration matrix. In practice, the graph structure to be estimated is often partially identified by other sources or a pre-screening. In this paper, we propose a simple modification of existing methods to take into account this information in the estimation. We show that the partially identified dependent structure reduces the error in estimating the dependent structure. We apply the proposed method to estimating the gene regulatory network from lung cancer data, where protein-protein interactions are partially identified from the human protein reference database. The application shows that the proposed method identified many important cancer genes as hub genes in the constructed lung cancer network. In addition, we validated the prognostic importance of a newly identified cancer gene, PTPN13, in four independent lung cancer datasets. The results indicate that the proposed method could facilitate studying underlying lung cancer mechanisms and identifying reliable biomarkers for lung cancer prognosis.
AB - We study the estimation of a Gaussian graphical model whose dependent structures are partially identified. In a Gaussian graphical model, an off-diagonal zero entry in the concentration matrix (the inverse covariance matrix) implies the conditional independence of two corresponding variables, given all other variables. A number of methods have been proposed to estimate a sparse large-scale Gaussian graphical model or, equivalently, a sparse large-scale concentration matrix. In practice, the graph structure to be estimated is often partially identified by other sources or a pre-screening. In this paper, we propose a simple modification of existing methods to take into account this information in the estimation. We show that the partially identified dependent structure reduces the error in estimating the dependent structure. We apply the proposed method to estimating the gene regulatory network from lung cancer data, where protein-protein interactions are partially identified from the human protein reference database. The application shows that the proposed method identified many important cancer genes as hub genes in the constructed lung cancer network. In addition, we validated the prognostic importance of a newly identified cancer gene, PTPN13, in four independent lung cancer datasets. The results indicate that the proposed method could facilitate studying underlying lung cancer mechanisms and identifying reliable biomarkers for lung cancer prognosis.
KW - Concentration matrix
KW - Gaussian graphical models
KW - Gene regulatory network
KW - Lung cancer
KW - Partially identified graph
KW - Protein-protein interaction
UR - http://www.scopus.com/inward/record.url?scp=84943566938&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84943566938&partnerID=8YFLogxK
U2 - 10.1093/biostatistics/kxv013
DO - 10.1093/biostatistics/kxv013
M3 - Article
C2 - 25837438
AN - SCOPUS:84943566938
SN - 1465-4644
VL - 16
SP - 670
EP - 685
JO - Biostatistics
JF - Biostatistics
IS - 4
ER -