TY - JOUR
T1 - A penalized robust method for identifying gene-environment interactions
AU - Shi, Xingjie
AU - Liu, Jin
AU - Huang, Jian
AU - Zhou, Yong
AU - Xie, Yang
AU - Ma, Shuangge
PY - 2014/4
Y1 - 2014/4
N2 - In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model misspecification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example, with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the accelerated failure time (AFT) model. The proposed method identifies interactions different from those identified by the alternatives. Some of the identified genes have important implications.
AB - In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model misspecification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example, with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the accelerated failure time (AFT) model. The proposed method identifies interactions different from those identified by the alternatives. Some of the identified genes have important implications.
KW - Gene-environment interaction
KW - Marker identification
KW - Penalization
KW - Robust rank estimation
UR - http://www.scopus.com/inward/record.url?scp=84895920054&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84895920054&partnerID=8YFLogxK
U2 - 10.1002/gepi.21795
DO - 10.1002/gepi.21795
M3 - Article
C2 - 24616063
AN - SCOPUS:84895920054
SN - 0741-0395
VL - 38
SP - 220
EP - 230
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 3
ER -