Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes

Yu Jiang, Sai Chen, Daniel McGuire, Fang Chen, Mengzhen Liu, William G. Iacono, John K. Hewitt, John E. Hokanson, Kenneth Krauter, Markku Laakso, Kevin W. Li, Sharon M. Lutz, Matthew McGue, Anita Pandit, Gregory J.M. Zajac, Michael Boehnke, Goncalo R. Abecasis, Scott I. Vrieze, Xiaowei Zhan, Bibo Jiang, Dajiang J. Liu

Research output: Contribution to journal, Article

5 Citations (Scopus)

Abstract

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e., the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.
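
For orientation, the kind of conditional analysis the abstract refers to can be sketched for the complete-data case, where every study reports every variant. The sketch below is the textbook conditional score test built from summed summary statistics; it is not the PCBS statistic, and it does not handle missing values. The function names (`combine_studies`, `conditional_z`) and the toy numbers are hypothetical and chosen only for illustration.

```python
from math import erfc, sqrt


def combine_studies(studies):
    """Sum per-study score statistics U and their covariance matrices V.

    Assumes complete data: every study reports every variant.
    Each study is a (U, V) pair, with U a list and V a list of lists.
    """
    n = len(studies[0][0])
    U = [sum(s[0][i] for s in studies) for i in range(n)]
    V = [[sum(s[1][i][j] for s in studies) for j in range(n)] for i in range(n)]
    return U, V


def conditional_z(U, V, i, j):
    """Score test for variant i conditional on a single variant j.

    Adjusted score:    U_i - (V_ij / V_jj) * U_j
    Adjusted variance: V_ii - V_ij**2 / V_jj   (V assumed symmetric)
    """
    w = V[i][j] / V[j][j]
    u_adj = U[i] - w * U[j]
    v_adj = V[i][i] - w * V[i][j]
    z = u_adj / sqrt(v_adj)
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal p-value
    return z, p


# Toy summary statistics for two studies and two variants (made-up numbers).
studies = [
    ([10.0, 8.0], [[25.0, 15.0], [15.0, 25.0]]),
    ([6.0, 4.0], [[16.0, 8.0], [8.0, 16.0]]),
]
U, V = combine_studies(studies)
z, p = conditional_z(U, V, i=0, j=1)
print(f"conditional z = {z:.3f}, two-sided p = {p:.3f}")
```

When a study does not report one of the two variants, the sums above are no longer defined for every entry of U and V, which is the situation the abstract describes as problematic for existing approaches.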

Original language: English (US)
Article number: e1007452
Journal: PLoS Genetics
Volume: 14
Issue number: 7
DOI: https://doi.org/10.1371/journal.pgen.1007452
State: Published - Jul 1 2018

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics
  • Genetics (clinical)
  • Cancer Research

Cite this

Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes. / Jiang, Yu; Chen, Sai; McGuire, Daniel; Chen, Fang; Liu, Mengzhen; Iacono, William G.; Hewitt, John K.; Hokanson, John E.; Krauter, Kenneth; Laakso, Markku; Li, Kevin W.; Lutz, Sharon M.; McGue, Matthew; Pandit, Anita; Zajac, Gregory J.M.; Boehnke, Michael; Abecasis, Goncalo R.; Vrieze, Scott I.; Zhan, Xiaowei; Jiang, Bibo; Liu, Dajiang J.

In: PLoS Genetics, Vol. 14, No. 7, e1007452, 01.07.2018.

Jiang, Y, Chen, S, McGuire, D, Chen, F, Liu, M, Iacono, WG, Hewitt, JK, Hokanson, JE, Krauter, K, Laakso, M, Li, KW, Lutz, SM, McGue, M, Pandit, A, Zajac, GJM, Boehnke, M, Abecasis, GR, Vrieze, SI, Zhan, X, Jiang, B & Liu, DJ 2018, 'Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes', PLoS Genetics, vol. 14, no. 7, e1007452. https://doi.org/10.1371/journal.pgen.1007452
@article{03e4d2ccc7cc4b218a95373b4713195b,
title = "Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes",
author = "Yu Jiang and Sai Chen and Daniel McGuire and Fang Chen and Mengzhen Liu and Iacono, {William G.} and Hewitt, {John K.} and Hokanson, {John E.} and Kenneth Krauter and Markku Laakso and Li, {Kevin W.} and Lutz, {Sharon M.} and Matthew McGue and Anita Pandit and Zajac, {Gregory J.M.} and Michael Boehnke and Abecasis, {Goncalo R.} and Vrieze, {Scott I.} and Xiaowei Zhan and Bibo Jiang and Liu, {Dajiang J.}",
year = "2018",
month = "7",
day = "1",
doi = "10.1371/journal.pgen.1007452",
language = "English (US)",
volume = "14",
journal = "PLoS Genetics",
issn = "1553-7390",
publisher = "Public Library of Science",
number = "7",

}

TY - JOUR
T1 - Proper conditional analysis in the presence of missing data
T2 - Application to large scale meta-analysis of tobacco use phenotypes
AU - Jiang, Yu
AU - Chen, Sai
AU - McGuire, Daniel
AU - Chen, Fang
AU - Liu, Mengzhen
AU - Iacono, William G.
AU - Hewitt, John K.
AU - Hokanson, John E.
AU - Krauter, Kenneth
AU - Laakso, Markku
AU - Li, Kevin W.
AU - Lutz, Sharon M.
AU - McGue, Matthew
AU - Pandit, Anita
AU - Zajac, Gregory J.M.
AU - Boehnke, Michael
AU - Abecasis, Goncalo R.
AU - Vrieze, Scott I.
AU - Zhan, Xiaowei
AU - Jiang, Bibo
AU - Liu, Dajiang J.
PY - 2018/7/1
Y1 - 2018/7/1
UR - http://www.scopus.com/inward/record.url?scp=85050978238&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050978238&partnerID=8YFLogxK
U2 - 10.1371/journal.pgen.1007452
DO - 10.1371/journal.pgen.1007452
M3 - Article
C2 - 30016313
AN - SCOPUS:85050978238
VL - 14
JO - PLoS Genetics
JF - PLoS Genetics
SN - 1553-7390
IS - 7
M1 - e1007452
ER -