Proper conditional analysis in the presence of missing data identified novel independently associated low frequency variants in nicotine dependence genes

Bibo Jiang, Sai Chen, Yu Jiang, Mengzhen Liu, William G. Iacono, John K. Hewitt, John E. Hokanson, Kenneth Krauter, Markku Laakso, Kevin W. Li, Sharon M. Lutz, Matthew McGue, Daniel McGuire, Anita Pandit, Gregory Zajac, Michael Boehnke, Goncalo R. Abecasis, Scott I. Vrieze, Xiaowei Zhan, Dajiang J. Liu

Research output: Contribution to journal › Article › peer-review


Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values. In practice, genotype imputation is not always effective, e.g., when targeted genotyping/sequencing assays are used or when the untyped genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Naïve extensions of existing methods either replace missing summary statistics with 0 or discard studies with missing data. These approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain a large number of missing values. Based on this estimator, we propose a score statistic, which we call PCBS (partial correlation based score statistic), for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to analyze the CHRNA5-CHRNB4-CHRNA3 locus in a large-scale meta-analysis for cigarettes-per-day. Using the new method, we identified three novel variants, independent of known association signals, which were otherwise missed by alternative methods. Together, the phenotypic variance explained by these variants is 0.46%, improving that of previously reported associations by 17%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.

Original language: English (US)
Journal: Unknown Journal
State: Published - Nov 21 2017

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • Immunology and Microbiology(all)
  • Neuroscience(all)
  • Pharmacology, Toxicology and Pharmaceutics(all)
