Association Analysis and Meta-Analysis of Multi-allelic Variants for Large Scale Sequence Data

Xiaowei Zhan, Sai Chen, Yu Jiang, Mengzhen Liu, William G. Iacono, John K. Hewitt, John E. Hokanson, Kenneth Krauter, Markku Laakso, Kevin W. Li, Sharon M. Lutz, Matthew McGue, Anita Pandit, Gregory J.M. Zajac, Michael Boehnke, Goncalo R. Abecasis, Bibo Jiang, Scott I. Vrieze, Dajiang J. Liu

Research output: Contribution to journal › Article › peer-review


Motivation: There is great interest in understanding the impact of rare variants on human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ∼10% of the variant sites are observed to be multi-allelic. Many multi-allelic variants have been shown to be functional and disease relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not handle them properly and can produce highly misleading association results. Results: We propose novel methods to encode multi-allelic sites, conduct single-variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and through a large meta-analysis of ∼18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests, and enhanced gene-level tests over existing approaches. Availability: Software packages implementing these methods are available at ( Contact:;
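The abstract does not spell out how multi-allelic sites are encoded, so the following is only a minimal sketch of one common scheme, not the authors' implementation: each alternative allele gets its own allele-count column, so that all alternative alleles at a site can enter a regression jointly. The function name `encode_multiallelic` and the genotype representation (pairs of allele indices, 0 for the reference allele) are assumptions made for illustration.

```python
def encode_multiallelic(genotypes, n_alt):
    """Turn diploid genotypes at one site into per-allele count columns.

    genotypes: list of (a1, a2) allele-index pairs; 0 is the reference
               allele and 1..n_alt are the alternative alleles.
    n_alt:     number of alternative alleles at the site.
    Returns one row per sample with n_alt counts (each 0, 1, or 2).
    Hypothetical helper for illustration; not taken from the paper's software.
    """
    rows = []
    for a1, a2 in genotypes:
        row = [0] * n_alt
        for allele in (a1, a2):
            if allele < 0 or allele > n_alt:
                raise ValueError(f"allele index {allele} out of range")
            if allele > 0:
                # Alternative allele k is stored in column k - 1.
                row[allele - 1] += 1
        rows.append(row)
    return rows


# Four samples at a tri-allelic site (reference plus two alternatives).
samples = [(0, 0), (0, 1), (1, 2), (2, 2)]
design = encode_multiallelic(samples, n_alt=2)
```

Under this assumed encoding, the sample with genotype (1, 2) carries one copy of each alternative allele, which a single collapsed "any alternative allele" count would not distinguish from a homozygote.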

Original language: English (US)
Journal: Unknown Journal
State: Published - Oct 3 2017

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • Immunology and Microbiology(all)
  • Neuroscience(all)
  • Pharmacology, Toxicology and Pharmaceutics(all)
