TY - JOUR
T1 - Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation
AU - Wang, Chaolong
AU - Zhan, Xiaowei
AU - Liang, Liming
AU - Abecasis, Gonçalo R.
AU - Lin, Xihong
N1 - Funding Information:
We would like to thank Dr. Noah Rosenberg and two anonymous reviewers for their valuable comments, which substantially improved this manuscript, and Dr. Jun Li for allowing us to use the unpublished HGDP exome chip data. This work is supported by grants from the NIH (P01 CA134294, R01 CA092824, and P42 ES016454).
Publisher Copyright:
© 2015 The American Society of Human Genetics
PY - 2015/5/1
Y1 - 2015/5/1
N2 - Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
UR - http://www.scopus.com/inward/record.url?scp=84930015657&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930015657&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2015.04.018
DO - 10.1016/j.ajhg.2015.04.018
M3 - Article
C2 - 26027497
AN - SCOPUS:84930015657
SN - 0002-9297
VL - 96
SP - 926
EP - 937
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 6
ER -