Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies

the VA Million Veteran Program

Research output: Contribution to journal (Article)

Abstract

Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs.

Original language: English (US)
Pages (from-to): 763-772
Number of pages: 10
Journal: American Journal of Human Genetics
Volume: 105
Issue number: 4
DOI: 10.1016/j.ajhg.2019.08.012
State: Published - Oct 3 2019

Fingerprint

Genome-Wide Association Study
Population Genetics
Health
Population
Machine Learning

Keywords

  • biobank
  • ethnicity-specific trait loci
  • genetic ancestry
  • multi-ethnic cohort
  • self-reported race/ethnicity
  • stratified analysis
  • trans-ethnic GWAS

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies. / the VA Million Veteran Program.

In: American Journal of Human Genetics, Vol. 105, No. 4, 03.10.2019, p. 763-772.

@article{8217d46e5f6346ebb033e9ad4296e80c,
title = "Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies",
abstract = "Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs.",
keywords = "biobank, ethnicity-specific trait loci, genetic ancestry, multi-ethnic cohort, self-reported race/ethnicity, stratified analysis, trans-ethnic GWAS",
author = "{the VA Million Veteran Program} and Huaying Fang and Qin Hui and Julie Lynch and Jacqueline Honerlaw and Assimes, {Themistocles L.} and Jie Huang and Marijana Vujkovic and Damrauer, {Scott M.} and Saiju Pyarajan and Gaziano, {J. Michael} and DuVall, {Scott L.} and O'Donnell, {Christopher J.} and Kelly Cho and Chang, {Kyong Mi} and Wilson, {Peter W.F.} and Tsao, {Philip S.} and Rachel Ramoni and Jim Breeling and Grant Huang and Sumitra Muralidhar and Jennifer Moser and Whitbourne, {Stacey B.} and Brewer, {Jessica V.} and John Concato and Stuart Warren and Argyres, {Dean P.} and Brady Stephens and Brophy, {Mary T.} and Humphries, {Donald E.} and Nhan Do and Shahpoor Shayan and Nguyen, {Xuan Mai T.} and Elizabeth Hauser and Yan Sun and Hongyu Zhao and Peter Wilson and Rachel McArdle and Louis Dellitalia and John Harley and Jeffrey Whittle and Jean Beckham and John Wells and Salvador Gutierrez and Gretchen Gibson and Laurence Kaminsky and Gerardo Villareal and Scott Kinlay and Junzhe Xu and Mark Hamner and Sujata Bhushan",
year = "2019",
month = "10",
day = "3",
doi = "10.1016/j.ajhg.2019.08.012",
language = "English (US)",
volume = "105",
pages = "763--772",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "4",
}

TY - JOUR

T1 - Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies

AU - the VA Million Veteran Program

AU - Fang, Huaying

AU - Hui, Qin

AU - Lynch, Julie

AU - Honerlaw, Jacqueline

AU - Assimes, Themistocles L.

AU - Huang, Jie

AU - Vujkovic, Marijana

AU - Damrauer, Scott M.

AU - Pyarajan, Saiju

AU - Gaziano, J. Michael

AU - DuVall, Scott L.

AU - O'Donnell, Christopher J.

AU - Cho, Kelly

AU - Chang, Kyong Mi

AU - Wilson, Peter W.F.

AU - Tsao, Philip S.

AU - Ramoni, Rachel

AU - Breeling, Jim

AU - Huang, Grant

AU - Muralidhar, Sumitra

AU - Moser, Jennifer

AU - Whitbourne, Stacey B.

AU - Brewer, Jessica V.

AU - Concato, John

AU - Warren, Stuart

AU - Argyres, Dean P.

AU - Stephens, Brady

AU - Brophy, Mary T.

AU - Humphries, Donald E.

AU - Do, Nhan

AU - Shayan, Shahpoor

AU - Nguyen, Xuan Mai T.

AU - Hauser, Elizabeth

AU - Sun, Yan

AU - Zhao, Hongyu

AU - Wilson, Peter

AU - McArdle, Rachel

AU - Dellitalia, Louis

AU - Harley, John

AU - Whittle, Jeffrey

AU - Beckham, Jean

AU - Wells, John

AU - Gutierrez, Salvador

AU - Gibson, Gretchen

AU - Kaminsky, Laurence

AU - Villareal, Gerardo

AU - Kinlay, Scott

AU - Xu, Junzhe

AU - Hamner, Mark

AU - Bhushan, Sujata

PY - 2019/10/3

Y1 - 2019/10/3

AB - Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs.

KW - biobank

KW - ethnicity-specific trait loci

KW - genetic ancestry

KW - multi-ethnic cohort

KW - self-reported race/ethnicity

KW - stratified analysis

KW - trans-ethnic GWAS

UR - http://www.scopus.com/inward/record.url?scp=85072756357&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072756357&partnerID=8YFLogxK

U2 - 10.1016/j.ajhg.2019.08.012

DO - 10.1016/j.ajhg.2019.08.012

M3 - Article

C2 - 31564439

AN - SCOPUS:85072756357

VL - 105

SP - 763

EP - 772

JO - American Journal of Human Genetics

JF - American Journal of Human Genetics

SN - 0002-9297

IS - 4

ER -