Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data

Bingshan Li, Qiang Wei, Xiaowei Zhan, Xue Zhong, Wei Chen, Chun Li, Jonathan Haines

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Sequencing DNA samples from families is an attractive alternative to population-based designs for identifying rare variants associated with human disease, because causal variants are enriched in pedigrees. Previous studies showed that modeling family relatedness improves genotype calling accuracy over standard calling algorithms. Current family-based variant calling methods use sequencing data at single variants and ignore identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework that accurately estimates IBD sharing from the sequencing data and uses the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we show that IBD can be reliably estimated across the genome, even at very low coverage (e.g., 2X), and that genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for low-frequency variants, especially at low to intermediate coverage (e.g., 10X to 20X), making our approach effective for studying rare variants through cost-effective whole-genome sequencing in pedigrees. We hope that our tool will be useful to the research community for identifying rare variants underlying human disease through family-based sequencing.
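
To make the idea of IBD-aware joint calling concrete, the following is a minimal toy sketch and not the authors' framework. It assumes a single biallelic site, two siblings, a simple binomial read-error model, a known alternate-allele frequency, and an IBD state that is already known (the paper instead estimates IBD from the sequencing data). All function names and parameters are hypothetical.

```python
from itertools import product
from math import comb


def read_likelihood(ref, alt, g, err=0.01):
    """P(read counts | genotype g), g = number of alt alleles (0, 1, 2).

    Assumed model: each read shows the alt allele with probability
    err, 0.5, or 1 - err for genotypes 0, 1, 2 respectively.
    """
    p_alt = {0: err, 1: 0.5, 2: 1.0 - err}[g]
    return comb(ref + alt, alt) * p_alt**alt * (1.0 - p_alt) ** ref


def pair_prior(ibd, freq):
    """P(g1, g2 | IBD state) for two relatives, alt-allele frequency freq.

    Alleles shared IBD are drawn once; unshared alleles are drawn
    independently from the population (no mutation, no inbreeding).
    """
    p = {0: 1.0 - freq, 1: freq}
    prior = {(g1, g2): 0.0 for g1 in range(3) for g2 in range(3)}
    if ibd == 0:  # no alleles shared: four independent draws
        for a, b, c, d in product((0, 1), repeat=4):
            prior[(a + b, c + d)] += p[a] * p[b] * p[c] * p[d]
    elif ibd == 1:  # one shared allele a, two unshared alleles b and c
        for a, b, c in product((0, 1), repeat=3):
            prior[(a + b, a + c)] += p[a] * p[b] * p[c]
    elif ibd == 2:  # both alleles shared: identical genotypes
        for a, b in product((0, 1), repeat=2):
            prior[(a + b, a + b)] += p[a] * p[b]
    else:
        raise ValueError("ibd must be 0, 1, or 2")
    return prior


def call_pair(reads1, reads2, ibd, freq, err=0.01):
    """Return the most probable (g1, g2) and the full joint posterior.

    reads1 and reads2 are (ref_count, alt_count) tuples.
    """
    prior = pair_prior(ibd, freq)
    post = {
        (g1, g2): pr
        * read_likelihood(*reads1, g1, err)
        * read_likelihood(*reads2, g2, err)
        for (g1, g2), pr in prior.items()
    }
    total = sum(post.values())
    post = {k: v / total for k, v in post.items()}
    return max(post, key=post.get), post


# Sibling 1 is covered at 2X (two reference reads); sibling 2 at 20X
# with a clear heterozygous signal. The variant is rare (freq = 0.01).
low_cov, high_cov = (2, 0), (10, 10)
call_ibd0, _ = call_pair(low_cov, high_cov, ibd=0, freq=0.01)
call_ibd2, _ = call_pair(low_cov, high_cov, ibd=2, freq=0.01)
print(call_ibd0, call_ibd2)
```

Under these assumptions, treating the siblings as sharing no alleles leaves the low-coverage sibling called homozygous reference, whereas knowing that they share both alleles IBD at this site lets the well-covered sibling's reads carry the rare heterozygous call over to the poorly covered one.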

Original language: English (US)
Article number: e1005271
Journal: PLoS Genetics
Volume: 11
Issue number: 6
DOI: 10.1371/journal.pgen.1005271
State: Published - Jul 1 2015

Fingerprint

pedigree
genotype
human diseases
genome
Information Dissemination
DNA Sequence Analysis
relatedness
DNA
family
Costs and Cost Analysis
sampling
Research
cost
Population
modeling
simulation
methodology

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Ecology, Evolution, Behavior and Systematics
  • Cancer Research
  • Genetics (clinical)

Cite this

Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. / Li, Bingshan; Wei, Qiang; Zhan, Xiaowei; Zhong, Xue; Chen, Wei; Li, Chun; Haines, Jonathan.

In: PLoS Genetics, Vol. 11, No. 6, e1005271, 01.07.2015.

Li, Bingshan ; Wei, Qiang ; Zhan, Xiaowei ; Zhong, Xue ; Chen, Wei ; Li, Chun ; Haines, Jonathan. / Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. In: PLoS Genetics. 2015 ; Vol. 11, No. 6.
@article{ca858755bf3b4876bf665592d3d2ec63,
title = "Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data",
abstract = "Sequencing family DNA samples provides an attractive alternative to population based designs to identify rare variants associated with human disease due to the enrichment of causal variants in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness compared to standard calling algorithms. Current family-based variant calling methods use sequencing data on single variants and ignore the identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate the IBD sharing from the sequencing data, and to utilize the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for variants with low frequencies, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective in studying rare variants in cost-effective whole genome sequencing in pedigrees. We hope that our tool is useful to the research community for identifying rare variants for human disease through family-based sequencing.",
author = "Bingshan Li and Qiang Wei and Xiaowei Zhan and Xue Zhong and Wei Chen and Chun Li and Jonathan Haines",
year = "2015",
month = "7",
day = "1",
doi = "10.1371/journal.pgen.1005271",
language = "English (US)",
volume = "11",
journal = "PLoS Genetics",
issn = "1553-7390",
publisher = "Public Library of Science",
number = "6",

}

TY - JOUR

T1 - Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data

AU - Li, Bingshan

AU - Wei, Qiang

AU - Zhan, Xiaowei

AU - Zhong, Xue

AU - Chen, Wei

AU - Li, Chun

AU - Haines, Jonathan

PY - 2015/7/1

Y1 - 2015/7/1

AB - Sequencing family DNA samples provides an attractive alternative to population based designs to identify rare variants associated with human disease due to the enrichment of causal variants in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness compared to standard calling algorithms. Current family-based variant calling methods use sequencing data on single variants and ignore the identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate the IBD sharing from the sequencing data, and to utilize the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for variants with low frequencies, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective in studying rare variants in cost-effective whole genome sequencing in pedigrees. We hope that our tool is useful to the research community for identifying rare variants for human disease through family-based sequencing.

UR - http://www.scopus.com/inward/record.url?scp=84937780464&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937780464&partnerID=8YFLogxK

U2 - 10.1371/journal.pgen.1005271

DO - 10.1371/journal.pgen.1005271

M3 - Article

C2 - 26043085

AN - SCOPUS:84937780464

VL - 11

JO - PLoS Genetics

JF - PLoS Genetics

SN - 1553-7390

IS - 6

M1 - e1005271

ER -