SHEAR: Sample heterogeneity estimation and assembly by reference

Sean R. Landman, Tae H. Hwang, Kevin A T Silverstein, Yingming Li, Scott M. Dehm, Michael Steinbach, Vipin Kumar

Research output: Contribution to journal, Article

3 Citations (Scopus)

Abstract

Background: Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often contain a heterogeneous mix of subpopulations with different variants, making assembly extremely difficult with existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts structural variants (SVs), accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis.

Results: By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare against the leading competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR is shown to successfully estimate heterogeneity percentages in both cases, and demonstrates improved efficiency and a better ability to handle tandem duplications.

Conclusion: SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as those from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.
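The abstract does not say how the subpopulation percentage of a variant is computed. As a rough illustration of the idea only, the sketch below estimates it as the share of reads at a variant site that support the variant rather than the reference. The class `BreakpointCounts`, the function `estimate_heterogeneity`, and the read counts are hypothetical and are not taken from SHEAR. The sketch ignores complications such as zygosity, mapping bias, and SV-type-specific read signatures, and should not be read as SHEAR's actual estimator.

```python
from dataclasses import dataclass


@dataclass
class BreakpointCounts:
    """Hypothetical read counts at one candidate variant site (illustrative only)."""
    variant_reads: int    # reads whose alignment supports the variant junction
    reference_reads: int  # reads whose alignment supports the unaltered reference


def estimate_heterogeneity(counts: BreakpointCounts) -> float:
    """Return the percentage of informative reads that support the variant.

    Assumption: every read counts equally and each read supports exactly
    one of the two alleles. This is a simplification, not SHEAR's method.
    """
    total = counts.variant_reads + counts.reference_reads
    if total == 0:
        raise ValueError("no informative reads cover this site")
    return 100.0 * counts.variant_reads / total


if __name__ == "__main__":
    # Made-up counts for a single site in a mixed sample.
    site = BreakpointCounts(variant_reads=30, reference_reads=70)
    print(f"Estimated variant percentage: {estimate_heterogeneity(site):.1f}%")
```

With the made-up counts above, 30 of 100 informative reads support the variant, so the estimate is 30%. A real estimator would need to define "supporting reads" separately for each SV type (for example, tandem duplications create a new junction without removing either reference junction).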

Original language: English (US)
Article number: 84
Journal: BMC Genomics
Volume: 15
Issue number: 1
DOI: https://doi.org/10.1186/1471-2164-15-84
State: Published - Jan 29 2014

Keywords

  • Assembly
  • Genomics
  • Heterogeneity
  • Next-generation sequencing
  • Personal genome
  • Prostate cancer
  • Sequence analysis
  • Structural variation

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Medicine (all)

Cite this

Landman, S. R., Hwang, T. H., Silverstein, K. A. T., Li, Y., Dehm, S. M., Steinbach, M., & Kumar, V. (2014). SHEAR: Sample heterogeneity estimation and assembly by reference. BMC Genomics, 15(1), [84]. https://doi.org/10.1186/1471-2164-15-84

SHEAR: Sample heterogeneity estimation and assembly by reference. / Landman, Sean R.; Hwang, Tae H.; Silverstein, Kevin A T; Li, Yingming; Dehm, Scott M.; Steinbach, Michael; Kumar, Vipin.

In: BMC Genomics, Vol. 15, No. 1, 84, 29.01.2014.

Landman, SR, Hwang, TH, Silverstein, KAT, Li, Y, Dehm, SM, Steinbach, M & Kumar, V 2014, 'SHEAR: Sample heterogeneity estimation and assembly by reference', BMC Genomics, vol. 15, no. 1, 84. https://doi.org/10.1186/1471-2164-15-84
Landman SR, Hwang TH, Silverstein KAT, Li Y, Dehm SM, Steinbach M et al. SHEAR: Sample heterogeneity estimation and assembly by reference. BMC Genomics. 2014 Jan 29;15(1). 84. https://doi.org/10.1186/1471-2164-15-84
Landman, Sean R.; Hwang, Tae H.; Silverstein, Kevin A T; Li, Yingming; Dehm, Scott M.; Steinbach, Michael; Kumar, Vipin. / SHEAR: Sample heterogeneity estimation and assembly by reference. In: BMC Genomics. 2014; Vol. 15, No. 1.
@article{373ae811de96421b8a6dff532f1a892a,
title = "SHEAR: Sample heterogeneity estimation and assembly by reference",
keywords = "Assembly, Genomics, Heterogeneity, Next-generation sequencing, Personal genome, Prostate cancer, Sequence analysis, Structural variation",
author = "Landman, {Sean R.} and Hwang, {Tae H.} and Silverstein, {Kevin A T} and Yingming Li and Dehm, {Scott M.} and Michael Steinbach and Vipin Kumar",
year = "2014",
month = "1",
day = "29",
doi = "10.1186/1471-2164-15-84",
language = "English (US)",
volume = "15",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - SHEAR

T2 - Sample heterogeneity estimation and assembly by reference

AU - Landman, Sean R.

AU - Hwang, Tae H.

AU - Silverstein, Kevin A T

AU - Li, Yingming

AU - Dehm, Scott M.

AU - Steinbach, Michael

AU - Kumar, Vipin

PY - 2014/1/29

Y1 - 2014/1/29

KW - Assembly

KW - Genomics

KW - Heterogeneity

KW - Next-generation sequencing

KW - Personal genome

KW - Prostate cancer

KW - Sequence analysis

KW - Structural variation

UR - http://www.scopus.com/inward/record.url?scp=84892964378&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892964378&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-15-84

DO - 10.1186/1471-2164-15-84

M3 - Article

C2 - 24476358

AN - SCOPUS:84892964378

VL - 15

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 84

ER -