StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

Mihaela Pertea, Geo M. Pertea, Corina M. Antonescu, Tsung Cheng Chang, Joshua T. Mendell, Steven L. Salzberg

Research output: Contribution to journal (Article)

702 Citations (Scopus)

Abstract

Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.
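The two percentage gains quoted in the abstract can be checked directly from the transcript counts it reports. The following is a minimal sketch of that arithmetic only; the helper name `percent_increase` and the variable names are illustrative assumptions, not anything taken from the paper or from StringTie itself.

```python
def percent_increase(new: int, baseline: int) -> float:
    """Relative gain of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


# Counts of correctly assembled transcripts as quoted in the abstract:
# StringTie first, Cufflinks (the next-best assembler) as the baseline.
blood = percent_increase(10_990, 7_187)      # 90 million reads, human blood
simulated = percent_increase(7_559, 6_310)   # simulated data set

print(f"human blood: {blood:.1f}% more transcripts than Cufflinks")
print(f"simulated:   {simulated:.1f}% more transcripts than Cufflinks")
```

Rounded to whole percentages, the two values agree with the 53% and 20% figures stated above.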

Original language: English (US)
Pages (from-to): 290-295
Number of pages: 6
Journal: Nature Biotechnology
Volume: 33
Issue number: 3
DOI: 10.1038/nbt.3122
State: Published - 2015

Fingerprint

RNA
Transcriptome
Program assemblers
Computational methods
Blood
Software
Genes
Datasets

ASJC Scopus subject areas

  • Applied Microbiology and Biotechnology
  • Biotechnology
  • Molecular Medicine
  • Bioengineering
  • Biomedical Engineering

Cite this

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. / Pertea, Mihaela; Pertea, Geo M.; Antonescu, Corina M.; Chang, Tsung Cheng; Mendell, Joshua T.; Salzberg, Steven L.

In: Nature Biotechnology, Vol. 33, No. 3, 2015, p. 290-295.

@article{0b48a9d9e17c48cabb3bc10584693e0a,
title = "StringTie enables improved reconstruction of a transcriptome from RNA-seq reads",
abstract = "Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53{\%} increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20{\%} more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.",
author = "Mihaela Pertea and Pertea, {Geo M.} and Antonescu, {Corina M.} and Chang, {Tsung Cheng} and Mendell, {Joshua T.} and Salzberg, {Steven L.}",
year = "2015",
doi = "10.1038/nbt.3122",
language = "English (US)",
volume = "33",
pages = "290--295",
journal = "Nature Biotechnology",
issn = "0733-222X",
publisher = "Nature Publishing Group",
number = "3",

}

TY - JOUR

T1 - StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

AU - Pertea, Mihaela

AU - Pertea, Geo M.

AU - Antonescu, Corina M.

AU - Chang, Tsung Cheng

AU - Mendell, Joshua T.

AU - Salzberg, Steven L.

PY - 2015

Y1 - 2015

AB - Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.

UR - http://www.scopus.com/inward/record.url?scp=84924377038&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84924377038&partnerID=8YFLogxK

U2 - 10.1038/nbt.3122

DO - 10.1038/nbt.3122

M3 - Article

VL - 33

SP - 290

EP - 295

JO - Nature Biotechnology

JF - Nature Biotechnology

SN - 0733-222X

IS - 3

ER -