Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

Mihaela Pertea, Daehwan Kim, Geo M. Pertea, Jeffrey T. Leek, Steven L. Salzberg

Research output: Contribution to journal (Article)

498 Citations (Scopus)

Abstract

High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.
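The four stages named in the abstract (align, assemble, quantify, compare) can be sketched as a sequence of command lines. The sketch below is an illustration only, not the protocol's own script: it assumes the `hisat2`, `samtools` and `stringtie` executables are the installed tool names, and every file name, the sample label, the index prefix and the thread count are hypothetical placeholders. It only builds the commands; nothing is executed. The final comparison step is done in R with Ballgown and is not shown.

```python
import shlex
from typing import List


def assemble_commands(sample: str, index: str, annotation: str,
                      threads: int = 8) -> List[List[str]]:
    """Build the align, sort and assemble commands for one paired-end sample.

    All file names are hypothetical and derived from the sample label.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    return [
        # Align reads to the genome; --dta reports alignments suited to
        # downstream transcript assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", index,
         "-1", f"{sample}_1.fastq.gz", "-2", f"{sample}_2.fastq.gz",
         "-S", sam],
        # Sort the alignments and write them as BAM.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Assemble transcripts, guided by a reference annotation.
        ["stringtie", "-p", str(threads), "-G", annotation,
         "-o", f"{sample}.gtf", "-l", sample, bam],
    ]


def merge_command(gtf_list_file: str, annotation: str,
                  merged: str = "merged.gtf", threads: int = 8) -> List[str]:
    """Build the command that merges per-sample assemblies into one set."""
    return ["stringtie", "--merge", "-p", str(threads), "-G", annotation,
            "-o", merged, gtf_list_file]


def quantify_command(sample: str, merged: str = "merged.gtf",
                     threads: int = 8) -> List[str]:
    """Build the command that estimates abundances against the merged set.

    -e restricts estimation to the given transcripts; -B writes the
    table files that Ballgown reads.
    """
    return ["stringtie", "-e", "-B", "-p", str(threads), "-G", merged,
            "-o", f"ballgown/{sample}/{sample}.gtf", f"{sample}.bam"]


if __name__ == "__main__":
    # Hypothetical sample label, index prefix and annotation file.
    for cmd in assemble_commands("sample1", "genome_index", "reference.gtf"):
        print(shlex.join(cmd))
    print(shlex.join(merge_command("mergelist.txt", "reference.gtf")))
    print(shlex.join(quantify_command("sample1")))
```

Keeping the commands as argument lists makes them easy to inspect or log before handing them to a job runner; in a real run the assemble step would be repeated for every sample before the merge.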

Original language: English (US)
Pages (from-to): 1650-1667
Number of pages: 18
Journal: Nature Protocols
Volume: 11
Issue number: 9
DOI: 10.1038/nprot.2016.095
State: Published - Sep 1 2016

Fingerprint

Gene Expression Profiling
Software
Genes
RNA
Gene expression
Network protocols
High-Throughput Nucleotide Sequencing
Experiments
Throughput
Genome
Messenger RNA

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. / Pertea, Mihaela; Kim, Daehwan; Pertea, Geo M.; Leek, Jeffrey T.; Salzberg, Steven L.

In: Nature Protocols, Vol. 11, No. 9, 01.09.2016, p. 1650-1667.

Pertea, Mihaela ; Kim, Daehwan ; Pertea, Geo M. ; Leek, Jeffrey T. ; Salzberg, Steven L. / Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. In: Nature Protocols. 2016 ; Vol. 11, No. 9. pp. 1650-1667.
@article{f4d5c338e90e437c961c65d0e1954835,
title = "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown",
abstract = "High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.",
author = "Pertea, Mihaela and Kim, Daehwan and Pertea, {Geo M.} and Leek, {Jeffrey T.} and Salzberg, {Steven L.}",
year = "2016",
month = "9",
day = "1",
doi = "10.1038/nprot.2016.095",
language = "English (US)",
volume = "11",
pages = "1650--1667",
journal = "Nature Protocols",
issn = "1754-2189",
publisher = "Nature Publishing Group",
number = "9",

}

TY - JOUR

T1 - Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

AU - Pertea, Mihaela

AU - Kim, Daehwan

AU - Pertea, Geo M.

AU - Leek, Jeffrey T.

AU - Salzberg, Steven L.

PY - 2016/9/1

Y1 - 2016/9/1

AB - High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.

UR - http://www.scopus.com/inward/record.url?scp=84990992834&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84990992834&partnerID=8YFLogxK

U2 - 10.1038/nprot.2016.095

DO - 10.1038/nprot.2016.095

M3 - Article

C2 - 27560171

AN - SCOPUS:84990992834

VL - 11

SP - 1650

EP - 1667

JO - Nature Protocols

JF - Nature Protocols

SN - 1754-2189

IS - 9

ER -