Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

Cole Trapnell, Adam Roberts, Loyal Goff, Geo Pertea, Daehwan Kim, David R. Kelley, Harold Pimentel, Steven L. Salzberg, John L. Rinn, Lior Pachter

Research output: Contribution to journal › Article

5184 Citations (Scopus)

Abstract

Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.
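The workflow the abstract summarizes (raw reads, then alignment, assembly, merging and differential expression testing) can be sketched as a sequence of command lines. The following is a minimal, non-authoritative sketch: the function name `build_pipeline`, the single-end one-FASTQ-per-replicate layout, the output directory names and the file `assemblies.txt` are assumptions made for illustration, and the commands are only constructed, not executed. The options used (`-p`, `-G`, `-o`, `-g`, `-s`, `-b`, `-L`, `-u`) are standard TopHat and Cufflinks-suite options, but the published protocol should be consulted for the exact invocations.

```python
from typing import Dict, List


def build_pipeline(
    samples: Dict[str, List[str]],
    index: str,
    genome_fa: str,
    annotation: str,
    threads: int = 8,
) -> List[List[str]]:
    """Build argument lists for a TopHat -> Cufflinks -> Cuffmerge -> Cuffdiff run.

    `samples` maps a condition label to its FASTQ files, one single-end file
    per replicate (a hypothetical layout chosen for this sketch).
    """
    commands: List[List[str]] = []
    bams_by_condition: Dict[str, List[str]] = {}

    for condition, fastqs in samples.items():
        bams = []
        for i, fastq in enumerate(fastqs, start=1):
            name = f"{condition}_R{i}"
            # Align reads to the genome; TopHat writes accepted_hits.bam to -o.
            commands.append(
                ["tophat", "-p", str(threads), "-G", annotation,
                 "-o", f"{name}_thout", index, fastq]
            )
            bam = f"{name}_thout/accepted_hits.bam"
            # Assemble transcripts for this replicate (writes transcripts.gtf).
            commands.append(
                ["cufflinks", "-p", str(threads), "-o", f"{name}_clout", bam]
            )
            bams.append(bam)
        bams_by_condition[condition] = bams

    # Merge assemblies. assemblies.txt (assumed name) must list every
    # *_clout/transcripts.gtf, one per line; writing it is left to the caller.
    commands.append(
        ["cuffmerge", "-g", annotation, "-s", genome_fa,
         "-p", str(threads), "assemblies.txt"]
    )

    # Test for differential expression: replicates of one condition are
    # comma-separated, conditions are separate arguments.
    cuffdiff = ["cuffdiff", "-o", "diff_out", "-b", genome_fa,
                "-p", str(threads), "-L", ",".join(bams_by_condition),
                "-u", "merged_asm/merged.gtf"]
    cuffdiff += [",".join(bams) for bams in bams_by_condition.values()]
    commands.append(cuffdiff)
    return commands


if __name__ == "__main__":
    example = {"C1": ["C1_R1.fq", "C1_R2.fq"], "C2": ["C2_R1.fq", "C2_R2.fq"]}
    for cmd in build_pipeline(example, "genome", "genome.fa", "genes.gtf"):
        print(" ".join(cmd))
```

Building the commands as argument lists keeps the sketch inspectable without the tools installed; each list could then be passed to `subprocess.run` once the paths are real. The `diff_out` directory produced by the last step is the kind of output CummeRbund reads for visualization.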

Original language: English (US)
Pages (from-to): 562-578
Number of pages: 17
Journal: Nature Protocols
Volume: 7
Issue number: 3
DOI: https://doi.org/10.1038/nprot.2012.016
State: Published - Mar 1 2012

Fingerprint

Gene Expression Profiling
High-Throughput Nucleotide Sequencing
Genes
RNA
Transcriptome
Gene Expression
Software
Informatics
Experiments
Genetic Association Studies
Network protocols
Publications
Complementary DNA
Throughput
Genome
Messenger RNA
Accessories
Assays
Visualization

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. / Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R.; Pimentel, Harold; Salzberg, Steven L.; Rinn, John L.; Pachter, Lior.

In: Nature Protocols, Vol. 7, No. 3, 01.03.2012, p. 562-578.

Trapnell, C, Roberts, A, Goff, L, Pertea, G, Kim, D, Kelley, DR, Pimentel, H, Salzberg, SL, Rinn, JL & Pachter, L 2012, 'Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks', Nature Protocols, vol. 7, no. 3, pp. 562-578. https://doi.org/10.1038/nprot.2012.016
@article{bf16c31fb09c41e3914a44c8acdd159e,
title = "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks",
abstract = "Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.",
author = "Cole Trapnell and Adam Roberts and Loyal Goff and Geo Pertea and Daehwan Kim and Kelley, {David R.} and Harold Pimentel and Salzberg, {Steven L.} and Rinn, {John L.} and Lior Pachter",
year = "2012",
month = "3",
day = "1",
doi = "10.1038/nprot.2012.016",
language = "English (US)",
volume = "7",
pages = "562--578",
journal = "Nature Protocols",
issn = "1754-2189",
publisher = "Nature Publishing Group",
number = "3"
}

TY - JOUR

T1 - Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

AU - Trapnell, Cole

AU - Roberts, Adam

AU - Goff, Loyal

AU - Pertea, Geo

AU - Kim, Daehwan

AU - Kelley, David R.

AU - Pimentel, Harold

AU - Salzberg, Steven L.

AU - Rinn, John L.

AU - Pachter, Lior

PY - 2012/3/1

Y1 - 2012/3/1

N2 - Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

UR - http://www.scopus.com/inward/record.url?scp=84859885816&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859885816&partnerID=8YFLogxK

U2 - 10.1038/nprot.2012.016

DO - 10.1038/nprot.2012.016

M3 - Article

VL - 7

SP - 562

EP - 578

JO - Nature Protocols

JF - Nature Protocols

SN - 1754-2189

IS - 3

ER -