groHMM: A computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data

Minho Chae, Charles G. Danko, W. Lee Kraus

Research output: Contribution to journal, Article

16 Citations (Scopus)

Abstract

Background: Global run-on coupled with deep sequencing (GRO-seq) provides extensive information on the location and function of coding and non-coding transcripts, including primary microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and enhancer RNAs (eRNAs), as well as yet undiscovered classes of transcripts. However, few computational tools tailored toward this new type of sequencing data are available, limiting the applicability of GRO-seq data for identifying novel transcription units.

Results: Here, we present groHMM, a computational tool in R, which defines the boundaries of transcription units de novo using a two-state hidden Markov model (HMM). A systematic comparison of the performance between groHMM and two existing peak-calling methods tuned to identify broad regions (SICER and HOMER) favorably supports our approach on existing GRO-seq data from MCF-7 breast cancer cells. To demonstrate the broader utility of our approach, we have used groHMM to annotate a diverse array of transcription units (i.e., primary transcripts) from four GRO-seq data sets derived from cells representing a variety of different human tissue types, including non-transformed cells (cardiomyocytes and lung fibroblasts) and transformed cells (LNCaP and MCF-7 cancer cells), as well as non-mammalian cells (from flies and worms). As an example of the utility of groHMM and its application to questions about the transcriptome, we show how groHMM can be used to analyze cell type-specific enhancers as defined by newly annotated enhancer transcripts.

Conclusions: Our results show that groHMM can reveal new insights into cell type-specific transcription by identifying novel transcription units, and serve as a complete and useful tool for evaluating functional genomic elements in cells.
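The abstract describes segmenting the genome into transcription units with a two-state HMM. As a rough illustration of that general idea only, the sketch below runs textbook Viterbi decoding over binned read counts with two states (background and transcribed). It is not groHMM's model or API: the Poisson emissions, the rates, the self-transition probability, the toy counts, and the function names viterbi_two_state and state_runs are all assumptions made for this example.

```python
import math


def poisson_logpmf(k, rate):
    # log P(K = k) for a Poisson distribution with the given rate.
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def viterbi_two_state(counts, rates=(0.5, 8.0), stay=0.9):
    """Most likely state path (0 = background, 1 = transcribed).

    rates and stay are illustrative assumptions, not fitted parameters.
    """
    if not counts:
        return []
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform prior over the two states for the first bin.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    backptrs = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # Best predecessor: stay in s, or switch from the other state.
            from_same = score[s] + log_stay
            from_other = score[1 - s] + log_switch
            if from_same >= from_other:
                best, prev = from_same, s
            else:
                best, prev = from_other, 1 - s
            new_score.append(best + poisson_logpmf(k, rates[s]))
            ptr.append(prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def state_runs(path, target=1):
    """Collapse a state path into half-open (start, end) bin intervals."""
    runs, start = [], None
    for i, s in enumerate(path):
        if s == target and start is None:
            start = i
        elif s != target and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(path)))
    return runs


# Toy read counts per genomic bin (made-up data for illustration).
counts = [0, 1, 0, 0, 9, 7, 10, 8, 6, 0, 1, 0]
path = viterbi_two_state(counts)
units = state_runs(path)
print(units)  # [(4, 9)]
```

In this toy run the isolated count of 1 stays in the background state because the cost of switching states twice outweighs the small emission gain, which is the property that lets an HMM call broad, contiguous regions rather than isolated peaks.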

Original language: English (US)
Article number: 222
Journal: BMC Bioinformatics
Volume: 16
Issue number: 1
DOI: 10.1186/s12859-015-0656-3
State: Published - Jul 16 2015

Keywords

  • Cell type specificity
  • ChIP-seq
  • Enhancer
  • Enhancer RNAs (eRNAs)
  • Gene regulation
  • GRO-seq
  • groHMM
  • Long non-coding RNAs (lncRNAs)
  • Peak calling
  • Primary miRNAs
  • Primary transcript
  • Transcription
  • Transcription unit

ASJC Scopus subject areas

  • Applied Mathematics
  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications

Cite this

groHMM: A computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. / Chae, Minho; Danko, Charles G.; Kraus, W. Lee.

In: BMC Bioinformatics, Vol. 16, No. 1, 222, 16.07.2015.

@article{98f7d09831754794af1e1cb5028ad6ad,
title = "groHMM: A computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data",
keywords = "Cell type specificity, ChIP-seq, Enhancer, Enhancer RNAs (eRNAs), Gene regulation, GRO-seq, groHMM, Long non-coding RNAs (lncRNAs), Peak calling, Primary miRNAs, Primary transcript, Transcription, Transcription unit",
author = "Chae, Minho and Danko, {Charles G.} and Kraus, {W. Lee}",
year = "2015",
month = "7",
day = "16",
doi = "10.1186/s12859-015-0656-3",
language = "English (US)",
volume = "16",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
pages = "222",
}

TY - JOUR
T1 - groHMM
T2 - A computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data
AU - Chae, Minho
AU - Danko, Charles G.
AU - Kraus, W. Lee
PY - 2015/7/16
Y1 - 2015/7/16
KW - Cell type specificity
KW - ChIP-seq
KW - Enhancer
KW - Enhancer RNAs (eRNAs)
KW - Gene regulation
KW - GRO-seq
KW - groHMM
KW - Long non-coding RNAs (lncRNAs)
KW - Peak calling
KW - Primary miRNAs
KW - Primary transcript
KW - Transcription
KW - Transcription unit
UR - http://www.scopus.com/inward/record.url?scp=84937042661&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937042661&partnerID=8YFLogxK
U2 - 10.1186/s12859-015-0656-3
DO - 10.1186/s12859-015-0656-3
M3 - Article
C2 - 26173492
AN - SCOPUS:84937010731
VL - 16
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - 1
M1 - 222
ER -