CIPHER

A flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction

Carlos Guzman, Iván D'Orso

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

Background: Next-generation sequencing (NGS) approaches are commonly used to identify key regulatory networks that drive transcriptional programs. Although these technologies are frequently used in biological studies, NGS data analysis remains a challenging, time-consuming, and often irreproducible process. Therefore, there is a need for a comprehensive and flexible workflow platform that can accelerate data processing and analysis so more time can be spent on functional studies.

Results: We have developed an integrative, stand-alone workflow platform, named CIPHER, for the systematic analysis of several commonly used NGS datasets including ChIP-seq, RNA-seq, MNase-seq, DNase-seq, GRO-seq, and ATAC-seq data. CIPHER combines various open-source software packages, in-house scripts, and Docker containers to analyze and process single-end and paired-end datasets. CIPHER's pipelines conduct extensive quality and contamination control checks, as well as comprehensive downstream analysis. A typical CIPHER workflow includes: (1) raw sequence evaluation, (2) read trimming and adapter removal, (3) read mapping and quality filtering, (4) visualization track generation, and (5) extensive quality control assessment. Furthermore, CIPHER conducts downstream analyses such as: narrow and broad peak calling, peak annotation, and motif identification for ChIP-seq; differential gene expression analysis for RNA-seq; nucleosome positioning for MNase-seq; DNase hypersensitive site mapping, site annotation, and motif identification for DNase-seq; analysis of nascent transcription from Global Run-On sequencing (GRO-seq) data; and characterization of chromatin accessibility from ATAC-seq datasets. In addition, CIPHER contains an "analysis" mode that completes complex bioinformatics tasks such as enhancer discovery and provides functions to integrate various datasets.

Conclusions: Using public and simulated data, we demonstrate that CIPHER is an efficient and comprehensive workflow platform that can analyze several NGS datasets commonly used in genome biology studies. Additionally, CIPHER's integrative "analysis" mode allows researchers to extract important biological information from the combined dataset analysis.
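
As a rough illustration only (this is not CIPHER's actual implementation or command-line interface), the five common workflow stages and the assay-specific downstream analyses named in the abstract can be written down as a small lookup table. The names `COMMON_STAGES`, `DOWNSTREAM`, and `plan_workflow` are hypothetical and exist only for this sketch.

```python
from typing import Dict, List

# The five stages of a typical workflow, in the order the abstract lists them.
COMMON_STAGES: List[str] = [
    "raw sequence evaluation",
    "read trimming and adapter removal",
    "read mapping and quality filtering",
    "visualization track generation",
    "quality control assessment",
]

# Downstream analyses per assay, taken from the abstract's description.
DOWNSTREAM: Dict[str, List[str]] = {
    "ChIP-seq": [
        "narrow and broad peak calling",
        "peak annotation",
        "motif identification",
    ],
    "RNA-seq": ["differential gene expression analysis"],
    "MNase-seq": ["nucleosome positioning"],
    "DNase-seq": [
        "DNase hypersensitive site mapping",
        "site annotation",
        "motif identification",
    ],
    "GRO-seq": ["nascent transcription analysis"],
    "ATAC-seq": ["chromatin accessibility characterization"],
}


def plan_workflow(assay: str) -> List[str]:
    """Return the common stages followed by the assay's downstream analyses."""
    if assay not in DOWNSTREAM:
        raise ValueError(f"unsupported assay type: {assay!r}")
    # Build a new list so the module-level constants are never mutated.
    return COMMON_STAGES + DOWNSTREAM[assay]


if __name__ == "__main__":
    for step_number, step in enumerate(plan_workflow("ChIP-seq"), start=1):
        print(f"{step_number}. {step}")
```

Keeping the shared stages separate from the per-assay table mirrors how the abstract describes the platform: one common preprocessing path, then analyses that depend on the dataset type.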

Original language: English (US)
Article number: 363
Journal: BMC Bioinformatics
Volume: 18
Issue number: 1
DOI: 10.1186/s12859-017-1770-1
State: Published - Aug 8 2017

Fingerprint

Deoxyribonucleases
Workflow
Sequencing
Genomics
Data analysis
RNA
Prediction
Trimming
Nucleosomes
Transcription
Bioinformatics
Gene expression
Software packages
Quality Control
Chromatin
Containers
Contamination
Visualization

Keywords

  • ATAC-seq
  • ChIP-seq
  • Chromatin states
  • DNase-seq
  • Enhancers
  • Gene regulation
  • GRO-seq
  • Machine-learning
  • MNase-seq
  • Next-generation sequencing
  • Pipeline
  • Prediction
  • RNA-seq
  • Transcription
  • Workflow

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{747efc58ba93423b899b250db4335533,
title = "CIPHER: A flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction",
keywords = "ATAC-seq, ChIP-seq, Chromatin states, DNase-seq, Enhancers, Gene regulation, GRO-seq, Machine-learning, MNase-seq, Next-generation sequencing, Pipeline, Prediction, RNA-seq, Transcription, Workflow",
author = "Carlos Guzman and Iv{\'a}n D'Orso",
year = "2017",
month = "8",
day = "8",
doi = "10.1186/s12859-017-1770-1",
language = "English (US)",
volume = "18",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - CIPHER

T2 - A flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction

AU - Guzman, Carlos

AU - D'Orso, Iván

PY - 2017/8/8

Y1 - 2017/8/8

KW - ATAC-seq

KW - ChIP-seq

KW - Chromatin states

KW - DNase-seq

KW - Enhancers

KW - Gene regulation

KW - GRO-seq

KW - Machine-learning

KW - MNase-seq

KW - Next-generation sequencing

KW - Pipeline

KW - Prediction

KW - RNA-seq

KW - Transcription

KW - Workflow

UR - http://www.scopus.com/inward/record.url?scp=85027194730&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027194730&partnerID=8YFLogxK

U2 - 10.1186/s12859-017-1770-1

DO - 10.1186/s12859-017-1770-1

M3 - Article

VL - 18

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 363

ER -