TY - JOUR
T1 - A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data
AU - Zhang, Minzhe
AU - Li, Qiwei
AU - Xie, Yang
N1 - Funding Information:
The authors would like to thank Jessie Norris for helping with proofreading the manuscript. This work was partially supported by the National Institutes of Health (Nos. R01CA172211, P50CA70907, P30CA142543, R01GM115473, R01GM117597, R15GM113157, and R01CA152301), and the Cancer Prevention and Research Institute of Texas (No. RP120732). Author summary: Methylated RNA immunoprecipitation combined with RNA sequencing (MeRIP-seq), which can be viewed as a marriage of two well-studied techniques, ChIP-seq and RNA-seq, is changing the landscape of RNA epigenomics by enabling study at a higher resolution. We propose a Bayesian statistical model to identify transcriptome methylation sites using MeRIP-seq data. Our approach accounts for (i) the high proportion of zeros in the data due to insufficient sequencing depth and (ii) the spatial dependence of neighboring read enrichment. Compared with existing methods, our predictions are shown to be more consistent with biological knowledge and to have better accuracy and spatial resolution.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak-calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach has three important characteristics. First, it models the zero-inflated and over-dispersed counts with a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at https://qiwei.shinyapps.io/BaySeqPeak and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge and had better spatial resolution than the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.
AB - Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak-calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach has three important characteristics. First, it models the zero-inflated and over-dispersed counts with a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at https://qiwei.shinyapps.io/BaySeqPeak and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge and had better spatial resolution than the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.
KW - Bayesian inference
KW - MeRIP-seq data
KW - RNA epigenomics
KW - hidden Markov model
KW - zero-inflated negative binomial
UR - http://www.scopus.com/inward/record.url?scp=85053228281&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053228281&partnerID=8YFLogxK
U2 - 10.1007/s40484-018-0149-2
DO - 10.1007/s40484-018-0149-2
M3 - Article
AN - SCOPUS:85053228281
VL - 6
SP - 275
EP - 286
JO - Quantitative Biology
JF - Quantitative Biology
SN - 2095-4689
IS - 3
ER -