Bayesian hidden Markov models to identify RNA-protein interaction sites in PAR-CLIP

Jonghyun Yun, Tao Wang, Guanghua Xiao

Research output: Contribution to journal, Article

6 Citations (Scopus)

Abstract

Photoactivatable ribonucleoside-enhanced cross-linking immunoprecipitation (PAR-CLIP) has been increasingly used for the global mapping of RNA-protein interaction sites. There are two key features of PAR-CLIP experiments: the sequence read tags are likely to form an enriched peak around each RNA-protein interaction site, and the cross-linking procedure is likely to introduce a specific mutation in each sequence read tag at the interaction site. Several ad hoc methods have been developed to identify the RNA-protein interaction sites using either sequence read counts or mutation counts alone; however, rigorous statistical methods for analyzing PAR-CLIP data are still lacking. In this article, we propose an integrative model to establish a joint distribution of observed read and mutation counts. To pinpoint the interaction sites at single base-pair resolution, we developed a novel modeling approach that adopts non-homogeneous hidden Markov models to incorporate the nucleotide sequence at each genomic location. Both simulation studies and a data application showed that our method outperforms the ad hoc methods and provides reliable inferences for the RNA-protein binding sites from PAR-CLIP data.
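To make the idea of a non-homogeneous hidden Markov model concrete, the following is a minimal sketch, not the authors' model. It assumes two hidden states (background and interaction site), a transition matrix that depends on the nucleotide at each position, and simple Poisson and binomial emissions swapped in for illustration (the article's keywords mention a beta geometric distribution, which is not reproduced here). All names and parameter values (`READ_RATE`, `MUT_PROB`, `ENTER`, `STAY`, `posterior_site_prob`) are hypothetical.

```python
import math
import numpy as np

# Hypothetical parameters. State 0 = background, state 1 = interaction site.
READ_RATE = (2.0, 20.0)   # assumed Poisson mean read count per state
MUT_PROB = (0.01, 0.4)    # assumed per-read mutation probability per state
# Assumed probability of entering the site state, given the nucleotide
# at the position being entered (this is what makes the chain non-homogeneous).
ENTER = {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.2}
STAY = 0.5                # assumed probability of remaining in the site state
INIT = np.array([0.95, 0.05])


def log_poisson(k, lam):
    """Log-probability of count k under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_binom(m, n, p):
    """Log-probability of m successes in n trials under Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(p) + (n - m) * math.log(1 - p))


def transition(nuc):
    """Position-specific 2x2 transition matrix chosen by the nucleotide."""
    e = ENTER[nuc]
    return np.array([[1 - e, e], [1 - STAY, STAY]])


def posterior_site_prob(seq, reads, muts):
    """Posterior P(site state) per position via scaled forward-backward."""
    n = len(seq)
    # Joint emission of read count and mutation count for each state.
    emit = np.exp(np.array([
        [log_poisson(r, READ_RATE[s]) + log_binom(m, r, MUT_PROB[s])
         for s in (0, 1)]
        for r, m in zip(reads, muts)
    ]))
    alpha = np.zeros((n, 2))
    beta = np.ones((n, 2))
    scale = np.zeros(n)
    alpha[0] = INIT * emit[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ transition(seq[t])) * emit[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(n - 2, -1, -1):
        beta[t] = transition(seq[t + 1]) @ (emit[t + 1] * beta[t + 1])
        beta[t] /= scale[t + 1]
    post = alpha * beta
    return post[:, 1] / post.sum(axis=1)


if __name__ == "__main__":
    # Toy data: a read peak with many mutations around position 3.
    seq = "ACGTTACG"
    reads = [2, 3, 15, 22, 18, 4, 2, 1]
    muts = [0, 0, 1, 9, 1, 0, 0, 0]
    print(np.round(posterior_site_prob(seq, reads, muts), 3))
```

The sketch uses fixed parameters and a single forward-backward pass; a Bayesian treatment such as the one the abstract describes would instead place priors on the parameters and estimate them, for example by Markov chain Monte Carlo.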

Original language: English (US)
Pages (from-to): 430-440
Number of pages: 11
Journal: Biometrics
Volume: 70
Issue number: 2
DOI: https://doi.org/10.1111/biom.12147
State: Published - 2014

Keywords

  • Beta geometric
  • Markov chain Monte Carlo
  • Next generation sequencing data
  • Non-homogeneous hidden Markov model
  • PAR-CLIP
  • RNA binding protein

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Medicine(all)

Cite this

Bayesian hidden Markov models to identify RNA-protein interaction sites in PAR-CLIP. / Yun, Jonghyun; Wang, Tao; Xiao, Guanghua.

In: Biometrics, Vol. 70, No. 2, 2014, p. 430-440.

@article{4fd79bce5d074d4eae38e328cfeeea84,
title = "{B}ayesian hidden {M}arkov models to identify {RNA}-protein interaction sites in {PAR-CLIP}",
abstract = "Photoactivatable ribonucleoside-enhanced cross-linking immunoprecipitation (PAR-CLIP) has been increasingly used for the global mapping of RNA-protein interaction sites. There are two key features of PAR-CLIP experiments: the sequence read tags are likely to form an enriched peak around each RNA-protein interaction site, and the cross-linking procedure is likely to introduce a specific mutation in each sequence read tag at the interaction site. Several ad hoc methods have been developed to identify the RNA-protein interaction sites using either sequence read counts or mutation counts alone; however, rigorous statistical methods for analyzing PAR-CLIP data are still lacking. In this article, we propose an integrative model to establish a joint distribution of observed read and mutation counts. To pinpoint the interaction sites at single base-pair resolution, we developed a novel modeling approach that adopts non-homogeneous hidden Markov models to incorporate the nucleotide sequence at each genomic location. Both simulation studies and a data application showed that our method outperforms the ad hoc methods and provides reliable inferences for the RNA-protein binding sites from PAR-CLIP data.",
keywords = "Beta geometric, Markov chain Monte Carlo, Next generation sequencing data, Non-homogeneous hidden Markov model, PAR-CLIP, RNA binding protein",
author = "Jonghyun Yun and Tao Wang and Guanghua Xiao",
year = "2014",
doi = "10.1111/biom.12147",
language = "English (US)",
volume = "70",
pages = "430--440",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "2",
}

TY - JOUR

T1 - Bayesian hidden Markov models to identify RNA-protein interaction sites in PAR-CLIP

AU - Yun, Jonghyun

AU - Wang, Tao

AU - Xiao, Guanghua

PY - 2014

Y1 - 2014

AB - Photoactivatable ribonucleoside-enhanced cross-linking immunoprecipitation (PAR-CLIP) has been increasingly used for the global mapping of RNA-protein interaction sites. There are two key features of PAR-CLIP experiments: the sequence read tags are likely to form an enriched peak around each RNA-protein interaction site, and the cross-linking procedure is likely to introduce a specific mutation in each sequence read tag at the interaction site. Several ad hoc methods have been developed to identify the RNA-protein interaction sites using either sequence read counts or mutation counts alone; however, rigorous statistical methods for analyzing PAR-CLIP data are still lacking. In this article, we propose an integrative model to establish a joint distribution of observed read and mutation counts. To pinpoint the interaction sites at single base-pair resolution, we developed a novel modeling approach that adopts non-homogeneous hidden Markov models to incorporate the nucleotide sequence at each genomic location. Both simulation studies and a data application showed that our method outperforms the ad hoc methods and provides reliable inferences for the RNA-protein binding sites from PAR-CLIP data.

KW - Beta geometric

KW - Markov chain Monte Carlo

KW - Next generation sequencing data

KW - Non-homogeneous hidden Markov model

KW - PAR-CLIP

KW - RNA binding protein

UR - http://www.scopus.com/inward/record.url?scp=84902346402&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902346402&partnerID=8YFLogxK

U2 - 10.1111/biom.12147

DO - 10.1111/biom.12147

M3 - Article

VL - 70

SP - 430

EP - 440

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 2

ER -