A Bayesian extension of the hypergeometric test for functional enrichment analysis

Jing Cao, Song Zhang

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Functional enrichment analysis is conducted on high-throughput data to provide functional interpretation for a list of genes or proteins that share a common property, such as being differentially expressed (DE). The hypergeometric P-value has been widely used to investigate whether genes from pre-defined functional terms, for example, Gene Ontology (GO) terms, are enriched in the DE genes. The hypergeometric P-value has three limitations: (1) it is computed independently for each term, thus neglecting biological dependence; (2) it is subject to a size constraint that tends to favor less-specific terms; (3) it uses information repeatedly because of overlapping annotations under the true-path rule. We propose a Bayesian approach based on the non-central hypergeometric model. The GO dependence structure is incorporated through a prior on non-centrality parameters. The likelihood function does not include overlapping information. Inference about enrichment is based on posterior probabilities, which are not subject to a size constraint. This method can detect moderate but consistent enrichment signals and identify sets of closely related and biologically meaningful functional terms rather than isolated terms. We also describe the basic assumptions and implementation of the different methods to provide some theoretical insights, which are demonstrated via a simulation study. An application to real data is presented.
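The abstract contrasts the classical hypergeometric P-value with a non-central hypergeometric model whose enrichment evidence is read off a posterior probability. The Python sketch below illustrates those ingredients for a single term only, under assumptions that are not taken from the paper: the non-central model is assumed to be Fisher's noncentral hypergeometric distribution with odds ratio `omega`, and the prior is an independent normal prior on `log(omega)`. The authors' method instead couples terms through a GO-structured prior and builds a likelihood without overlapping annotations, which this sketch does not attempt. All function names are hypothetical and the demo numbers are illustrative.

```python
from math import comb, exp, lgamma, log


def hypergeom_pvalue(N, M, n, k):
    """Classical one-sided enrichment P-value, P(X >= k), computed exactly.

    N: genes in the background universe
    M: background genes annotated to the functional term (e.g. a GO term)
    n: genes in the DE list
    k: DE genes annotated to the term
    """
    upper = min(M, n)
    tail = sum(comb(M, x) * comb(N - M, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer arithmetic, one final division


def log_comb(a, b):
    """log C(a, b) via log-gamma, to avoid huge integers."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def fisher_nchg_logpmf(x, N, M, n, omega):
    """Log-pmf of Fisher's noncentral hypergeometric distribution (assumed model).

    omega is the odds ratio (non-centrality); omega == 1 gives the ordinary
    (central) hypergeometric distribution used by hypergeom_pvalue.
    """
    lo, hi = max(0, n - (N - M)), min(M, n)
    if not lo <= x <= hi:
        return float("-inf")
    log_terms = [log_comb(M, y) + log_comb(N - M, n - y) + y * log(omega)
                 for y in range(lo, hi + 1)]
    peak = max(log_terms)  # log-sum-exp for a stable normalising constant
    log_norm = peak + log(sum(exp(t - peak) for t in log_terms))
    return log_terms[x - lo] - log_norm


def posterior_prob_enriched(k, N, M, n, prior_sd=1.0, grid_size=401, max_abs=5.0):
    """Grid approximation of P(omega > 1 | k) for a SINGLE term.

    Hypothetical simplification: an independent Normal(0, prior_sd**2) prior on
    log(omega). It does not encode any GO dependence between terms.
    """
    if not max(0, n - (N - M)) <= k <= min(M, n):
        raise ValueError("k is outside the support")
    step = 2.0 * max_abs / (grid_size - 1)
    thetas = [-max_abs + i * step for i in range(grid_size)]  # theta = log(omega)
    log_post = [fisher_nchg_logpmf(k, N, M, n, exp(t)) - 0.5 * (t / prior_sd) ** 2
                for t in thetas]
    peak = max(log_post)
    weights = [exp(lp - peak) for lp in log_post]
    return sum(w for t, w in zip(thetas, weights) if t > 0) / sum(weights)


if __name__ == "__main__":
    # Illustrative numbers only: 1000 background genes, a 50-gene term,
    # 100 DE genes, 15 of which fall in the term (5 expected by chance).
    print(hypergeom_pvalue(1000, 50, 100, 15))
    print(posterior_prob_enriched(15, 1000, 50, 100))
```

Setting `omega = 1` recovers the central hypergeometric distribution behind `hypergeom_pvalue`, so both functions look at the same 2×2 table: the P-value measures how extreme `k` is if there is no enrichment, while the posterior probability averages over all values of `omega` weighted by the prior. Readers using SciPy can cross-check the two distributions against `scipy.stats.hypergeom` and `scipy.stats.nchypergeom_fisher`.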

Original language: English (US)
Pages (from-to): 84-94
Number of pages: 11
Journal: Biometrics
Volume: 70
Issue number: 1
DOI: 10.1111/biom.12122
State: Published - 2014

Keywords

  • Functional enrichment analysis
  • Gene ontology
  • Hypergeometric P-value
  • Modular enrichment analysis
  • Non-central hypergeometric distribution

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Medicine (all)

Cite this

A Bayesian extension of the hypergeometric test for functional enrichment analysis. / Cao, Jing; Zhang, Song.

In: Biometrics, Vol. 70, No. 1, 2014, p. 84-94.

@article{40e0025c92f845f29cbdbd86aea00c1f,
title = "A Bayesian extension of the hypergeometric test for functional enrichment analysis",
keywords = "Functional enrichment analysis, Gene ontology, Hypergeometric P-value, Modular enrichment analysis, Non-central hypergeometric distribution",
author = "Jing Cao and Song Zhang",
year = "2014",
doi = "10.1111/biom.12122",
language = "English (US)",
volume = "70",
pages = "84--94",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1",
}

TY  - JOUR

T1  - A Bayesian extension of the hypergeometric test for functional enrichment analysis

AU  - Cao, Jing

AU  - Zhang, Song

PY  - 2014

Y1  - 2014

KW  - Functional enrichment analysis

KW  - Gene ontology

KW  - Hypergeometric P-value

KW  - Modular enrichment analysis

KW  - Non-central hypergeometric distribution

UR  - http://www.scopus.com/inward/record.url?scp=84895877065&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=84895877065&partnerID=8YFLogxK

U2  - 10.1111/biom.12122

DO  - 10.1111/biom.12122

M3  - Article

C2  - 24320951

AN  - SCOPUS:84895877065

VL  - 70

SP  - 84

EP  - 94

JO  - Biometrics

JF  - Biometrics

SN  - 0006-341X

IS  - 1

ER  -