Bayesian modeling of microbiome data for differential abundance analysis

Qiwei Li, Shuang Jiang, Andrew Y. Koh, Guanghua Xiao, Xiaowei Zhan

Research output: Contribution to journal › Article › peer-review


Advances in next-generation sequencing technology have enabled high-throughput profiling of metagenomes and accelerated the study of the microbiome. Recently, a growing number of studies have aimed to decipher the relationship between the microbiome and disease. One of the most essential tasks is to identify differentially abundant taxonomic features across different populations (such as cases and controls). Microbiome count data are high-dimensional and usually suffer from uneven sampling depth, over-dispersion, and zero-inflation. These characteristics often hamper downstream analysis and thus require specialized analytical models. Here we propose a general Bayesian framework to model microbiome count data for differential abundance analysis. This framework is composed of two hierarchical levels. The bottom level is a multivariate count-generating process, for which several model choices are available. We focus in particular on a zero-inflated negative binomial (ZINB) model that takes into account the skewness and excess zeros in microbiome data and incorporates model-based normalization through prior distributions with stochastic constraints. The top level is a mixture of Gaussian distributions with a feature selection scheme, which enables us to identify a set of differentially abundant taxa. In addition, the model allows us to incorporate phylogenetic tree information into the framework via Markov random field priors. All parameters are inferred simultaneously using Markov chain Monte Carlo sampling techniques. Comprehensive simulation studies are conducted to evaluate our method and compare it with alternative approaches. Applications of the proposed method to two real microbiome datasets show that our method is able to detect a set of differentially abundant taxa at different taxonomic ranks, most of which have been experimentally verified. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies and elucidating disease etiology.
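
To make the bottom-level count model concrete, the following is a minimal sketch of a zero-inflated negative binomial log-probability for a single count. It is an illustration only, not the authors' implementation: the mean/dispersion parameterization, the multiplicative size factor standing in for sampling depth, and all function and argument names (nb_log_pmf, zinb_log_pmf, pi_zero, size_factor, abundance, dispersion) are assumptions introduced here. The priors, the feature selection scheme, and the MCMC sampler described in the abstract are not shown.

```python
import math


def nb_log_pmf(y, mean, dispersion):
    """Log-probability of count y under a negative binomial.

    Assumed parameterization: variance = mean + mean**2 / dispersion,
    with mean > 0 and dispersion > 0.
    """
    r = dispersion
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mean))
        + y * math.log(mean / (r + mean))
    )


def zinb_log_pmf(y, pi_zero, size_factor, abundance, dispersion):
    """Log-probability of count y under a zero-inflated negative binomial.

    pi_zero     : probability of an extra (structural) zero, 0 <= pi_zero < 1
    size_factor : hypothetical per-sample scaling for sampling depth
    abundance   : hypothetical normalized abundance of the taxon
    """
    mean = size_factor * abundance
    log_nb = nb_log_pmf(y, mean, dispersion)
    if y == 0:
        # A zero is either a structural zero or a zero drawn from the NB part.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(log_nb))
    # A positive count can only come from the NB part.
    return math.log1p(-pi_zero) + log_nb
```

With pi_zero set to 0 the function reduces to the plain negative binomial, and any positive pi_zero raises the probability of observing a zero above what the negative binomial alone would give, which is the "excess zeros" behavior the abstract refers to.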

Original language: English (US)
Journal: Unknown Journal
State: Published - Feb 23 2019


  • Differential abundance analysis
  • Dirichlet process
  • Feature selection
  • High-dimensional count data
  • Microbiome
  • Mixture models
  • Zero-inflated negative binomial

ASJC Scopus subject areas

  • General
