MIXnorm: Normalizing RNA-seq data from formalin-fixed paraffin-embedded samples

Shen Yin, Xinlei Wang, Gaoxiang Jia, Yang Xie

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Motivation: Recent studies have shown that RNA-sequencing (RNA-seq) can be used to measure mRNA of sufficient quality extracted from formalin-fixed paraffin-embedded (FFPE) tissues to provide whole-genome transcriptome analysis. However, little attention has been given to the normalization of FFPE RNA-seq data, a key step that adjusts for unwanted biological and technical effects that can bias the signal of interest. Existing methods, developed for fresh-frozen or similar-type samples, may perform suboptimally when applied to FFPE data.

Results: We proposed a new normalization method, labeled MIXnorm, for FFPE RNA-seq data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and expressed genes by truncated normal distributions. To obtain maximum likelihood estimates, we developed a nested EM algorithm in which closed-form updates are available in each iteration. Because no numerical optimization is needed in the M-step, the algorithm is easy to implement and computationally efficient. We evaluated MIXnorm through simulations and cancer studies. MIXnorm makes a significant improvement over commonly used methods for RNA-seq expression data.
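To make the two-component idea concrete, the following is a minimal toy sketch of how an E-step might weigh a zero-inflated Poisson component against a truncated normal component for a single gene count. It is not the authors' implementation. The abstract does not specify the parameterization, so the `log(count + 1)` transform, the truncation at zero, the direct comparison of a count probability with a log-scale density, and all function and parameter names (`w`, `pi0`, `lam`, `mu`, `sigma`) are assumptions made purely for illustration.

```python
import math


def zip_pmf(k, pi0, lam):
    """Zero-inflated Poisson pmf: extra mass pi0 at zero, otherwise Poisson(lam)."""
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return pi0 * (k == 0) + (1.0 - pi0) * poisson


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncnorm_pdf(y, mu, sigma, lower=0.0):
    """Density of Normal(mu, sigma) truncated to the region y >= lower."""
    if y < lower:
        return 0.0
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    tail = 1.0 - norm_cdf((lower - mu) / sigma)  # mass kept after truncation
    return pdf / tail


def expressed_responsibility(count, w, pi0, lam, mu, sigma):
    """Toy E-step: posterior probability that a count belongs to the
    'expressed' component, given mixing weight w (assumed parameterization)."""
    y = math.log(count + 1.0)  # assumed log scale for the expressed component
    like_expr = w * truncnorm_pdf(y, mu, sigma)
    like_non = (1.0 - w) * zip_pmf(count, pi0, lam)
    return like_expr / (like_expr + like_non)
```

Under these assumed parameters, a zero count is attributed almost entirely to the non-expressed component, while a large count is attributed to the expressed one. A full EM fit would alternate this step with parameter updates, which the abstract states are available in closed form.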

Original language: English (US)
Pages (from-to): 3401-3408
Number of pages: 8
Journal: Bioinformatics
Volume: 36
Issue number: 11
State: Published - Jun 1 2020

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
