RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data

Gaoxiang Jia, Xinlei Wang, Qiwei Li, Wei Lu, Ximing Tang, Ignacio Wistuba, Yang Xie

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Formalin-fixed paraffin-embedded (FFPE) samples have great potential for biomarker discovery, retrospective studies, and the diagnosis or prognosis of diseases. Their application, however, is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on damaged RNAs. The NanoString nCounter platform is well suited to profiling FFPE samples and measures gene expression with high sensitivity, which may greatly facilitate realizing the scientific and clinical value of FFPE samples. However, methodological development for normalization, a critical step in analyzing this type of data, lags far behind. Existing methods designed for the platform use information from different types of internal controls separately and rely on the oversimplified assumption that expression of housekeeping genes is constant across samples for global scaling. Thus, these methods are not optimized for the nCounter system, not to mention that they were not developed for FFPE samples. We construct an integrated system of random-coefficient hierarchical regression models to capture the main patterns and characteristics observed in NanoString data from FFPE samples, and develop a Bayesian approach to estimate parameters and normalize gene expression across samples. Our method, labeled RCRnorm, incorporates information from all aspects of the experimental design and simultaneously removes biases from various sources. It eliminates the unrealistic assumption about housekeeping genes and offers great interpretability. Furthermore, it is applicable to fresh-frozen or similar samples, which can generally be viewed as a reduced case of FFPE samples. Simulations and applications showed the superior performance of RCRnorm.
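
The global-scaling baseline that the abstract criticizes can be illustrated with a short sketch. This is not RCRnorm and not taken from the paper: it assumes the scaling factor for each sample is derived from the geometric mean of its housekeeping-gene counts, and the function name `housekeeping_scale` and the toy counts are hypothetical.

```python
import numpy as np

def housekeeping_scale(counts, hk_rows):
    """Global scaling by housekeeping genes (conventional baseline, not RCRnorm).

    counts  : genes x samples array of raw counts (hypothetical toy data below)
    hk_rows : row indices of the genes treated as housekeeping genes
    """
    counts = np.asarray(counts, dtype=float)
    # Geometric mean of the housekeeping counts within each sample.
    hk_geo = np.exp(np.log(counts[hk_rows]).mean(axis=0))
    # One factor per sample, chosen so every sample ends up with the same
    # housekeeping geometric mean (the average across samples).
    factors = hk_geo.mean() / hk_geo
    return counts * factors, factors

# Toy data (assumed): rows are genes, columns are samples.
raw = np.array([
    [100.0, 200.0, 50.0],   # housekeeping gene A
    [400.0, 800.0, 200.0],  # housekeeping gene B
    [30.0, 90.0, 10.0],     # gene of interest
])
normalized, factors = housekeeping_scale(raw, hk_rows=[0, 1])
print(factors)
print(normalized)
```

After scaling, each housekeeping gene has identical values in every sample by construction. That is exactly the assumption at issue: any real variation in housekeeping expression is forced into the scaling factors and carried over to the genes of interest.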

Original language: English (US)
Pages (from-to): 1617-1647
Number of pages: 31
Journal: Annals of Applied Statistics
Volume: 13
Issue number: 3
DOIs: 10.1214/19-AOAS1249
State: Published - Sep 2019

Fingerprint

Random Coefficients
Hierarchical Model
Integrated System
Formaldehyde
Paraffins
Regression Model
Gene expression
Genes
Profiling
Information use
Biomarkers
RNA
Design of experiments
Gene
Hierarchical regression
Normalize

Keywords

  • Bayesian hierarchical modeling
  • Control probes
  • FFPE
  • Housekeeping gene
  • Normalization
  • Random coefficients regression

ASJC Scopus subject areas

  • Statistics and Probability
  • Modeling and Simulation
  • Statistics, Probability and Uncertainty

Cite this

RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data. / Jia, Gaoxiang; Wang, Xinlei; Li, Qiwei; Lu, Wei; Tang, Ximing; Wistuba, Ignacio; Xie, Yang.

In: Annals of Applied Statistics, Vol. 13, No. 3, 09.2019, p. 1617-1647.

@article{1fe82534829c41bab6eee66079287d7d,
title = "RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data",
keywords = "Bayesian hierarchical modeling, Control probes, FFPE, Housekeeping gene, Normalization, Random coefficients regression",
author = "Gaoxiang Jia and Xinlei Wang and Qiwei Li and Wei Lu and Ximing Tang and Ignacio Wistuba and Yang Xie",
year = "2019",
month = "9",
doi = "10.1214/19-AOAS1249",
language = "English (US)",
volume = "13",
pages = "1617--1647",
journal = "Annals of Applied Statistics",
issn = "1932-6157",
publisher = "Institute of Mathematical Statistics",
number = "3",

}

TY - JOUR
T1 - RCRnorm
T2 - An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data
AU - Jia, Gaoxiang
AU - Wang, Xinlei
AU - Li, Qiwei
AU - Lu, Wei
AU - Tang, Ximing
AU - Wistuba, Ignacio
AU - Xie, Yang
PY - 2019/9
Y1 - 2019/9
KW - Bayesian hierarchical modeling
KW - Control probes
KW - FFPE
KW - Housekeeping gene
KW - Normalization
KW - Random coefficients regression
UR - http://www.scopus.com/inward/record.url?scp=85073750799&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073750799&partnerID=8YFLogxK
U2 - 10.1214/19-AOAS1249
DO - 10.1214/19-AOAS1249
M3 - Article
AN - SCOPUS:85073750799
VL - 13
SP - 1617
EP - 1647
JO - Annals of Applied Statistics
JF - Annals of Applied Statistics
SN - 1932-6157
IS - 3
ER -