Selecting reliable mRNA expression measurements across platforms improves downstream analysis

Pan Tong, Lixia Diao, Li Shen, Lerong Li, John Victor Heymach, Luc Girard, John D. Minna, Kevin R. Coombes, Lauren Averett Byers, Jing Wang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

With increasing use of publicly available gene expression data sets, the quality of the expression data is a critical issue for downstream analysis, gene signature development, and cross-validation of data sets. Thus, identifying reliable expression measurements by leveraging multiple mRNA expression platforms is an important analytical task. In this study, we propose a statistical framework for selecting reliable measurements between platforms by modeling the correlations of mRNA expression levels using a beta-mixture model. The model-based selection provides an effective and objective way to separate good probes from probes with low quality, thereby improving the efficiency and accuracy of the analysis. The proposed method can be used to compare two microarray technologies or microarray and RNA sequencing measurements. We tested the approach in two matched profiling data sets, using microarray gene expression measurements from the same samples profiled on both Affymetrix and Illumina platforms. We also applied the algorithm to mRNA expression data to compare Affymetrix microarray data with RNA sequencing measurements. The algorithm successfully identified probes/genes with reliable measurements. Removing the unreliable measurements resulted in significant improvements for gene signature development and functional annotations.
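The idea in the abstract, modeling cross-platform correlations with a beta-mixture model, can be illustrated with a minimal sketch. This is not the authors' implementation: the two-component setup, the rescaling of correlations from [-1, 1] to (0, 1), the moment-matching update used in place of an exact maximum-likelihood M-step, the 0.5 posterior cutoff, and all function names and synthetic data below are assumptions made purely for illustration.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def moment_match(xs, ws):
    """Weighted method-of-moments estimate of Beta parameters (a, b)."""
    total = max(sum(ws), 1e-12)
    mean = sum(w * x for x, w in zip(xs, ws)) / total
    var = max(sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / total, 1e-8)
    common = max(mean * (1 - mean) / var - 1, 1e-3)
    return mean * common, (1 - mean) * common


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_beta_mixture(correlations, n_iter=100):
    """Approximate EM for a two-component beta mixture (illustrative only).

    Returns the posterior probability that each probe belongs to the
    higher-correlation ("reliable") component, the mixing weight of that
    component, and the (a, b) parameters of both components.
    """
    eps = 1e-6
    # Assumed transform: map correlations from [-1, 1] into (0, 1).
    xs = [min(max((r + 1) / 2, eps), 1 - eps) for r in correlations]
    median = sorted(xs)[len(xs) // 2]
    resp = [0.9 if x > median else 0.1 for x in xs]  # soft initial split
    for _ in range(n_iter):
        # M-step (approximate): mixing weight plus moment-matched parameters.
        weight = min(max(sum(resp) / len(xs), eps), 1 - eps)
        hi = moment_match(xs, resp)
        lo = moment_match(xs, [1 - p for p in resp])
        # E-step: posterior probability of the "reliable" component.
        resp = [sigmoid(math.log(weight) + beta_logpdf(x, *hi)
                        - math.log(1 - weight) - beta_logpdf(x, *lo))
                for x in xs]
    # Make sure "hi" is the component with the larger mean correlation.
    if hi[0] / sum(hi) < lo[0] / sum(lo):
        hi, lo, weight = lo, hi, 1 - weight
        resp = [1 - p for p in resp]
    return resp, weight, hi, lo


# Synthetic per-probe correlations (made up for this example).
random.seed(0)
reliable = [2 * random.betavariate(12, 2) - 1 for _ in range(300)]
noisy = [2 * random.betavariate(6, 6) - 1 for _ in range(300)]
posterior, weight, hi, lo = fit_beta_mixture(reliable + noisy)
keep = [p > 0.5 for p in posterior]  # hypothetical selection rule
```

In practice each correlation would be computed per probe or gene from matched samples profiled on both platforms, and probes with a high posterior probability of belonging to the high-correlation component would be retained for downstream analysis.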

Original language: English (US)
Pages (from-to): 81-89
Number of pages: 9
Journal: Cancer Informatics
Volume: 15
DOI: https://doi.org/10.4137/CIN.S38590
State: Published - 2016

Fingerprint

RNA Sequence Analysis
Messenger RNA
Genes
Gene Expression
Technology
Datasets
Data Accuracy

Keywords

  • Beta-mixture model
  • Correlation coefficients
  • Cross-validation
  • Gene expression
  • Probe selection
  • RNA sequence

ASJC Scopus subject areas

  • Cancer Research
  • Oncology

Cite this

Selecting reliable mRNA expression measurements across platforms improves downstream analysis. / Tong, Pan; Diao, Lixia; Shen, Li; Li, Lerong; Heymach, John Victor; Girard, Luc; Minna, John D.; Coombes, Kevin R.; Byers, Lauren Averett; Wang, Jing.

In: Cancer Informatics, Vol. 15, 2016, p. 81-89.

Tong, P, Diao, L, Shen, L, Li, L, Heymach, JV, Girard, L, Minna, JD, Coombes, KR, Byers, LA & Wang, J 2016, 'Selecting reliable mRNA expression measurements across platforms improves downstream analysis', Cancer Informatics, vol. 15, pp. 81-89. https://doi.org/10.4137/CIN.S38590
@article{06a7de2be442489aa26fd89123c643ac,
title = "Selecting reliable mRNA expression measurements across platforms improves downstream analysis",
abstract = "With increasing use of publicly available gene expression data sets, the quality of the expression data is a critical issue for downstream analysis, gene signature development, and cross-validation of data sets. Thus, identifying reliable expression measurements by leveraging multiple mRNA expression platforms is an important analytical task. In this study, we propose a statistical framework for selecting reliable measurements between platforms by modeling the correlations of mRNA expression levels using a beta-mixture model. The model-based selection provides an effective and objective way to separate good probes from probes with low quality, thereby improving the efficiency and accuracy of the analysis. The proposed method can be used to compare two microarray technologies or microarray and RNA sequencing measurements. We tested the approach in two matched profiling data sets, using microarray gene expression measurements from the same samples profiled on both Affymetrix and Illumina platforms. We also applied the algorithm to mRNA expression data to compare Affymetrix microarray data with RNA sequencing measurements. The algorithm successfully identified probes/genes with reliable measurements. Removing the unreliable measurements resulted in significant improvements for gene signature development and functional annotations.",
keywords = "Beta-mixture model, Correlation coefficients, Cross-validation, Gene expression, Probe selection, RNA sequence",
author = "Pan Tong and Lixia Diao and Li Shen and Lerong Li and Heymach, {John Victor} and Luc Girard and Minna, {John D.} and Coombes, {Kevin R.} and Byers, {Lauren Averett} and Jing Wang",
year = "2016",
doi = "10.4137/CIN.S38590",
language = "English (US)",
volume = "15",
pages = "81--89",
journal = "Cancer Informatics",
issn = "1176-9351",
publisher = "Libertas Academica Ltd.",

}

TY - JOUR

T1 - Selecting reliable mRNA expression measurements across platforms improves downstream analysis

AU - Tong, Pan

AU - Diao, Lixia

AU - Shen, Li

AU - Li, Lerong

AU - Heymach, John Victor

AU - Girard, Luc

AU - Minna, John D.

AU - Coombes, Kevin R.

AU - Byers, Lauren Averett

AU - Wang, Jing

PY - 2016

Y1 - 2016

AB - With increasing use of publicly available gene expression data sets, the quality of the expression data is a critical issue for downstream analysis, gene signature development, and cross-validation of data sets. Thus, identifying reliable expression measurements by leveraging multiple mRNA expression platforms is an important analytical task. In this study, we propose a statistical framework for selecting reliable measurements between platforms by modeling the correlations of mRNA expression levels using a beta-mixture model. The model-based selection provides an effective and objective way to separate good probes from probes with low quality, thereby improving the efficiency and accuracy of the analysis. The proposed method can be used to compare two microarray technologies or microarray and RNA sequencing measurements. We tested the approach in two matched profiling data sets, using microarray gene expression measurements from the same samples profiled on both Affymetrix and Illumina platforms. We also applied the algorithm to mRNA expression data to compare Affymetrix microarray data with RNA sequencing measurements. The algorithm successfully identified probes/genes with reliable measurements. Removing the unreliable measurements resulted in significant improvements for gene signature development and functional annotations.

KW - Beta-mixture model

KW - Correlation coefficients

KW - Cross-validation

KW - Gene expression

KW - Probe selection

KW - RNA sequence

UR - http://www.scopus.com/inward/record.url?scp=84975266783&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84975266783&partnerID=8YFLogxK

U2 - 10.4137/CIN.S38590

DO - 10.4137/CIN.S38590

M3 - Article

VL - 15

SP - 81

EP - 89

JO - Cancer Informatics

JF - Cancer Informatics

SN - 1176-9351

ER -