Identification of breast cancer prognosis markers using Integrative sparse Boosting

S. Ma, J. Huang, Y. Xie, N. Yi

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Objectives: In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted to search for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution.

Methods: We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.

Results: A simulation study shows that the proposed approach outperforms alternatives, including meta-analysis and intensity approaches, by identifying the majority or all of the true positives while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those obtained using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.

Conclusions: Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
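The marker-selection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain componentwise least-squares boosting on uncensored log survival times, a fixed number of steps in place of a sparsity-aware stopping rule, and centered gene columns. The function name `integrative_boost` and the toy data are hypothetical. At each step one gene is chosen for all studies at once by its pooled drop in residual sum of squares, while each study keeps its own coefficient, which is one simple way to allow heterogeneous effect sizes.

```python
def integrative_boost(studies, n_steps=50, nu=0.1):
    """Componentwise L2 boosting that picks one gene per step for all studies.

    studies: list of (X, y) pairs; X is a list of rows (centered columns),
    y is a list of responses such as log survival times.
    Censoring is ignored in this sketch. Returns one coefficient list per study.
    """
    n_genes = len(studies[0][0][0])
    betas = [[0.0] * n_genes for _ in studies]
    # Center each response so every study effectively has its own intercept.
    residuals = []
    for _, y in studies:
        mean_y = sum(y) / len(y)
        residuals.append([v - mean_y for v in y])

    for _ in range(n_steps):
        best_gene, best_gain, best_fits = None, 0.0, None
        for j in range(n_genes):
            gain, fits = 0.0, []
            for (X, _), r in zip(studies, residuals):
                sxr = sum(row[j] * ri for row, ri in zip(X, r))
                sxx = sum(row[j] ** 2 for row in X)
                b = sxr / sxx if sxx > 0 else 0.0
                fits.append(b)
                gain += b * sxr  # drop in residual sum of squares in this study
            if gain > best_gain:
                best_gene, best_gain, best_fits = j, gain, fits
        if best_gene is None:
            break  # no gene reduces the pooled residual sum of squares
        for m, (X, _) in enumerate(studies):
            step = nu * best_fits[m]  # shrunken, study-specific update
            betas[m][best_gene] += step
            residuals[m] = [ri - step * row[best_gene]
                            for row, ri in zip(X, residuals[m])]
    return betas


# Toy data (hypothetical): gene 0 drives the response in both studies,
# with a different effect size in each; gene 1 is unrelated.
X = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
studies = [
    (X, [-2.0, 2.0, -2.0, 2.0]),  # effect of gene 0 is 2
    (X, [4.0, 6.0, 4.0, 6.0]),    # effect of gene 0 is 1, intercept 5
]
betas = integrative_boost(studies)
```

In this toy run only gene 0 is ever selected, and its coefficient approaches 2 in the first study and 1 in the second, while gene 1 stays at zero in both.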

Original language: English (US)
Pages (from-to): 152-161
Number of pages: 10
Journal: Methods of Information in Medicine
Volume: 51
Issue number: 2
DOI: 10.3414/ME11-02-0019
State: Published - 2012

Fingerprint

Breast Neoplasms
Sample Size
Genes
Gene Expression Profiling
Meta-Analysis
Carcinogenesis
Gene Expression
Costs and Cost Analysis
Research
Neoplasms

Keywords

  • Breast cancer prognosis
  • Gene expression
  • Integrative analysis
  • Sparse boosting

ASJC Scopus subject areas

  • Health Informatics
  • Health Information Management
  • Advanced and Specialized Nursing

Cite this

Identification of breast cancer prognosis markers using Integrative sparse Boosting. / Ma, S.; Huang, J.; Xie, Y.; Yi, N.

In: Methods of Information in Medicine, Vol. 51, No. 2, 2012, p. 152-161.


@article{8278e700fd6f4bd781de2ab0205f8f26,
title = "Identification of breast cancer prognosis markers using Integrative sparse Boosting",
abstract = "Objectives: In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted, searching for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution. Methods: We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects. Results: Simulation study shows that the proposed approach outperforms alternatives including meta-analysis and intensity approaches by identifying the majority or all of the true positives, while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance. Conclusions: Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.",
keywords = "Breast cancer prognosis, Gene expression, Integrative analysis, Sparse boosting",
author = "S. Ma and J. Huang and Y. Xie and N. Yi",
year = "2012",
doi = "10.3414/ME11-02-0019",
language = "English (US)",
volume = "51",
pages = "152--161",
journal = "Methods of Information in Medicine",
issn = "0026-1270",
publisher = "Schattauer GmbH",
number = "2",
}

TY  - JOUR
T1  - Identification of breast cancer prognosis markers using Integrative sparse Boosting
AU  - Ma, S.
AU  - Huang, J.
AU  - Xie, Y.
AU  - Yi, N.
PY  - 2012
Y1  - 2012
AB  - Objectives: In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted, searching for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution. Methods: We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects. Results: Simulation study shows that the proposed approach outperforms alternatives including meta-analysis and intensity approaches by identifying the majority or all of the true positives, while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance. Conclusions: Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
KW  - Breast cancer prognosis
KW  - Gene expression
KW  - Integrative analysis
KW  - Sparse boosting
UR  - http://www.scopus.com/inward/record.url?scp=84863338220&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84863338220&partnerID=8YFLogxK
U2  - 10.3414/ME11-02-0019
DO  - 10.3414/ME11-02-0019
M3  - Article
VL  - 51
SP  - 152
EP  - 161
JO  - Methods of Information in Medicine
JF  - Methods of Information in Medicine
SN  - 0026-1270
IS  - 2
ER  - 