Sparse group penalized integrative analysis of multiple cancer prognosis datasets

Jin Liu, Jian Huang, Yang Xie, Shuangge Ma

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Owing to the 'large d, small n' characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyses multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the accelerated failure time model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group minimax concave penalty approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach.
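
As a rough illustration of the marker-selection penalty named in the abstract, the sketch below evaluates the standard minimax concave penalty (MCP) and combines a group-level term with coefficient-level terms. The abstract does not give the exact formulation, so the function names, the additive form, and the square-root group-size scaling are assumptions made for illustration only, not the authors' definition.

```python
import math


def mcp(t, lam, gamma):
    """Standard minimax concave penalty for a scalar t (lam >= 0, gamma > 1).

    Equals lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam and stays
    constant at gamma*lam^2/2 beyond that point.
    """
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def sparse_group_mcp(beta, lam1, lam2, gamma1, gamma2):
    """Hypothetical sparse group MCP value (an assumed form, see lead-in).

    beta is a list of groups; each group holds one marker's regression
    coefficients across the datasets. The group term penalizes the
    Euclidean norm of the group, and the inner terms penalize each
    coefficient, so a marker can be selected in some datasets only.
    """
    total = 0.0
    for group in beta:
        norm = math.sqrt(sum(b * b for b in group))
        # sqrt(group size) scaling is a common convention, assumed here
        total += mcp(norm, math.sqrt(len(group)) * lam1, gamma1)
        total += sum(mcp(b, lam2, gamma2) for b in group)
    return total
```

Under this assumed form, a marker whose coefficients are all zero contributes nothing, while a marker that is nonzero in only one dataset pays the group term once plus a single coefficient term, which is how the heterogeneity model can be accommodated.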

Original language: English (US)
Pages (from-to): 68-77
Number of pages: 10
Journal: Genetics Research
Volume: 95
Issue number: 2-3
DOI: 10.1017/S0016672313000086
State: Published - Jun 2013

Fingerprint

Neoplasms
Patient Selection
Datasets
Meta-Analysis
Costs and Cost Analysis
Research

ASJC Scopus subject areas

  • Genetics

Cite this

Sparse group penalized integrative analysis of multiple cancer prognosis datasets. / Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge.

In: Genetics Research, Vol. 95, No. 2-3, 06.2013, p. 68-77.

Liu, Jin ; Huang, Jian ; Xie, Yang ; Ma, Shuangge. / Sparse group penalized integrative analysis of multiple cancer prognosis datasets. In: Genetics Research. 2013 ; Vol. 95, No. 2-3. pp. 68-77.
@article{99c9f3032ac24e09b051290c75f1ebb6,
title = "Sparse group penalized integrative analysis of multiple cancer prognosis datasets",
author = "Jin Liu and Jian Huang and Yang Xie and Shuangge Ma",
year = "2013",
month = "6",
doi = "10.1017/S0016672313000086",
language = "English (US)",
volume = "95",
pages = "68--77",
journal = "Genetics Research",
issn = "0016-6723",
publisher = "Cambridge University Press",
number = "2-3",

}

TY - JOUR

T1 - Sparse group penalized integrative analysis of multiple cancer prognosis datasets

AU - Liu, Jin

AU - Huang, Jian

AU - Xie, Yang

AU - Ma, Shuangge

PY - 2013/6

Y1 - 2013/6

UR - http://www.scopus.com/inward/record.url?scp=84882273198&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84882273198&partnerID=8YFLogxK

U2 - 10.1017/S0016672313000086

DO - 10.1017/S0016672313000086

M3 - Article

VL - 95

SP - 68

EP - 77

JO - Genetics Research

JF - Genetics Research

SN - 0016-6723

IS - 2-3

ER -