A novel approach to DNA copy number data segmentation

Siling Wang, Yuhang Wang, Yang Xie, Guanghua Xiao

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

DNA copy number (DCN) is the number of copies of DNA at a region of a genome. Alterations of DCN are highly associated with the development of different tumors. Microarray technologies have recently been employed to detect DCN changes at many loci simultaneously in tumor samples. The resulting DCN data are often very noisy, and the tumor sample is often contaminated by normal cells. The goal of computational analysis of array-based DCN data is to infer the underlying DCNs from the raw DCN data. Previous methods for this task do not model the tumor/normal cell mixture ratio explicitly, and they cannot output segments with DCN annotations. We developed a novel model-based method that uses the minimum description length (MDL) principle for DCN data segmentation. Our method can output the underlying DCN for each chromosomal segment and, at the same time, infer the underlying tumor proportion in the test samples. Empirical results show that, on average, our method achieves better accuracy than three previous methods: Circular Binary Segmentation, Hidden Markov Model, and Ultrasome.
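
The two ideas named in the abstract, a tumor/normal mixture and an MDL-style choice between segmentations, can be illustrated with a minimal Python sketch. This is not the authors' formulation, which the abstract does not give. The linear mixing formula, the diploid normal copy number, the BIC-like two-part code length, and the names expected_log2_ratio and description_length are all assumptions made for illustration only.

```python
import math

NORMAL_CN = 2  # assumption: contaminating normal cells are diploid


def expected_log2_ratio(tumor_cn, tumor_fraction):
    """Expected log2 ratio of a segment under an assumed linear mixture
    of tumor cells (copy number tumor_cn) and diploid normal cells."""
    mixed_cn = tumor_fraction * tumor_cn + (1.0 - tumor_fraction) * NORMAL_CN
    return math.log2(mixed_cn / NORMAL_CN)


def description_length(values, breakpoints):
    """Toy two-part code length in bits (hypothetical, BIC-like):
    a residual cost for the data plus a fixed penalty per segment."""
    n = len(values)
    edges = [0] + list(breakpoints) + [n]
    rss = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        segment = values[start:end]
        mean = sum(segment) / len(segment)
        rss += sum((v - mean) ** 2 for v in segment)
    rss = max(rss, 1e-12)  # guard against log2(0) on a perfect fit
    n_segments = len(edges) - 1
    data_cost = 0.5 * n * math.log2(rss / n)
    model_cost = n_segments * math.log2(n)  # crude cost of describing each segment
    return data_cost + model_cost


# A four-copy gain looks weaker when only half of the sample is tumor.
pure_gain = expected_log2_ratio(4, 1.0)
diluted_gain = expected_log2_ratio(4, 0.5)

# Made-up probe values with a level shift after the fourth probe.
probes = [0.0, 0.1, -0.1, 0.0, 0.6, 0.5, 0.7, 0.6]
one_segment = description_length(probes, [])
two_segments = description_length(probes, [4])
```

In this toy setting the two-segment description is shorter than the one-segment description, so an MDL-style criterion would prefer the breakpoint. The diluted gain also shows why a method that ignores the tumor proportion can struggle to map observed signal levels back to integer copy numbers.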

Original language: English (US)
Pages (from-to): 131-148
Number of pages: 18
Journal: Journal of Bioinformatics and Computational Biology
Volume: 9
Issue number: 1
DOIs: 10.1142/S0219720011005343
State: Published - Feb 2011

Fingerprint

DNA
Tumors
Neoplasms
DNA Copy Number Variations
Oligonucleotide Array Sequence Analysis
Hidden Markov models
Microarrays
Genome
Technology
Genes

Keywords

  • DCN data
  • MDL
  • model-based
  • segmentation

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications

Cite this

A novel approach to DNA copy number data segmentation. / Wang, Siling; Wang, Yuhang; Xie, Yang; Xiao, Guanghua.

In: Journal of Bioinformatics and Computational Biology, Vol. 9, No. 1, 02.2011, p. 131-148.


@article{37d928b205374e1ba78d463c8e0a2c85,
title = "A novel approach to {DNA} copy number data segmentation",
keywords = "DCN data, MDL, model-based, segmentation",
author = "Siling Wang and Yuhang Wang and Yang Xie and Guanghua Xiao",
year = "2011",
month = "2",
doi = "10.1142/S0219720011005343",
language = "English (US)",
volume = "9",
pages = "131--148",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "1",

}
