Co-clustering phenome-genome for phenotype classification and disease gene discovery

Taehyun Hwang, Gowtham Atluri, Maoqiang Xie, Sanjoy Dey, Changjin Hong, Vipin Kumar, Rui Kuang

Research output: Contribution to journal (Article)

40 Citations (Scopus)

Abstract

Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped more than 2000 phenotype-gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype-gene association matrix using prior knowledge from a phenotype similarity network and a protein-protein interaction network, supervised by label information from known disease classes and biological pathways. In experiments on disease phenotype-gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and label propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with Human Phenotype Ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in protein-protein interaction subnetworks. An extensive literature review also confirmed many new members of the disease classes and pathways, as well as the predicted associations between disease phenotype classes and pathways.
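As a rough illustration of the kind of factorization the abstract describes, the sketch below runs a generic graph-regularized non-negative matrix tri-factorization with multiplicative updates. It is not the authors' R-NMTF: the supervised label terms from disease classes and pathways are omitted, and every name here (`nmtf`, `X`, `F`, `S`, `G`, `A_p`, `A_g`, `alpha`, `beta`) is an assumption chosen for illustration. `X` stands for a phenotype-by-gene association matrix, `A_p` and `A_g` for symmetric non-negative adjacency matrices of a phenotype similarity network and a protein-protein interaction network, and the objective is ||X - F S G^T||^2 + alpha tr(F^T L_p F) + beta tr(G^T L_g G) with graph Laplacians L = D - A.

```python
import numpy as np


def objective(X, F, S, G, L_p, L_g, alpha, beta):
    """Reconstruction error plus graph-smoothness penalties on F and G."""
    recon = np.linalg.norm(X - F @ S @ G.T) ** 2
    smooth_p = alpha * np.trace(F.T @ L_p @ F)
    smooth_g = beta * np.trace(G.T @ L_g @ G)
    return recon + smooth_p + smooth_g


def nmtf(X, k_p, k_g, A_p=None, A_g=None, alpha=0.0, beta=0.0,
         n_iter=200, seed=0, eps=1e-9):
    """Generic graph-regularized NMTF (illustrative, not the paper's R-NMTF).

    X   : (n, m) non-negative phenotype-gene association matrix
    k_p : number of phenotype clusters, k_g : number of gene clusters
    A_p : (n, n) symmetric non-negative phenotype similarity adjacency
    A_g : (m, m) symmetric non-negative gene network adjacency
    Returns F (n, k_p), S (k_p, k_g), G (m, k_g) and the loss per iteration.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    A_p = np.zeros((n, n)) if A_p is None else np.asarray(A_p, dtype=float)
    A_g = np.zeros((m, m)) if A_g is None else np.asarray(A_g, dtype=float)
    D_p = np.diag(A_p.sum(axis=1))  # degree matrices
    D_g = np.diag(A_g.sum(axis=1))
    L_p, L_g = D_p - A_p, D_g - A_g  # graph Laplacians

    rng = np.random.default_rng(seed)
    F = rng.random((n, k_p)) + 0.1  # strictly positive initialization
    S = rng.random((k_p, k_g)) + 0.1
    G = rng.random((m, k_g)) + 0.1

    losses = []
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive part),
        # which keeps all entries non-negative.
        F *= (X @ G @ S.T + alpha * A_p @ F) / (
            F @ S @ G.T @ G @ S.T + alpha * D_p @ F + eps)
        G *= (X.T @ F @ S + beta * A_g @ G) / (
            G @ S.T @ F.T @ F @ S + beta * D_g @ G + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        losses.append(objective(X, F, S, G, L_p, L_g, alpha, beta))
    return F, S, G, losses


# Toy usage with made-up data: two phenotype blocks linked to two gene blocks.
X_toy = np.zeros((6, 8))
X_toy[:3, :4] = 1.0
X_toy[3:, 4:] = 1.0
F_toy, S_toy, G_toy, loss_toy = nmtf(X_toy, k_p=2, k_g=2, n_iter=100)
phenotype_clusters = F_toy.argmax(axis=1)  # hard assignment per phenotype
gene_clusters = G_toy.argmax(axis=1)       # hard assignment per gene
```

In this reading, the rows of `F` and `G` give soft cluster memberships for phenotypes and genes, while the small matrix `S` scores how strongly each phenotype cluster is associated with each gene cluster, which corresponds to the cluster-to-cluster associations mentioned in the abstract.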

Original language: English (US)
Journal: Nucleic Acids Research
Volume: 40
Issue number: 19
DOI: 10.1093/nar/gks615
State: Published - Oct 2012

Fingerprint

Genetic Association Studies
Cluster Analysis
Genome
Phenotype
Genes
Multigene Family
Genetic Databases
Protein Interaction Maps
Proteins

ASJC Scopus subject areas

  • Genetics

Cite this

Co-clustering phenome-genome for phenotype classification and disease gene discovery. / Hwang, Taehyun; Atluri, Gowtham; Xie, Maoqiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui.

In: Nucleic Acids Research, Vol. 40, No. 19, 10.2012.

Hwang, Taehyun ; Atluri, Gowtham ; Xie, Maoqiang ; Dey, Sanjoy ; Hong, Changjin ; Kumar, Vipin ; Kuang, Rui. / Co-clustering phenome-genome for phenotype classification and disease gene discovery. In: Nucleic Acids Research. 2012 ; Vol. 40, No. 19.
@article{61431ed0e75d4c8390bd08ffae76f848,
title = "Co-clustering phenome-genome for phenotype classification and disease gene discovery",
abstract = "Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype-gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype-gene association matrix under the prior knowledge from phenotype similarity network and protein-protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype-gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein-protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.",
author = "Taehyun Hwang and Gowtham Atluri and Maoqiang Xie and Sanjoy Dey and Changjin Hong and Vipin Kumar and Rui Kuang",
year = "2012",
month = "10",
doi = "10.1093/nar/gks615",
language = "English (US)",
volume = "40",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "19",

}

TY - JOUR
T1 - Co-clustering phenome-genome for phenotype classification and disease gene discovery
AU - Hwang, Taehyun
AU - Atluri, Gowtham
AU - Xie, Maoqiang
AU - Dey, Sanjoy
AU - Hong, Changjin
AU - Kumar, Vipin
AU - Kuang, Rui
PY - 2012/10
Y1 - 2012/10
AB - Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype-gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype-gene association matrix under the prior knowledge from phenotype similarity network and protein-protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype-gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein-protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.
UR - http://www.scopus.com/inward/record.url?scp=84867652286&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867652286&partnerID=8YFLogxK
U2 - 10.1093/nar/gks615
DO - 10.1093/nar/gks615
M3 - Article
VL - 40
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0305-1048
IS - 19
ER -