Transfer learning across ontologies for phenome-genome association prediction

Raphael Petegrosso, Sunho Park, Tae Hyun Hwang, Rui Kuang

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Motivation: To better predict and analyze gene associations with the collection of phenotypes organized in a phenotype ontology, it is crucial to effectively model the hierarchical structure among the phenotypes in the ontology and to supplement the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP), which imposes consistent associations along entire phenotype paths when predicting phenotype-gene associations in the Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) to incorporate functional annotations from the Gene Ontology (GO). By simultaneously reconstructing GO term-gene associations and HPO phenotype-gene associations for all the genes in a protein-protein interaction network, tlDLP benefits from the enriched training associations indirectly through their relation with GO terms.

Results: In experiments predicting the associations between human genes and phenotypes in HPO based on a human protein-protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO, both in cross-validation and in predicting the most recent associations added after the snapshot of the training data. Moreover, transfer learning through GO term-gene associations significantly improved, by a large margin, association predictions for the phenotypes with no more specific known associations. Examples are also shown to demonstrate how phenotype paths in the phenotype ontology and transfer learning with the gene ontology can improve the predictions.
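The abstract does not give the DLP or tlDLP formulation, so the following is only a minimal sketch of the generic idea the methods build on: propagating a known phenotype-gene label over a gene network. It assumes the common iterative update f <- alpha * S * f + (1 - alpha) * y on a symmetrically normalized adjacency matrix; the toy network, the function names, and the value of alpha are all hypothetical and are not taken from the paper.

```python
import math

def normalize(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an adjacency matrix."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def propagate(W, y, alpha=0.5, iters=200):
    """Generic label propagation (an assumption, not the paper's DLP):
    repeat f <- alpha * S f + (1 - alpha) * y for a fixed number of steps."""
    n = len(W)
    S = normalize(W)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Hypothetical toy network: four genes connected in a chain 0 - 1 - 2 - 3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Only gene 0 has a known association with the phenotype of interest.
y = [1.0, 0.0, 0.0, 0.0]

scores = propagate(W, y)
for gene, score in enumerate(scores):
    print(f"gene {gene}: {score:.4f}")
```

In this toy setting the scores fall off with network distance from the labeled gene, which is the behavior a network-based predictor relies on to rank unlabeled genes. Modeling whole phenotype paths and transferring information from GO annotations, as the abstract describes, would require the additional terms defined in the paper itself.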

Original language: English (US)
Pages (from-to): 529-536
Number of pages: 8
Journal: Bioinformatics
Volume: 33
Issue number: 4
DOI: 10.1093/bioinformatics/btw649
State: Published - Feb 15 2017

Fingerprint

Transfer Learning
Phenotype
Ontology
Genome
Genes
Prediction
Gene Ontology
Gene
Proteins
Labels
Protein Interaction Maps
Protein Interaction Networks
Transfer (Psychology)
Protein-protein Interaction
Propagation
Path
Term
Ontology Learning
Predict
Snapshot

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Transfer learning across ontologies for phenome-genome association prediction. / Petegrosso, Raphael; Park, Sunho; Hwang, Tae Hyun; Kuang, Rui.

In: Bioinformatics, Vol. 33, No. 4, 15.02.2017, p. 529-536.

Petegrosso, Raphael ; Park, Sunho ; Hwang, Tae Hyun ; Kuang, Rui. / Transfer learning across ontologies for phenome-genome association prediction. In: Bioinformatics. 2017 ; Vol. 33, No. 4. pp. 529-536.
@article{b976fa3c1e19483881fbf61840265787,
title = "Transfer learning across ontologies for phenome-genome association prediction",
abstract = "Motivation: To better predict and analyze gene associations with the collection of phenotypes organized in a phenotype ontology, it is crucial to effectively model the hierarchical structure among the phenotypes in the ontology and leverage the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP) to impose consistent associations with the entire phenotype paths in predicting phenotype-gene associations in Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) to incorporate functional annotations in Gene Ontology (GO). By simultaneously reconstructing GO term-gene associations and HPO phenotype-gene associations for all the genes in a protein-protein interaction network, tlDLP benefits from the enriched training associations indirectly through relation with GO terms. Results: In the experiments to predict the associations between human genes and phenotypes in HPO based on human protein-protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO in cross-validation and the prediction of the most recent associations added after the snapshot of the training data. Moreover, the transfer learning through GO term-gene associations significantly improved association predictions for the phenotypes with no more specific known associations by a large margin. Examples are also shown to demonstrate how phenotype paths in phenotype ontology and transfer learning with gene ontology can improve the predictions.",
author = "Raphael Petegrosso and Sunho Park and Hwang, {Tae Hyun} and Rui Kuang",
year = "2017",
month = "2",
day = "15",
doi = "10.1093/bioinformatics/btw649",
language = "English (US)",
volume = "33",
pages = "529--536",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "4",

}

TY - JOUR

T1 - Transfer learning across ontologies for phenome-genome association prediction

AU - Petegrosso, Raphael

AU - Park, Sunho

AU - Hwang, Tae Hyun

AU - Kuang, Rui

PY - 2017/2/15

Y1 - 2017/2/15

AB - Motivation: To better predict and analyze gene associations with the collection of phenotypes organized in a phenotype ontology, it is crucial to effectively model the hierarchical structure among the phenotypes in the ontology and leverage the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP) to impose consistent associations with the entire phenotype paths in predicting phenotype-gene associations in Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) to incorporate functional annotations in Gene Ontology (GO). By simultaneously reconstructing GO term-gene associations and HPO phenotype-gene associations for all the genes in a protein-protein interaction network, tlDLP benefits from the enriched training associations indirectly through relation with GO terms. Results: In the experiments to predict the associations between human genes and phenotypes in HPO based on human protein-protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO in cross-validation and the prediction of the most recent associations added after the snapshot of the training data. Moreover, the transfer learning through GO term-gene associations significantly improved association predictions for the phenotypes with no more specific known associations by a large margin. Examples are also shown to demonstrate how phenotype paths in phenotype ontology and transfer learning with gene ontology can improve the predictions.

UR - http://www.scopus.com/inward/record.url?scp=85028360373&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028360373&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw649

DO - 10.1093/bioinformatics/btw649

M3 - Article

C2 - 27797759

AN - SCOPUS:85028360373

VL - 33

SP - 529

EP - 536

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 4

ER -