Driver missense mutation identification using feature selection and model fusion

Ahmed T. Soliman; Tao Meng; Shu Ching Chen; S. S. Iyengar; Puneeth Iyengar; John Yordy; Mei Ling Shyu

doi:10.1089/cmb.2015.0110

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Driver mutations propel oncogenesis and occur much less frequently than passenger mutations. The need for automatic and accurate identification of driver mutations has increased dramatically with the exponential growth of mutation data. Current computational solutions to identify driver mutations rely on sequence homology. Here we construct a machine learning-based framework that does not rely on sequence homology or domain knowledge to predict driver missense mutations. A windowing approach to represent the local environment of the sequence around the mutation point as a mutation sample is applied, followed by extraction of three sequence-level features from each sample. After selecting the most significant features, the support vector machine and multimodal fusion strategies are employed to give final predictions. The proposed framework achieves relatively high performance and outperforms current state-of-the-art algorithms. The ease of deploying the proposed framework and the relatively accurate performance make this solution applicable to large-scale mutation data analyses.

Original language	English (US)
Pages (from-to)	1075-1085
Number of pages	11
Journal	Journal of Computational Biology
Volume	22
Issue number	12
DOIs	https://doi.org/10.1089/cmb.2015.0110
State	Published - Dec 2015

Keywords

Cancer genome
Driver mutation
Passenger mutation

ASJC Scopus subject areas

Computational Mathematics
Genetics
Molecular Biology
Computational Theory and Mathematics
Modeling and Simulation

Access to Document

10.1089/cmb.2015.0110

Cite this

@article{72610fe072444b53b05a880a2492cc88,
  title = "Driver missense mutation identification using feature selection and model fusion",
  abstract = "Driver mutations propel oncogenesis and occur much less frequently than passenger mutations. The need for automatic and accurate identification of driver mutations has increased dramatically with the exponential growth of mutation data. Current computational solutions to identify driver mutations rely on sequence homology. Here we construct a machine learning-based framework that does not rely on sequence homology or domain knowledge to predict driver missense mutations. A windowing approach to represent the local environment of the sequence around the mutation point as a mutation sample is applied, followed by extraction of three sequence-level features from each sample. After selecting the most significant features, the support vector machine and multimodal fusion strategies are employed to give final predictions. The proposed framework achieves relatively high performance and outperforms current state-of-the-art algorithms. The ease of deploying the proposed framework and the relatively accurate performance make this solution applicable to large-scale mutation data analyses.",
  keywords = "Cancer genome, Driver mutation, Passenger mutation",
  author = "Soliman, {Ahmed T.} and Meng, Tao and Chen, {Shu Ching} and Iyengar, {S. S.} and Iyengar, Puneeth and Yordy, John and Shyu, {Mei Ling}",
  year = "2015",
  month = dec,
  doi = "10.1089/cmb.2015.0110",
  language = "English (US)",
  volume = "22",
  number = "12",
  pages = "1075--1085",
  journal = "Journal of Computational Biology",
  issn = "1066-5277",
  publisher = "Mary Ann Liebert Inc.",
}

TY - JOUR

T1 - Driver missense mutation identification using feature selection and model fusion

AU - Soliman, Ahmed T.

AU - Meng, Tao

AU - Chen, Shu Ching

AU - Iyengar, S. S.

AU - Iyengar, Puneeth

AU - Yordy, John

AU - Shyu, Mei Ling

PY - 2015/12

Y1 - 2015/12

N2 - Driver mutations propel oncogenesis and occur much less frequently than passenger mutations. The need for automatic and accurate identification of driver mutations has increased dramatically with the exponential growth of mutation data. Current computational solutions to identify driver mutations rely on sequence homology. Here we construct a machine learning-based framework that does not rely on sequence homology or domain knowledge to predict driver missense mutations. A windowing approach to represent the local environment of the sequence around the mutation point as a mutation sample is applied, followed by extraction of three sequence-level features from each sample. After selecting the most significant features, the support vector machine and multimodal fusion strategies are employed to give final predictions. The proposed framework achieves relatively high performance and outperforms current state-of-the-art algorithms. The ease of deploying the proposed framework and the relatively accurate performance make this solution applicable to large-scale mutation data analyses.

AB - Driver mutations propel oncogenesis and occur much less frequently than passenger mutations. The need for automatic and accurate identification of driver mutations has increased dramatically with the exponential growth of mutation data. Current computational solutions to identify driver mutations rely on sequence homology. Here we construct a machine learning-based framework that does not rely on sequence homology or domain knowledge to predict driver missense mutations. A windowing approach to represent the local environment of the sequence around the mutation point as a mutation sample is applied, followed by extraction of three sequence-level features from each sample. After selecting the most significant features, the support vector machine and multimodal fusion strategies are employed to give final predictions. The proposed framework achieves relatively high performance and outperforms current state-of-the-art algorithms. The ease of deploying the proposed framework and the relatively accurate performance make this solution applicable to large-scale mutation data analyses.

KW - Cancer genome

KW - Driver mutation

KW - Passenger mutation

UR - http://www.scopus.com/inward/record.url?scp=84948773146&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84948773146&partnerID=8YFLogxK

U2 - 10.1089/cmb.2015.0110

DO - 10.1089/cmb.2015.0110

M3 - Article

C2 - 26402258

AN - SCOPUS:84948773146

SN - 1066-5277

VL - 22

SP - 1075

EP - 1085

JO - Journal of Computational Biology

JF - Journal of Computational Biology

IS - 12

ER -
