Feature selection method based on fuzzy entropy for regression in QSAR studies

Zahra Elmi, Karim Faez, Mohammad Goodarzi, Nasser Goudarzi

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Feature selection and feature extraction are the most important steps in classification and regression systems. Feature selection is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which would otherwise be impractical to process. A recent example is a quantitative structure-activity relationship (QSAR) dataset with 1226 features. A major problem in QSAR is the high dimensionality of the feature space; feature selection is therefore the most important step in this study. This paper presents a novel feature selection algorithm based on entropy. The performance of the proposed algorithm is compared with that of a genetic algorithm method and a stepwise regression method. Using a multiple linear regression model, the root mean square errors of prediction for the training set and test set were 0.3433 and 0.3591 with entropy-based selection, 0.5500 and 0.4326 with the genetic algorithm, and 0.6373 and 0.6672 with stepwise regression, respectively.
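The comparison in the abstract rests on fitting a multiple linear regression on a selected feature subset and reporting the root mean square error on the training and test sets. The following is a minimal sketch of that evaluation step only, not the authors' selection algorithm. The function names, the synthetic data, and the chosen column indices are all assumptions made for illustration.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit a multiple linear regression by least squares.

    Returns the coefficient vector with the intercept first.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict responses from an intercept-first coefficient vector."""
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def evaluate_subset(X_train, y_train, X_test, y_test, selected):
    """Fit on the selected columns and return (train RMSE, test RMSE)."""
    coef = fit_mlr(X_train[:, selected], y_train)
    train_err = rmse(y_train, predict_mlr(coef, X_train[:, selected]))
    test_err = rmse(y_test, predict_mlr(coef, X_test[:, selected]))
    return train_err, test_err


# Synthetic stand-in data (hypothetical, not the QSAR dataset from the paper):
# 60 samples, 20 descriptors, of which only columns 3 and 7 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=60)
X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]

# Compare an informative subset with an uninformative one.
good_train, good_test = evaluate_subset(X_train, y_train, X_test, y_test, [3, 7])
bad_train, bad_test = evaluate_subset(X_train, y_train, X_test, y_test, [0, 1])
print(f"informative subset:   train={good_train:.4f} test={good_test:.4f}")
print(f"uninformative subset: train={bad_train:.4f} test={bad_test:.4f}")
```

Any selection method (entropy-based, genetic algorithm, or stepwise regression) can be plugged in by passing the indices it returns as `selected`; a lower test RMSE then indicates a subset that generalizes better under this model.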

Original language: English (US)
Pages (from-to): 1787-1798
Number of pages: 12
Journal: Molecular Physics
Volume: 107
Issue number: 17
DOI: 10.1080/00268970903078559
State: Published - Oct 1 2009

Keywords

  • Feature selection
  • Fuzzy entropy
  • Genetic algorithm
  • Multiple linear regressions
  • Quantitative structure-activity relationships
  • Regression

ASJC Scopus subject areas

  • Biophysics
  • Molecular Biology
  • Condensed Matter Physics
  • Physical and Theoretical Chemistry

Cite this

Feature selection method based on fuzzy entropy for regression in QSAR studies. / Elmi, Zahra; Faez, Karim; Goodarzi, Mohammad; Goudarzi, Nasser.

In: Molecular Physics, Vol. 107, No. 17, 01.10.2009, p. 1787-1798.

@article{83ebf0e3707940de9703fe8f9c247d24,
title = "Feature selection method based on fuzzy entropy for regression in QSAR studies",
keywords = "Feature selection, Fuzzy entropy, Genetic algorithm, Multiple linear regressions, Quantitative structure-activity relationships, Regression",
author = "Zahra Elmi and Karim Faez and Mohammad Goodarzi and Nasser Goudarzi",
year = "2009",
month = "10",
day = "1",
doi = "10.1080/00268970903078559",
language = "English (US)",
volume = "107",
pages = "1787--1798",
journal = "Molecular Physics",
issn = "0026-8976",
publisher = "Taylor and Francis Ltd.",
number = "17",

}

TY - JOUR

T1 - Feature selection method based on fuzzy entropy for regression in QSAR studies

AU - Elmi, Zahra

AU - Faez, Karim

AU - Goodarzi, Mohammad

AU - Goudarzi, Nasser

PY - 2009/10/1

Y1 - 2009/10/1

KW - Feature selection

KW - Fuzzy entropy

KW - Genetic algorithm

KW - Multiple linear regressions

KW - Quantitative structure-activity relationships

KW - Regression

UR - http://www.scopus.com/inward/record.url?scp=68949086313&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=68949086313&partnerID=8YFLogxK

U2 - 10.1080/00268970903078559

DO - 10.1080/00268970903078559

M3 - Article

AN - SCOPUS:68949086313

VL - 107

SP - 1787

EP - 1798

JO - Molecular Physics

JF - Molecular Physics

SN - 0026-8976

IS - 17

ER -