Feature selection methods in QSAR studies

Mohammad Goodarzi, Bieke Dejaegher, Yvan Vander Heyden

Research output: Contribution to journal (Review article)

Abstract

A quantitative structure-activity relationship (QSAR) relates quantitative chemical structure attributes (molecular descriptors) to a biological activity. QSAR studies have become attractive in drug discovery and development because their application can save substantial time and human resources. Several factors affect the predictive ability of a QSAR model. On the one hand, different statistical methods may be applied to check the linear or nonlinear behavior of a data set. On the other hand, feature selection techniques are applied to decrease the model complexity, to decrease the overfitting/overtraining risk, and to select the most important descriptors from the often more than 1000 calculated. The selected descriptors are then linked to a biological activity of the corresponding compound by means of a mathematical model. Different modeling techniques can be applied, some of which explicitly require a feature selection. A QSAR model can be useful in the design of new compounds with improved potency in the class under study, since only molecules predicted to have an interesting activity will be synthesized. In the feature selection problem, a learning algorithm must select a relevant subset of features on which to focus attention, while ignoring the rest. Up to now, many feature selection techniques, such as genetic algorithms, forward selection, backward elimination, stepwise regression, and simulated annealing, have been used extensively. Swarm intelligence optimizations, such as ant colony optimization and particle swarm optimization, are feature selection techniques modeled on animal and insect behavior, for example the way ants find the shortest path between a food source and their nest; they have recently also been applied in QSAR studies. This review paper provides an overview of different feature selection techniques applied in QSAR modeling.
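As an illustration of one of the techniques named in the abstract, the following is a minimal sketch of greedy forward selection. It is not taken from the review: the function names `cv_rmse` and `forward_selection`, the use of ordinary least squares, the 5-fold cross-validated RMSE criterion, and the synthetic "descriptor" matrix are all assumptions made for the example.

```python
import numpy as np

def cv_rmse(X, y, n_folds=5):
    """K-fold cross-validated RMSE of a least-squares model with intercept."""
    n = len(y)
    folds = np.arange(n) % n_folds  # deterministic fold assignment
    sq_err = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        sq_err += np.sum((A_test @ coef - y[test]) ** 2)
    return np.sqrt(sq_err / n)

def forward_selection(X, y, max_features=5):
    """Add one descriptor at a time, keeping the one that lowers CV RMSE most."""
    selected, best_score = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Score every candidate descriptor when added to the current subset.
        scores = [(cv_rmse(X[:, selected + [j]], y), j) for j in remaining]
        score, j = min(scores)
        if score >= best_score:  # stop when no candidate improves the model
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score

# Synthetic data (hypothetical): 60 "compounds", 20 "descriptors",
# with the activity depending only on descriptors 2 and 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=60)

selected, score = forward_selection(X, y)
print("selected descriptor indices:", selected)
print("cross-validated RMSE:", round(float(score), 3))
```

Scoring candidates by cross-validation rather than by training error is one way to keep the selection from simply rewarding larger subsets, which relates to the overfitting risk mentioned above. Backward elimination would follow the same pattern, starting from all descriptors and removing one at a time.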

Original language: English (US)
Pages (from-to): 636-651
Number of pages: 16
Journal: Journal of AOAC International
Volume: 95
Issue number: 3
State: Published - May 1 2012

ASJC Scopus subject areas

  • Analytical Chemistry
  • Food Science
  • Environmental Chemistry
  • Agronomy and Crop Science
  • Pharmacology
