Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods

Nasser Goudarzi, Mohammad Goodarzi

Research output: Contribution to journal (Article)

18 Citations (Scopus)

Abstract

In this work, chemometric methods were applied to model and predict the acidic dissociation constants (pKa) of a set of organic compounds, using descriptors calculated from the molecular structure alone. Stepwise multiple linear regression was used to select the descriptors responsible for the pKa of these compounds. Support vector machine (SVM), principal component regression (PCR), partial least squares (PLS) and multiple linear regression (MLR) were then used to construct nonlinear and linear quantitative structure-activity relationship models. The results obtained with SVM were compared with those of PLS, PCR and MLR, showing that the SVM model performed much better than the other models. The root-mean-square errors of the training set and the test set for the SVM model were 0.2551 and 0.6139, and the correlation coefficients were 0.9936 and 0.9919, respectively. This paper provides a new and effective method for predicting the pKa of organic compounds and shows that SVM can serve as a powerful chemometric tool for QSPR studies. Overall, the results show that SVM drastically improves predictive ability in QSAR studies relative to multiple linear regression, principal component regression and partial least squares.
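The abstract reports model quality as a root-mean-square error (RMSE) and a correlation coefficient for the training and test sets. As a minimal sketch of how these two statistics are commonly computed, the Python below assumes the usual definitions (RMSE with a plain 1/n average, and the Pearson correlation between observed and predicted values); the abstract does not state which exact formulas the authors used. The function names `rmse` and `pearson_r` and the sample pKa values are hypothetical and are not data from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, assuming a plain 1/n average of squared residuals."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Hypothetical pKa values for illustration only; NOT taken from the paper.
    observed_pka = [4.76, 9.25, 3.75, 10.33]
    predicted_pka = [4.60, 9.40, 3.90, 10.10]
    print("RMSE:", rmse(observed_pka, predicted_pka))
    print("r:   ", pearson_r(observed_pka, predicted_pka))
```

Computing both statistics separately for the training and test sets, as the abstract does, makes it easier to see whether a model fits the training data much more closely than it predicts unseen compounds.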

Original language: English (US)
Pages (from-to): 1495-1503
Number of pages: 9
Journal: Molecular Physics
Volume: 107
Issue number: 14
DOI: 10.1080/00268970902950394
State: Published - Jan 1 2009

Fingerprint

Organic compounds
Support vector machines
regression analysis
dissociation
Linear regression
Linear Models
predictions
Least-Squares Analysis
Quantitative Structure-Activity Relationship
Molecular Structure
root-mean-square errors
Mean square error
Support Vector Machine
correlation coefficients
education

Keywords

  • MLR
  • PLS
  • Principal component regression
  • Quantitative structure-property relationship
  • Support vector machines

ASJC Scopus subject areas

  • Biophysics
  • Molecular Biology
  • Condensed Matter Physics
  • Physical and Theoretical Chemistry

Cite this

Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods. / Goudarzi, Nasser; Goodarzi, Mohammad.

In: Molecular Physics, Vol. 107, No. 14, 01.01.2009, p. 1495-1503.

@article{15ac9316d206497aab88a5ac203e0ef3,
title = "Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods",
abstract = "In this work, some chemometrics methods were applied for modelling and prediction of the acidic dissociation constants of some organic compounds with descriptors calculated from the molecular structure alone. The stepwise multiple linear regression method was used to select descriptors which are responsible for the pKa of these compounds. Then support vector machine (SVM), principal component regression (PCR), partial least squares (PLS) and multiple linear regression (MLR) were utilized to construct the nonlinear and linear quantitative structure-activity relationship models. The results obtained using SVM were compared with PLS, PCR and MLR, revealing that the SVM model was much better than other models. The root-mean-square errors of the training set and the test set for the SVM model are 0.2551 and 0.6139, and the correlation coefficients were 0.9936 and 0.9919, respectively. This paper provides a new and effective method for predicting pKa of organic compounds, and also reveals that SVM can be used as a powerful chemometrics tool for QSPR studies. Finally, results have shown that the SVM drastically enhances the ability of prediction in QSAR studies superior to multiple linear regression, principal component regression and partial least squares.",
keywords = "MLR, PLS, Principal component regression, Quantitative structure-property relationship, Support vector machines",
author = "Nasser Goudarzi and Mohammad Goodarzi",
year = "2009",
month = "1",
day = "1",
doi = "10.1080/00268970902950394",
language = "English (US)",
volume = "107",
pages = "1495--1503",
journal = "Molecular Physics",
issn = "0026-8976",
publisher = "Taylor and Francis Ltd.",
number = "14",
}

TY - JOUR

T1 - Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods

AU - Goudarzi, Nasser

AU - Goodarzi, Mohammad

PY - 2009/1/1

Y1 - 2009/1/1

N2 - In this work, some chemometrics methods were applied for modelling and prediction of the acidic dissociation constants of some organic compounds with descriptors calculated from the molecular structure alone. The stepwise multiple linear regression method was used to select descriptors which are responsible for the pKa of these compounds. Then support vector machine (SVM), principal component regression (PCR), partial least squares (PLS) and multiple linear regression (MLR) were utilized to construct the nonlinear and linear quantitative structure-activity relationship models. The results obtained using SVM were compared with PLS, PCR and MLR, revealing that the SVM model was much better than other models. The root-mean-square errors of the training set and the test set for the SVM model are 0.2551 and 0.6139, and the correlation coefficients were 0.9936 and 0.9919, respectively. This paper provides a new and effective method for predicting pKa of organic compounds, and also reveals that SVM can be used as a powerful chemometrics tool for QSPR studies. Finally, results have shown that the SVM drastically enhances the ability of prediction in QSAR studies superior to multiple linear regression, principal component regression and partial least squares.

AB - In this work, some chemometrics methods were applied for modelling and prediction of the acidic dissociation constants of some organic compounds with descriptors calculated from the molecular structure alone. The stepwise multiple linear regression method was used to select descriptors which are responsible for the pKa of these compounds. Then support vector machine (SVM), principal component regression (PCR), partial least squares (PLS) and multiple linear regression (MLR) were utilized to construct the nonlinear and linear quantitative structure-activity relationship models. The results obtained using SVM were compared with PLS, PCR and MLR, revealing that the SVM model was much better than other models. The root-mean-square errors of the training set and the test set for the SVM model are 0.2551 and 0.6139, and the correlation coefficients were 0.9936 and 0.9919, respectively. This paper provides a new and effective method for predicting pKa of organic compounds, and also reveals that SVM can be used as a powerful chemometrics tool for QSPR studies. Finally, results have shown that the SVM drastically enhances the ability of prediction in QSAR studies superior to multiple linear regression, principal component regression and partial least squares.

KW - MLR

KW - PLS

KW - Principal component regression

KW - Quantitative structure-property relationship

KW - Support vector machines

UR - http://www.scopus.com/inward/record.url?scp=67749133758&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67749133758&partnerID=8YFLogxK

U2 - 10.1080/00268970902950394

DO - 10.1080/00268970902950394

M3 - Article

AN - SCOPUS:67749133758

VL - 107

SP - 1495

EP - 1503

JO - Molecular Physics

JF - Molecular Physics

SN - 0026-8976

IS - 14

ER -