### Abstract

In this work, several chemometric methods were applied to model and predict the acidic dissociation constants (pKa) of a set of organic compounds, using descriptors calculated from the molecular structure alone. Stepwise multiple linear regression was used to select the descriptors responsible for the pKa of these compounds. Support vector machine (SVM), principal component regression (PCR), partial least squares (PLS) and multiple linear regression (MLR) were then used to construct nonlinear and linear quantitative structure-property relationship (QSPR) models. Comparing the results showed that the SVM model performed much better than the PLS, PCR and MLR models. For the SVM model, the root-mean-square errors of the training set and the test set were 0.2551 and 0.6139, and the correlation coefficients were 0.9936 and 0.9919, respectively. This paper provides a new and effective method for predicting the pKa of organic compounds. It also shows that SVM can serve as a powerful chemometric tool for QSPR studies, with substantially better predictive ability than multiple linear regression, principal component regression and partial least squares.
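The workflow in the abstract can be illustrated with a minimal NumPy sketch: select descriptors, fit linear models, and score them by root-mean-square error and correlation coefficient. This is not the authors' code. The data are synthetic, and `forward_select`, `fit_mlr`, `fit_pcr` and `fit_pls1` are hypothetical helpers. Greedy forward selection with a fixed descriptor count stands in for the paper's stepwise procedure, whose entry and removal criteria are not stated here. The SVM model is not reproduced.

```python
import numpy as np

# The two statistics quoted in the abstract for the training and test sets.
def rmse(y, y_hat):
    """Root-mean-square error between observed and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def pearson_r(y, y_hat):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1])

def fit_mlr(X, y):
    """MLR: ordinary least squares with an intercept; returns a predictor."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_pcr(X, y, k):
    """PCR: least squares on the first k principal-component scores of centred X."""
    mu, ym = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:k].T                                  # loadings of the first k components
    b, *_ = np.linalg.lstsq((X - mu) @ V, y - ym, rcond=None)
    return lambda Xn: (Xn - mu) @ V @ b + ym

def fit_pls1(X, y, k):
    """PLS1 with k latent variables via the NIPALS algorithm."""
    mu, ym = X.mean(axis=0), y.mean()
    E, f = X - mu, y - ym
    W, P, q = [], [], []
    for _ in range(k):
        w = E.T @ f
        w = w / np.linalg.norm(w)                 # weight vector
        t = E @ w                                 # score vector
        p = E.T @ t / (t @ t)                     # X loading
        c = (f @ t) / (t @ t)                     # y loading
        E, f = E - np.outer(t, p), f - c * t      # deflate X and y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)           # coefficients for centred X
    return lambda Xn: (Xn - mu) @ B + ym

def forward_select(X, y, n_keep):
    """Assumed stand-in for stepwise MLR: greedily add the descriptor
    that most lowers the MLR training RMSE, up to n_keep descriptors."""
    chosen = []
    for _ in range(n_keep):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        def train_rmse(j):
            cols = chosen + [j]
            return rmse(y, fit_mlr(X[:, cols], y)(X[:, cols]))
        chosen.append(min(remaining, key=train_rmse))
    return chosen

# Hypothetical data: 80 "compounds", 10 random "descriptors", pKa driven by two.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = 4.0 + 1.5 * X[:, 2] - 0.8 * X[:, 5] + 0.1 * rng.normal(size=80)

cols = forward_select(X[:60], y[:60], n_keep=2)   # select on the training set only
Xtr, Xte, ytr, yte = X[:60][:, cols], X[60:][:, cols], y[:60], y[60:]

models = {"MLR": fit_mlr(Xtr, ytr),
          "PCR": fit_pcr(Xtr, ytr, k=2),
          "PLS": fit_pls1(Xtr, ytr, k=2)}
scores = {}
for name, predict in models.items():
    y_pred = predict(Xte)
    scores[name] = (rmse(yte, y_pred), pearson_r(yte, y_pred))
    print(f"{name}: test RMSE = {scores[name][0]:.3f}, r = {scores[name][1]:.3f}")
```

When k equals the number of selected descriptors, PCR and PLS reproduce the MLR fit exactly, which is a useful sanity check. They differ from MLR only when k is smaller than the number of descriptors, and that is the regime where their handling of correlated descriptors matters. For the nonlinear model, a regressor such as scikit-learn's `sklearn.svm.SVR` could be fitted on the same selected descriptors and scored with `rmse` and `pearson_r`. Its kernel and hyperparameters are not given in the abstract.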

Original language | English (US) |
---|---|
Pages (from-to) | 1495-1503 |
Number of pages | 9 |
Journal | Molecular Physics |
Volume | 107 |
Issue number | 14 |
DOI | https://doi.org/10.1080/00268970902950394 |
State | Published - Jan 1 2009 |

### Keywords

- MLR
- PLS
- Principal component regression
- Quantitative structure-property relationship
- Support vector machines

### ASJC Scopus subject areas

- Biophysics
- Molecular Biology
- Condensed Matter Physics
- Physical and Theoretical Chemistry

### Cite this

**Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods.** / Goudarzi, Nasser; Goodarzi, Mohammad.

Research output: Contribution to journal › Article

Goudarzi, N & Goodarzi, M 2009, 'Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods', *Molecular Physics*, vol. 107, no. 14, pp. 1495-1503. https://doi.org/10.1080/00268970902950394

TY - JOUR

T1 - Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods

AU - Goudarzi, Nasser

AU - Goodarzi, Mohammad

PY - 2009/1/1

Y1 - 2009/1/1

KW - MLR

KW - PLS

KW - Principal component regression

KW - Quantitative structure-property relationship

KW - Support vector machines

UR - http://www.scopus.com/inward/record.url?scp=67749133758&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67749133758&partnerID=8YFLogxK

U2 - 10.1080/00268970902950394

DO - 10.1080/00268970902950394

M3 - Article

AN - SCOPUS:67749133758

VL - 107

SP - 1495

EP - 1503

JO - Molecular Physics

JF - Molecular Physics

SN - 0026-8976

IS - 14

ER -