In this work, several chemometric methods were applied to model and predict the acid dissociation constants (pKa) of a set of organic compounds, using descriptors calculated from molecular structure alone. Stepwise multiple linear regression was used to select the descriptors most relevant to the pKa of these compounds. Support vector machine (SVM), principal component regression (PCR), partial least squares (PLS) and multiple linear regression (MLR) were then used to construct nonlinear and linear quantitative structure-property relationship (QSPR) models. Comparison of the results showed that the SVM model had substantially better predictive ability than the PLS, PCR and MLR models. For the SVM model, the root-mean-square errors of the training and test sets were 0.2551 and 0.6139, and the correlation coefficients were 0.9936 and 0.9919, respectively. This paper provides a new and effective method for predicting the pKa of organic compounds and shows that SVM can serve as a powerful chemometric tool for QSPR studies.
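The abstract does not state the entry criterion used in the stepwise regression or the exact error formulas. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation. It does forward selection by partial F-test with an assumed F-to-enter of 4.0, which simplifies true stepwise regression by omitting the removal step. It then fits the MLR baseline on the selected descriptors and reports the root-mean-square error, RMSE = sqrt(mean((y - ŷ)²)), and the Pearson correlation coefficient r, the two statistics quoted above. The data are synthetic and the descriptor columns are hypothetical. SVM, PCR and PLS are not shown.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted MLR coefficients to a descriptor matrix X."""
    return coef[0] + X @ coef[1:]


def residual_ss(X, y):
    """Residual sum of squares of an MLR fit on the given descriptor columns."""
    resid = y - predict_mlr(fit_mlr(X, y), X)
    return float(resid @ resid)


def forward_stepwise(X, y, f_to_enter=4.0):
    """Forward selection by partial F-test (simplified: no removal step).

    At each step, add the descriptor that most reduces the residual sum of
    squares, provided its partial F statistic is at least f_to_enter.
    The threshold 4.0 is an assumed, illustrative value.
    """
    n, p = X.shape
    selected, remaining = [], list(range(p))
    rss_current = float(((y - y.mean()) ** 2).sum())  # intercept-only model
    while remaining:
        best_j = min(remaining, key=lambda j: residual_ss(X[:, selected + [j]], y))
        best_rss = residual_ss(X[:, selected + [best_j]], y)
        dof = n - (len(selected) + 1) - 1  # residual degrees of freedom of candidate model
        if dof <= 0:
            break
        f_stat = np.inf if best_rss <= 0 else (rss_current - best_rss) / (best_rss / dof)
        if f_stat < f_to_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        rss_current = best_rss
    return selected


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


if __name__ == "__main__":
    # Synthetic stand-in data (not the paper's data set): 6 hypothetical
    # descriptors, with a pKa-like response driven by descriptors 0 and 2 plus noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))
    y = 4.0 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.1, size=60)
    X_train, y_train, X_test, y_test = X[:45], y[:45], X[45:], y[45:]

    cols = forward_stepwise(X_train, y_train)
    coef = fit_mlr(X_train[:, cols], y_train)
    for name, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = predict_mlr(coef, Xs[:, cols])
        print(name, "RMSE", round(rmse(ys, pred), 4), "r", round(pearson_r(ys, pred), 4))
    print("selected descriptors:", cols)
```

To approximate the nonlinear model described above, you could fit an SVM regressor (for example scikit-learn's `SVR`) on the same selected descriptors instead of the final MLR fit. The abstract does not give the kernel or hyperparameters, so any such choice would be an assumption.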
Keywords
- Principal component regression
- Quantitative structure-property relationship
- Support vector machines
ASJC Scopus subject areas
- Molecular Biology
- Condensed Matter Physics
- Physical and Theoretical Chemistry