TY - JOUR
T1 - Development of reliable aqueous solubility models and their application in druglike analysis
AU - Wang, Junmei
AU - Krudy, George
AU - Hou, Tingjun
AU - Zhang, Wei
AU - Holland, George
AU - Xu, Xiaojie
N1 - Copyright: Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2007/7
Y1 - 2007/7
N2 - In this work, two reliable aqueous solubility models, ASMS (aqueous solubility based on molecular surface) and ASMS-LOGP (aqueous solubility based on molecular surface using ClogP as a descriptor), were constructed using atom-type-classified solvent accessible surface areas and several molecular descriptors for a diverse data set of 1708 molecules. For ASMS (which does not use ClogP as a descriptor), the leave-one-out q² and root-mean-square error (RMSE) were 0.872 and 0.748 log units, respectively. ASMS-LOGP performed slightly better than ASMS (q² = 0.886, RMSE = 0.705 log units). Both models were extensively validated by three cross-validation tests, and encouraging predictability was achieved. High-throughput aqueous solubility prediction was conducted for a number of data sets extracted from several widely used databases. We found that real drugs are about 20-fold more soluble than the so-called druglike molecules in the ZINC database, which have no violations of Lipinski's "Rule of 5". Specifically, oral drugs are about 16-fold more soluble, while injection drugs are 50-60-fold more soluble. If the threshold for a molecule to be considered soluble is set to -5 log units, about 85% of real drugs are predicted to be soluble; in contrast, only 50% of the druglike molecules in ZINC are. We conclude that the two models could serve as a rule in druglike analysis and as an efficient filter for prioritizing compound libraries prior to high-throughput screening (HTS).
UR - http://www.scopus.com/inward/record.url?scp=34547702167&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547702167&partnerID=8YFLogxK
U2 - 10.1021/ci700096r
DO - 10.1021/ci700096r
M3 - Article
C2 - 17569522
AN - SCOPUS:34547702167
SN - 1549-9596
VL - 47
SP - 1395
EP - 1404
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 4
ER -