Development of reliable aqueous solubility models and their application in druglike analysis

Junmei Wang; George Krudy; Tingjun Hou; Wei Zhang; George Holland; Xiaojie Xu

doi:10.1021/ci700096r

Research output: Contribution to journal › Article › peer-review

110 Scopus citations

Abstract

In this work, two reliable aqueous solubility models, ASMS (aqueous solubility based on molecular surface) and ASMS-LOGP (aqueous solubility based on molecular surface using ClogP as a descriptor), were constructed by using atom type classified solvent accessible surface areas and several molecular descriptors for a diverse data set of 1708 molecules. For ASMS (without using ClogP as a descriptor), the leave-one-out q² and root-mean-square error (RMSE) were 0.872 and 0.748 log unit, respectively. ASMS-LOGP was slightly better than ASMS (q² = 0.886, RMSE = 0.705). Both models were extensively validated by three cross-validation tests and encouraging predictability was achieved. High throughput aqueous solubility prediction was conducted for a number of data sets extracted from several widely used databases. We found that real drugs are about 20-fold more soluble than the so-called druglike molecules in the ZINC database, which have no violation of Lipinski's "Rule of 5" at all. Specifically, oral drugs are about 16-fold more soluble, while injection drugs are 50-60-fold more soluble. If the criterion of a molecule to be soluble is set to -5 log unit, about 85% of real drugs are predicted as soluble; in contrast only 50% of druglike molecules in ZINC are soluble. We concluded that the two models could be served as a rule in druglike analysis and an efficient filter in prioritizing compound libraries prior to high throughput screenings (HTS).
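The quantities quoted in the abstract can be related with a little arithmetic. The following is a minimal sketch, not the authors' code. It assumes that "log unit" means log10 of solubility (logS), that a molecule counts as soluble when its predicted logS is at or above the -5 cutoff, and that q² follows the common definition 1 - PRESS/SS. All function names are hypothetical.

```python
import math

def fold_to_log_units(fold):
    """Convert an n-fold solubility ratio into a difference in log10 units."""
    return math.log10(fold)

def is_soluble(log_s, threshold=-5.0):
    """Assumed reading of the abstract's criterion: soluble if logS >= threshold."""
    return log_s >= threshold

def loo_q2_and_rmse(observed, loo_predicted):
    """Return (q2, rmse) from observed values and leave-one-out predictions.

    Uses the common definition q2 = 1 - PRESS / SS, where PRESS is the sum of
    squared leave-one-out residuals and SS is the sum of squared deviations of
    the observed values from their mean. The paper may define it differently.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    press = sum((y - p) ** 2 for y, p in zip(observed, loo_predicted))
    ss = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / ss, math.sqrt(press / n)

# Fold differences reported in the abstract, expressed in log10 units.
for label, fold in [("real drugs vs. ZINC druglike", 20),
                    ("oral drugs", 16),
                    ("injection drugs (lower bound)", 50),
                    ("injection drugs (upper bound)", 60)]:
    print(f"{label}: {fold}-fold = {fold_to_log_units(fold):.2f} log units")
```

Read the other way, an RMSE of 0.748 log unit corresponds to a typical error factor of roughly 10^0.748, or about 5.6-fold, in solubility. That is a useful yardstick when comparing it with the 16- to 60-fold differences between data sets.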

Original language	English (US)
Pages (from-to)	1395-1404
Number of pages	10
Journal	Journal of Chemical Information and Modeling
Volume	47
Issue number	4
DOIs	https://doi.org/10.1021/ci700096r
State	Published - Jul 2007

ASJC Scopus subject areas

General Chemistry
General Chemical Engineering
Computer Science Applications
Library and Information Sciences

Cite this

@article{c336d19736124cb38c03a27bf6e1c8a2,
  title = "Development of reliable aqueous solubility models and their application in druglike analysis",
  abstract = "In this work, two reliable aqueous solubility models, ASMS (aqueous solubility based on molecular surface) and ASMS-LOGP (aqueous solubility based on molecular surface using ClogP as a descriptor), were constructed by using atom type classified solvent accessible surface areas and several molecular descriptors for a diverse data set of 1708 molecules. For ASMS (without using ClogP as a descriptor), the leave-one-out q$^{2}$ and root-mean-square error (RMSE) were 0.872 and 0.748 log unit, respectively. ASMS-LOGP was slightly better than ASMS (q$^{2}$ = 0.886, RMSE = 0.705). Both models were extensively validated by three cross-validation tests and encouraging predictability was achieved. High throughput aqueous solubility prediction was conducted for a number of data sets extracted from several widely used databases. We found that real drugs are about 20-fold more soluble than the so-called druglike molecules in the ZINC database, which have no violation of Lipinski's {"}Rule of 5{"} at all. Specifically, oral drugs are about 16-fold more soluble, while injection drugs are 50-60-fold more soluble. If the criterion of a molecule to be soluble is set to -5 log unit, about 85\% of real drugs are predicted as soluble; in contrast only 50\% of druglike molecules in ZINC are soluble. We concluded that the two models could be served as a rule in druglike analysis and an efficient filter in prioritizing compound libraries prior to high throughput screenings (HTS).",
  author = "Junmei Wang and George Krudy and Tingjun Hou and Wei Zhang and George Holland and Xiaojie Xu",
  year = "2007",
  month = jul,
  doi = "10.1021/ci700096r",
  language = "English (US)",
  volume = "47",
  pages = "1395--1404",
  journal = "Journal of Chemical Information and Modeling",
  issn = "1549-9596",
  publisher = "American Chemical Society",
  number = "4",
}

TY  - JOUR
T1  - Development of reliable aqueous solubility models and their application in druglike analysis
AU  - Wang, Junmei
AU  - Krudy, George
AU  - Hou, Tingjun
AU  - Zhang, Wei
AU  - Holland, George
AU  - Xu, Xiaojie
PY  - 2007/7
Y1  - 2007/7
N2  - In this work, two reliable aqueous solubility models, ASMS (aqueous solubility based on molecular surface) and ASMS-LOGP (aqueous solubility based on molecular surface using ClogP as a descriptor), were constructed by using atom type classified solvent accessible surface areas and several molecular descriptors for a diverse data set of 1708 molecules. For ASMS (without using ClogP as a descriptor), the leave-one-out q² and root-mean-square error (RMSE) were 0.872 and 0.748 log unit, respectively. ASMS-LOGP was slightly better than ASMS (q² = 0.886, RMSE = 0.705). Both models were extensively validated by three cross-validation tests and encouraging predictability was achieved. High throughput aqueous solubility prediction was conducted for a number of data sets extracted from several widely used databases. We found that real drugs are about 20-fold more soluble than the so-called druglike molecules in the ZINC database, which have no violation of Lipinski's "Rule of 5" at all. Specifically, oral drugs are about 16-fold more soluble, while injection drugs are 50-60-fold more soluble. If the criterion of a molecule to be soluble is set to -5 log unit, about 85% of real drugs are predicted as soluble; in contrast only 50% of druglike molecules in ZINC are soluble. We concluded that the two models could be served as a rule in druglike analysis and an efficient filter in prioritizing compound libraries prior to high throughput screenings (HTS).
AB  - In this work, two reliable aqueous solubility models, ASMS (aqueous solubility based on molecular surface) and ASMS-LOGP (aqueous solubility based on molecular surface using ClogP as a descriptor), were constructed by using atom type classified solvent accessible surface areas and several molecular descriptors for a diverse data set of 1708 molecules. For ASMS (without using ClogP as a descriptor), the leave-one-out q² and root-mean-square error (RMSE) were 0.872 and 0.748 log unit, respectively. ASMS-LOGP was slightly better than ASMS (q² = 0.886, RMSE = 0.705). Both models were extensively validated by three cross-validation tests and encouraging predictability was achieved. High throughput aqueous solubility prediction was conducted for a number of data sets extracted from several widely used databases. We found that real drugs are about 20-fold more soluble than the so-called druglike molecules in the ZINC database, which have no violation of Lipinski's "Rule of 5" at all. Specifically, oral drugs are about 16-fold more soluble, while injection drugs are 50-60-fold more soluble. If the criterion of a molecule to be soluble is set to -5 log unit, about 85% of real drugs are predicted as soluble; in contrast only 50% of druglike molecules in ZINC are soluble. We concluded that the two models could be served as a rule in druglike analysis and an efficient filter in prioritizing compound libraries prior to high throughput screenings (HTS).
UR  - http://www.scopus.com/inward/record.url?scp=34547702167&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34547702167&partnerID=8YFLogxK
U2  - 10.1021/ci700096r
DO  - 10.1021/ci700096r
M3  - Article
C2  - 17569522
AN  - SCOPUS:34547702167
SN  - 1549-9596
VL  - 47
SP  - 1395
EP  - 1404
JO  - Journal of Chemical Information and Modeling
JF  - Journal of Chemical Information and Modeling
IS  - 4
ER  -
