Aqueous solubility prediction based on weighted atom type counts and solvent accessible surface areas

Junmei Wang, Tingjun Hou, Xiaojie Xu

Research output: Contribution to journal › Article

43 Citations (Scopus)

Abstract

In this work, four reliable aqueous solubility models have been developed for a diverse data set of 3664 compounds: ASM-ATC (aqueous solubility model based on atom type counts), ASM-ATC-LOGP (aqueous solubility model based on atom type counts, with ClogP as an additional descriptor), ASM-SAS (aqueous solubility model based on solvent accessible surface areas), and ASM-SAS-LOGP (aqueous solubility model based on solvent accessible surface areas, with ClogP as an additional descriptor). All four models were extensively validated by various cross-validation tests and showed encouraging predictive ability. The best model, ASM-ATC-LOGP, achieves a leave-one-out squared correlation coefficient (q²) of 0.832 and a root-mean-square error (RMSE) of 0.840 log units. In 10,000 repetitions of an 85/15 cross-validation test, the same model achieves a mean q² of 0.832 and a mean RMSE of 0.841 log units. We believe that these robust models can play an important role in druglikeness analysis and serve as an efficient filter for prioritizing compound libraries prior to high-throughput screening (HTS).
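The abstract reports leave-one-out q² and RMSE and the mean of both over repeated 85/15 splits. The following is a minimal sketch of how such statistics are commonly computed, assuming an ordinary least-squares model on a descriptor matrix (for example, atom type counts, optionally with ClogP as one more column). The atom typing, weighting scheme, and fitting procedure of the actual ASM models are not described in this record, so `fit_ols`, `leave_one_out`, `repeated_split`, and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply a fitted linear model to a descriptor matrix X (n x p)."""
    return coef[0] + X @ coef[1:]


def q2_rmse(y_true, y_pred):
    """q2 = 1 - PRESS / SS_tot (SS_tot about the mean of y_true); RMSE in units of y."""
    press = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - press / ss_tot, np.sqrt(press / len(y_true))


def leave_one_out(X, y):
    """Refit without each compound in turn and predict the held-out one."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        coef = fit_ols(X[train], y[train])
        preds[i] = predict(coef, X[i:i + 1])[0]
    return q2_rmse(y, preds)


def repeated_split(X, y, n_repeats=1000, test_frac=0.15, seed=0):
    """Mean q2 and RMSE over random train/test splits (85/15 by default)."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    q2s, rmses = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        test, train = idx[:n_test], idx[n_test:]
        coef = fit_ols(X[train], y[train])
        q2, rmse = q2_rmse(y[test], predict(coef, X[test]))
        q2s.append(q2)
        rmses.append(rmse)
    return float(np.mean(q2s)), float(np.mean(rmses))


if __name__ == "__main__":
    # Hypothetical stand-in data: Poisson "atom type counts" and a noisy linear "logS".
    rng = np.random.default_rng(42)
    X = rng.poisson(3.0, size=(200, 5)).astype(float)
    y = 1.0 + X @ np.array([-0.5, 0.2, -0.1, 0.3, -0.4]) + rng.normal(0.0, 0.5, size=200)
    print("LOO q2, RMSE:", leave_one_out(X, y))
    print("85/15 mean q2, RMSE:", repeated_split(X, y))
```

Passing `n_repeats=10000` mirrors the repetition count quoted in the abstract. In this sketch the test-set q² is taken about the test-set mean; some studies use the training-set mean instead, which gives slightly different values.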

Original language: English (US)
Pages (from-to): 571-581
Number of pages: 11
Journal: Journal of Chemical Information and Modeling
Volume: 49
Issue number: 3
DOI: 10.1021/ci800406y
State: Published - Mar 23 2009

ASJC Scopus subject areas

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Aqueous solubility prediction based on weighted atom type counts and solvent accessible surface areas. / Wang, Junmei; Hou, Tingjun; Xu, Xiaojie.

In: Journal of Chemical Information and Modeling, Vol. 49, No. 3, 23.03.2009, p. 571-581.

@article{edb924bd03954dd9ae113dae1a10d60d,
title = "Aqueous solubility prediction based on weighted atom type counts and solvent accessible surface areas",
author = "Junmei Wang and Tingjun Hou and Xiaojie Xu",
year = "2009",
month = mar,
day = "23",
doi = "10.1021/ci800406y",
language = "English (US)",
volume = "49",
pages = "571--581",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "3",

}

TY  - JOUR
T1  - Aqueous solubility prediction based on weighted atom type counts and solvent accessible surface areas
AU  - Wang, Junmei
AU  - Hou, Tingjun
AU  - Xu, Xiaojie
PY  - 2009/3/23
Y1  - 2009/3/23
UR  - http://www.scopus.com/inward/record.url?scp=65249097797&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=65249097797&partnerID=8YFLogxK
U2  - 10.1021/ci800406y
DO  - 10.1021/ci800406y
M3  - Article
C2  - 19226181
AN  - SCOPUS:65249097797
VL  - 49
SP  - 571
EP  - 581
JO  - Journal of Chemical Information and Modeling
JF  - Journal of Chemical Information and Modeling
SN  - 1549-9596
IS  - 3
ER  - 