Human intestinal absorption (HIA) is an important roadblock in me formulation of new drug substances. In silico models for predicting me percentage of HIA based on calculated molecular descriptors are highly needed for the rapid estimation of this property. Here, we have studied the performance of a support vector machine (SVM) to classify compounds wim high or low fractional absorption (%FA > 30% or %FA < 30%). The analyzed data set consists of 578 structural diverse druglike molecules, which have been divided into a 480-molecule training set and a 98-molecule test set. Ten SVM classification models have been generated to investigate the impact of different individual molecular properties on %FA. Among these studied important molecule descriptors, topological polar surface area (TPSA) and predicted apparent octanol-water distribution coefficient at pH 6.5 (logoD 6.5) show better classification performance than the others. To obtain the best SVM classifier, the influences of different kernel functions and different combinations of molecular descriptors were investigated using a rigorous training-validation procedure. The best SVM classifier can give satisfactory predictions for the training set (97.8% for the poor-absorption class and 94.5% for the good-absorption class). Moreover, 100% of the poor-absorption class and 97.8% of the good-absorption class in the external test set could be correctly classified. Finally, the influence of the size of the training set and the unbalanced nature of the data set have been studied. The analysis demonstrates that large data set is necessary for the stability of the classification models. Furthermore, the weights for the poor-absorption class and the good-absorption class should be properly balanced to generate unbiased classification models. Our work illustrates mat SVMs used in combination with simple molecular descriptors can provide an extremely reliable assessment of intestinal absorption in an early in silico filtering process.
ASJC Scopus subject areas
- Chemical Engineering(all)
- Computer Science Applications
- Library and Information Sciences