TY - JOUR
T1 - Machine learning on ligand-residue interaction profiles to significantly improve binding affinity prediction
AU - Ji, Beihong
AU - He, Xibing
AU - Zhai, Jingchen
AU - Zhang, Yuzhao
AU - Man, Viet Hoang
AU - Wang, Junmei
N1 - Funding Information:
This work was supported by the following funds from the National Science Foundation (NSF)
Publisher Copyright:
© 2021 The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
PY - 2021/9/1
Y1 - 2021/9/1
N2 - Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule binds to a drug target and prioritize top ligands from an SBVS. In this study, we developed a novel method, using ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models, to significantly improve the screening performance in SBVSs. Such a kind of the prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods before interaction energy calculation and different ML algorithms. Using six drug targets with each having hundreds of known ligands, we conducted a critical evaluation on the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Its scoring power, ranking power and screening power significantly outperformed the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased about 38 and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased about 225 and 73%, respectively. Third, more encouragingly, the average value of the areas under the curve of receiver operating characteristic for six targets by GBDT, 0.87, is significantly better than that by Glide, which is only 0.71. Thus, we expected IP-SFs to have broad and promising applications in SBVSs.
AB - Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule binds to a drug target and prioritize top ligands from an SBVS. In this study, we developed a novel method, using ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models, to significantly improve the screening performance in SBVSs. Such a kind of the prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods before interaction energy calculation and different ML algorithms. Using six drug targets with each having hundreds of known ligands, we conducted a critical evaluation on the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Its scoring power, ranking power and screening power significantly outperformed the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased about 38 and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased about 225 and 73%, respectively. Third, more encouragingly, the average value of the areas under the curve of receiver operating characteristic for six targets by GBDT, 0.87, is significantly better than that by Glide, which is only 0.71. Thus, we expected IP-SFs to have broad and promising applications in SBVSs.
KW - Binding affinity prediction
KW - Machine learning (ml)
KW - Machine learning-based scoring function (ml-based sf)
KW - Scoring function (sf)
KW - Scoring power
KW - Structure-based virtual screening (sbvs)
UR - http://www.scopus.com/inward/record.url?scp=85106454777&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106454777&partnerID=8YFLogxK
U2 - 10.1093/bib/bbab054
DO - 10.1093/bib/bbab054
M3 - Article
C2 - 33758923
AN - SCOPUS:85106454777
SN - 1467-5463
VL - 22
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 5
M1 - bbab054
ER -