Chapter 8: Predicting Residence Time of GPCR Ligands with Machine Learning

Reading notes on "Artificial Intelligence in Drug Design"

1. Introduction

Drug-target residence time, the inverse of the ligand dissociation rate constant ($k_{off}$), has been observed, for some targets, to be more influential than equilibrium binding affinity in conferring efficacy.
The evidence for association of residence time and efficacy is most abundant for GPCRs. The efficacy of muscarinic acetylcholine M3 receptor agonists was found to correlate solely with their associated residence times and not with their binding affinities.
Extending residence time not only impacts efficacy; it may also affect drug-dose intervals.
The difference in residence time of the ligand for its target compared with off-target proteins determines the probability of having off-target side effects.
Machine learning (ML) has been at the forefront of drug discovery for many years, partly because it can fit regressions to complex data that humans cannot readily interpret. It is therefore a good candidate to consider when searching for a methodology for the computational prediction of drug-target residence time.

In order to extend residence time without affecting equilibrium binding affinity, the stability of the transition binding state needs to be achieved.
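The constraint in this paragraph can be written out with the standard definitions for a simple one-step binding model (the one-step model is an assumption; these notes do not state the model explicitly). Here $\tau$ denotes residence time and $K_D$ the equilibrium dissociation constant:

$$\tau = \frac{1}{k_{off}}, \qquad K_D = \frac{k_{off}}{k_{on}}$$

Scaling $k_{off}$ by a factor $\alpha < 1$ lengthens $\tau$ by $1/\alpha$, and $K_D$ stays unchanged only if $k_{on}$ is scaled by the same factor:

$$K_D' = \frac{\alpha\,k_{off}}{\alpha\,k_{on}} = K_D, \qquad \tau' = \frac{1}{\alpha\,k_{off}} = \frac{\tau}{\alpha}$$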
A survey of Pfizer’s database of 2000 compounds with residence time values reveals a weak correlation between extended residence time and ligand size.
In other instances, there is a strong correlation between residence time and ligand molecular weight.
The flexibility of the binding site has been noted to also affect residence time.
Matched molecular pair (MMP) analysis was recently used to try to understand structural–kinetic relationships. Only a few MMP transformations were identified where the residence time was significantly extended, whereas the binding affinity and the association rate remained largely unperturbed.


The ability to deploy any of these ML models depends heavily on the specific protein system under study: one can only predict the residence time of a compound for a protein target when there is enough kinetic binding data for that protein with which to train an ML model.

This section lists the Python libraries that need to be installed to run the ML methods described here.
- To install PyQSAR, one needs to create a Python 2.7 environment (see Note 1).
- For ligand feature generation, Mordred will be used. This must be installed in the Python 2.7 environment.
- In a separate Python 3.7 environment, install the following packages: matplotlib, RDKit, pandas, and scikit-learn.
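Once the Python 3.7 environment has been created, a quick check like the one below can confirm that the listed packages are importable. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the only assumption is the usual mapping from package name to import name (RDKit is imported as `rdkit`, scikit-learn as `sklearn`).

```python
from importlib.util import find_spec

# Package name -> import name for the Python 3.7 environment listed above.
# The import name can differ from the package name.
REQUIRED_MODULES = {
    "matplotlib": "matplotlib",
    "RDKit": "rdkit",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Running the script inside the activated environment prints either the missing package names or a confirmation line.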

One can obtain ligand kinetic data from ChEMBL by searching by activity type; however, the data are much sparser than for binding affinity endpoints, and ChEMBL is often far from a comprehensive source of all the kinetic data available in the literature.
A recently published database, KOFFI-DB, contains kinetic parameters of ligand binding derived from surface plasmon resonance. Approximately 1000 k_off values are currently available in this database.
By far the largest collection of kinetic data is KIND (KINetic Dataset), which contains 3812 entries collated from 21 publications and from data produced by the EU-IMI consortium K4DD, covering a wide range of targets including ion channels, kinases and GPCRs.
The paucity of kinetic data is one of the main challenges of using ML to predict residence time.
In the examples shown in this chapter, GPCR data scraped from publications and manually curated will be used. These 536 entries are approximately double the number of entries found in the KIND for GPCRs and can be downloaded from here.
- All residence time values were corrected to the mean temperature used across all experiments (294.15 K) using the Arrhenius equation: $k_{off}=Ae^{-\frac{E_A}{RT}}$
- It is important to make sure that the training data are relatively representative of the test compounds. For this reason, the peptide entries for the NK1 receptor were removed from the example GPCR training set.
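The temperature correction in the list above can be sketched as follows. Taking the ratio of the Arrhenius equation at two temperatures cancels the pre-exponential factor $A$, giving $k_{off}(T_{ref}) = k_{off}(T_{exp})\,e^{\frac{E_A}{R}\left(\frac{1}{T_{exp}}-\frac{1}{T_{ref}}\right)}$. The notes do not say which activation energy $E_A$ was used, so it is left as a required argument here. The function names are hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J / (mol K)
T_REF = 294.15       # reference temperature from the text, in K


def koff_at_reference(koff_exp, temp_exp_k, activation_energy_j_mol, temp_ref_k=T_REF):
    """Rescale a k_off measured at temp_exp_k to temp_ref_k.

    Uses the ratio of two Arrhenius expressions, so the pre-exponential
    factor A cancels. The activation energy (J/mol) must be supplied by
    the caller; its value is an assumption, not something given in the text.
    """
    exponent = (activation_energy_j_mol / R_GAS) * (1.0 / temp_exp_k - 1.0 / temp_ref_k)
    return koff_exp * math.exp(exponent)


def residence_time_at_reference(rt_exp, temp_exp_k, activation_energy_j_mol, temp_ref_k=T_REF):
    """Same correction expressed on residence time (1 / k_off)."""
    koff_ref = koff_at_reference(1.0 / rt_exp, temp_exp_k, activation_energy_j_mol, temp_ref_k)
    return 1.0 / koff_ref


# Illustrative call: the 80 kJ/mol activation energy is a made-up placeholder.
rt_corrected = residence_time_at_reference(100.0, 310.15, 80_000.0)
print(rt_corrected > 100.0)  # cooling from 310.15 K to 294.15 K lengthens residence time
```

A value measured at the reference temperature is returned unchanged, and a measurement taken above 294.15 K is corrected to a longer residence time.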

The first method, the ML model trained on ligand features alone, is a QSKR (quantitative structure-kinetics relationship) multi-linear regression model for a single target. An open-source Python library, PyQSAR, will be used to carry out the QSKR modeling.
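To illustrate what a multi-linear regression on ligand features does, here is a minimal sketch using ordinary least squares in NumPy. It does not use the PyQSAR API, and it leaves out the descriptor selection a QSKR workflow would normally include. The feature matrix is synthetic placeholder data standing in for ligand descriptors, and `y` stands in for the kinetic endpoint. The function names are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict_mlr(intercept, coefs, X):
    """Apply the fitted linear model to new feature rows."""
    return intercept + np.asarray(X, dtype=float) @ coefs


def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data (NOT real kinetic data): 40 "ligands", 3 "descriptors".
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 1.5 + X @ np.array([0.8, -0.4, 0.0]) + rng.normal(scale=0.05, size=40)

intercept, coefs = fit_mlr(X, y)
r2 = r_squared(y, predict_mlr(intercept, coefs, X))
print(intercept, coefs, r2)
```

With real data, `X` would hold the selected descriptors for each ligand of a single target and `y` the measured kinetic values. Model quality should be judged on held-out compounds, not on the training fit shown here.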

Multi-target models require some information of the protein either through explicit representation of the protein the ligand is targeting (e.g., by inputting the protein sequence) or through interactions made to the protein by the ligand. These models move toward more generalized models for predicting ligand kinetic rates as opposed to models only applicable to single receptors or to a single ligand series for a single target.
To extend the COMBINE workflow mentioned in the Introduction to a multi-target model, protein family numbering schemes can be used to find equivalent residues to feed into the model as training data. For example, for GPCRs, one could use the GPCRdb's modified Ballesteros and Weinstein numbering scheme to map ligand interaction energies (van der Waals and electrostatic) onto specific residue positions (see Fig. 3).
Because ligand kinetic data are scarce, the overlap between ligand kinetic data and associated PDB structures is very small. Therefore, one has to use predicted structures of the protein–ligand complexes to increase the amount of data. To increase the reliability of these predicted poses (dockings), one can run ensembles of short molecular dynamics (MD) simulations following docking.
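The idea of aligning per-residue energies across receptors can be sketched as follows. Each complex provides energies keyed by a generic residue position, and the helper builds one feature column per position and energy term so that rows from different receptors line up. The position labels, energy values, complex names and the function name `build_feature_table` are illustrative placeholders. Filling positions without a recorded interaction with 0.0 is an assumption.

```python
def build_feature_table(complexes):
    """Align per-residue energies on shared generic positions.

    complexes maps a complex id to {generic_position: (vdw, elec)}.
    Returns (columns, rows), where rows maps each complex id to a list of
    values ordered like columns. Missing positions are filled with 0.0.
    """
    positions = sorted({pos for energies in complexes.values() for pos in energies})
    columns = [f"{pos}_{term}" for pos in positions for term in ("vdw", "elec")]
    rows = {}
    for complex_id, energies in complexes.items():
        row = []
        for pos in positions:
            vdw, elec = energies.get(pos, (0.0, 0.0))
            row.extend([vdw, elec])
        rows[complex_id] = row
    return columns, rows


# Hypothetical inputs: values and labels are made up for illustration only.
example = {
    "receptorA_ligand1": {"3x32": (-2.1, -5.4), "6x48": (-1.3, 0.2)},
    "receptorB_ligand2": {"3x32": (-1.8, -4.9), "7x39": (-0.7, -0.1)},
}
columns, rows = build_feature_table(example)
print(columns)
print(rows["receptorA_ligand1"])
```

Because the columns are keyed by generic position and not by receptor-specific residue number, complexes from different GPCRs end up in one table that a single model can be trained on.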

Original article: https://blog.csdn.net/weixin_52812620/article/details/126215825