One of the major advantages of de novo design is that it greatly increases the number of molecules virtually accessible during a drug discovery project.
Recent advances in deep learning techniques have allowed the three aforementioned objectives of de novo design to be incorporated into a single multi-objective optimization step.
2. De Novo Design: History and Background
GRID and HSITE/SpaceSkeleton were among the first algorithms developed to map protein cavities in order to identify potential ligand interaction points.
Following a similar approach but with a higher level of automation, LUDI became a very popular method and was also included in commercially available modeling software.
Among the early ligand-based de novo design approaches, LeapFrog and SPROUT are worth mentioning. SPROUT was probably one of the first approaches to include a specific algorithm, named Computer Assisted Estimation of Synthetic Accessibility, to address the synthetic feasibility of newly designed compounds.
Schneider and coworkers implemented a de novo design method named Design of Genuine Structures (DOGS). In this approach, molecules were generated using a set of about 25,000 available synthetic building blocks and 58 established reaction schemes.
3. Neural Network Architectures for De Novo Design
Among the many autoencoder networks reported in the literature, the Semi-Supervised VAE (SSVAE) is also worth mentioning. In this model, the loss function is based not only on the quality of molecular reconstruction, as in the VAE case, but also on the prediction accuracy for some molecular properties, such as molecular weight and solubility. In this way, the trained SSVAE network can generate novel compounds with the desired properties.
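The idea of a loss that combines molecular reconstruction with property prediction can be illustrated with a minimal sketch. It is not the published SSVAE objective: the function names, the KL regularization term (standard for a VAE), the mean squared error used for the property term, and the `property_weight` parameter are all assumptions made for illustration.

```python
import math

def reconstruction_loss(token_probs):
    """Negative log-likelihood of the true tokens.

    token_probs: probability the decoder assigned to each correct token.
    """
    return -sum(math.log(p) for p in token_probs)

def kl_divergence(mu, log_var):
    """KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def property_loss(predicted, target):
    """Mean squared error between predicted and reference property values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def ssvae_style_loss(token_probs, mu, log_var, predicted_props, target_props,
                     property_weight=1.0):
    """Hypothetical combined loss: VAE terms plus a weighted property-prediction term."""
    vae_term = reconstruction_loss(token_probs) + kl_divergence(mu, log_var)
    return vae_term + property_weight * property_loss(predicted_props, target_props)
```

With `property_weight=0.0` the sketch reduces to an ordinary VAE loss; increasing the weight pushes the latent space to also encode the chosen properties (for example molecular weight and solubility).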
Due to its peculiar architecture, the GAN presents some intrinsic limitations and weaknesses, such as unbalanced and unstable training and a restricted learned chemical space.
As molecular representations, SMILES, SELFIES, and graphs can be directly employed with GANs. Very recently, morphological profile images and gene expression data have also been used in combination with GANs.
Other, less common neural network architectures include the Reinforced Adversarial Neural Computer (RANC) and its extension, the Adversarial Threshold Neural Computer (ATNC).
Several different methods have been published to guide the optimization toward a desired chemical space. In this regard, it is worth mentioning Reinforcement Learning (RL), Transfer Learning (TL), Bayesian Optimization (BO), Conditional Generative Model (CGM), and Genetic Algorithm (GA).
In Table 1, we list the currently available deep generative models applied to drug-like molecule generation.
4. Application of Ligand-Based Deep Generative Models to De Novo Drug Design
Almost all the deep generative models reported in Table 1 claim superiority over other methods based on various metrics and/or the significance of the designed molecules.
Recently, Bush et al. presented three Turing-inspired tests to assess the performance of de novo molecular generators:
the first test was designed to check the ability of de novo design approaches to reproduce medicinal chemistry molecules.
the second test checked whether 1000 molecules generated by these techniques would be considered good by a team of medicinal chemists.
in the third test, the authors explored the ability of de novo design algorithms to generate the legacy molecules of a drug discovery program starting from a single patented molecule.
The best-performing algorithm turned out to be BioDig, a more traditional cheminformatics approach based on matched molecular pairs (MMPs). This novel and valuable way of assessing de novo generative models demonstrated that MMPs are good at mimicking how medicinal chemists think.
As highlighted by many studies, the aim of each de novo design approach should be to propose novel and synthesizable molecules with relevant biological activity. In contrast to the more than one hundred deep generative models found in the literature (Table 1), only six studies reported the synthesis and testing of AI-generated compounds. In Fig. 2, we schematize the de novo design workflows employed in these six case studies, together with the chemical structures of the best generated compounds.
The most famous example of experimentally validated compounds designed by a deep generative model dates from 2019. Zhavoronkov et al. showed how a GENTRL network, a combination of VAE, RL, and tensor decomposition, could discover potent inhibitors of DDR1 kinase. Because of its strong results and the short drug discovery timeline reported, this paper received a lot of attention from the general and social media. However, the work also raised criticism and debate in the scientific community. In particular, critics pointed to the close similarity of the discovered compound to other reported DDR1 inhibitors, such as the marketed drug Ponatinib, and to the absence of some key kinases from the off-target kinase panel screened. In a recent publication, Zhavoronkov and Aspuru-Guzik replied to these critiques.
How to assess and compare the performance of different deep generative models is still an open question without a straightforward answer. Partly for this reason, the controversy generated by Zhavoronkov’s work led Walters and Murcko to suggest some preliminary guidelines for establishing standard criteria to judge AI generative models and the compounds they design.
5. Pushing the Boundaries of Ligand-Based Deep Generative Approaches
In theory, all the aforementioned in silico approaches used for post-processing could be included in the scoring function of a deep generative approach. Moreover, one could also add several QSAR models to incorporate further properties, such as off-target activity and DMPK information. This is the direction followed by Perron et al., from Iktos, in a very recent publication. In their results, one compound simultaneously met all the project objectives and also introduced novel functional groups of interest for the project (see Fig. 2 for the compound structure).
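As a rough illustration of how several predicted properties could be folded into one scoring function, the sketch below maps each prediction to a desirability between 0 and 1 and aggregates them with a weighted geometric mean. This is not the scoring scheme of Perron et al.: the aggregation rule, the linear desirability ramp, the property names, and all thresholds and weights are assumptions, and the predicted values stand in for the outputs of hypothetical QSAR models.

```python
import math

def desirability(value, low, high):
    """Linear ramp from 0 at `low` to 1 at `high`, clipped outside that range.

    Passing low > high reverses the ramp, so that lower values are better.
    """
    if high == low:
        raise ValueError("low and high must differ")
    frac = (value - low) / (high - low)
    return min(1.0, max(0.0, frac))

def multi_objective_score(predictions, objectives):
    """Weighted geometric mean of per-objective desirabilities.

    predictions: dict mapping property name -> predicted value.
    objectives:  dict mapping property name -> (low, high, weight).
    Returns 0.0 as soon as any single objective is completely unmet.
    """
    total_weight = sum(weight for _, _, weight in objectives.values())
    log_sum = 0.0
    for name, (low, high, weight) in objectives.items():
        d = desirability(predictions[name], low, high)
        if d == 0.0:
            return 0.0
        log_sum += weight * math.log(d)
    return math.exp(log_sum / total_weight)

# Hypothetical project objectives: (low, high, weight)
objectives = {
    "potency_pIC50": (5.0, 8.0, 2.0),     # higher is better
    "offtarget_pIC50": (7.0, 5.0, 1.0),   # lower is better (reversed ramp)
    "solubility_logS": (-6.0, -4.0, 1.0), # higher is better
}

# Hypothetical model predictions for one generated molecule
candidate = {"potency_pIC50": 6.5, "offtarget_pIC50": 4.0, "solubility_logS": -3.0}
score = multi_objective_score(candidate, objectives)
```

A geometric mean is used here because a molecule that fails any one objective outright receives a score of zero, which mirrors the requirement that a compound meet all project objectives simultaneously; a weighted sum would be a more permissive alternative.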
Regarding the combination of generative approaches with specific reward functions in order to push the boundaries of AI generative models, it is worthwhile to also mention the work of Nigam et al…
6. Conclusion
In our study, we found that scoring and post-filtering analysis of the generated drug-like molecules still represent a relevant step in the whole process.
While the choice of which molecules to synthesize will always be one of the main aspects of a drug discovery project, the fine-tuning of neural network reward functions could in theory accelerate this step.