Abstract
Generating drug candidates with desired protein‒ligand interactions is a significant challenge in structure-based drug design. In this study, we propose IEV2Mol, a new generative model that incorporates interaction energy vectors (IEVs) between proteins and ligands obtained from docking simulations. An IEV quantitatively captures the strength of each interaction type, such as hydrogen bonds, electrostatic interactions, and van der Waals forces. By integrating IEVs into an end-to-end variational autoencoder (VAE) framework that learns the chemical space from SMILES and minimizes the SMILES reconstruction error, the model can more accurately generate compounds with the desired interactions. To evaluate the effectiveness of IEV2Mol, we performed benchmark comparisons against randomly selected compounds, compounds generated by an unconstrained VAE model (JT-VAE), and compounds generated by an RNN model based on interaction fingerprints (IFP-RNN). The results show that the compounds generated by IEV2Mol retain a significantly greater percentage of the binding mode of the query structure than those from the other methods. Furthermore, IEV2Mol was able to generate compounds with interactions similar to those of the input compounds, regardless of structural similarity. The source code and trained models for IEV2Mol, JT-VAE, and IFP-RNN, designed for generating compounds active against the DRD2 receptor, as well as the datasets used in this study (DM-QP-1M, DRD2 Active, and ChEMBL33), are released under the MIT License and available at https://github.com/sekijima-lab/IEV2Mol.