Abstract
Scaffold hopping, which aims to identify molecules with novel scaffolds that share a similar target biological activity with known hit molecules, has long been a topic of interest in rational drug design. Computer-aided scaffold hopping would be a valuable tool, but at present it suffers from a limited search space and incomplete expert-defined rules, and thus provides results of unsatisfactory quality. To address this issue, we describe a fully data-driven model that learns to perform target-centric scaffold hopping tasks. Our deep multi-modal model, DeepHop, accepts a hit molecule and the sequence of a target protein of interest as inputs and designs molecular structures that are bioisosteric to the input hit molecule. The model was trained on 50K experimental scaffold hopping pairs curated from a public bioactivity database, spanning 40 kinases commonly investigated by medicinal chemists. Extensive experiments demonstrated that more than 70% of the molecules designed by DeepHop had improved bioactivity and high 3D similarity but low 2D scaffold similarity to the template molecules. Our method achieves an efficiency 2.2 times higher than that of state-of-the-art deep learning methods and 4.7 times higher than that of rule-based methods. Case studies also show the advantages and usefulness of DeepHop in practical scaffold hopping scenarios.