Abstract
Compounds with defined multi-target activity are candidates for the treatment of multi-factorial diseases. Such compounds are mostly discovered experimentally, for example, by multi-stage screening campaigns or target profiling. However, multi-target compounds are also topical in drug design. In medicinal chemistry, the design of compounds with desired activity against two targets is typically attempted by pharmacophore fusion. In addition, machine learning models can be constructed for multi-task (multi-target) prediction or for virtual screening of compound libraries against arrays of single-target classifiers (computational target profiling). Furthermore, multi-target compounds can also be predicted by deep generative modeling. However, compared to pharmacophore approaches and classification models, generative design of multi-target compounds is still in its very early stages. Herein, we introduce transformer-based chemical language model variants for the design of dual-target compounds. Alternative models were pre-trained by learning mappings of single- to dual-target compounds of increasing similarity. Different models were then optimized for generating compounds with defined activity against pairs of functionally unrelated targets using a new technique termed cross fine-tuning, which applies compound similarity constraints corresponding to those used during pre-training. Control models were devised to confirm that the pre-trained and fine-tuned models indeed charted the chemical space of dual-target compounds. As a stringent criterion of predictive performance, the final models were found to exactly reproduce known dual-target compounds that were excluded from model derivation. In addition, many structural analogues of such compounds were generated, thus lending credence to the design approach.