Abstract
Most hits identified in drug discovery pipelines, and even 40% of marketed drugs, suffer from suboptimal pharmacokinetic profiles. Co-crystallization, wherein a drug (or drug candidate) and another organic molecule form a multi-component crystal, can optimize the physicochemical properties of those molecules without hampering their pharmacological activity. However, finding promising co-crystal pairs is resource-intensive due to the vast search space. Here we propose DeepCrystal, a deep learning model based on chemical language to predict co-crystallization. We rigorously validate DeepCrystal and find that it achieves 78% accuracy in realistic settings and outperforms existing models. Leveraging the chemical language to represent molecules, DeepCrystal can estimate uncertainty in its predictions. We exploit this capability in a challenging prospective study and discover two novel co-crystals of diflunisal, an anti-inflammatory drug. This prospective study exemplifies a successful application of deep learning to accelerate the co-crystallization process in the lab, highlighting its potential in both academic and industrial settings.