Abstract
We present an end-to-end learning-based method for predicting possible human metabolites of small molecules, including drugs. The metabolite prediction task is approached as a sequence translation problem, with chemical compounds represented in SMILES notation. We perform transfer learning on a Seq2Seq Transformer model originally trained on chemical reaction data to predict the outcome of human metabolic reactions. We further build an ensemble model to account for multiple and diverse metabolites.
Extensive evaluation reveals that the proposed method generalizes well to different enzyme families: it correctly predicts metabolites for phase I and phase II drug metabolism reactions as well as for reactions catalyzed by other enzymes.
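
To illustrate what treating metabolite prediction as sequence translation implies for the input, the following minimal sketch splits a SMILES string into tokens that a Seq2Seq model could consume. The regular expression and the function name `tokenize_smiles` are assumptions made for illustration; the abstract does not specify the tokenization scheme actually used.

```python
import re

# Hypothetical token pattern (an assumption, not the scheme from this work):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter
# atoms, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#\-+\\/:~@?>*$.]"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into a list of tokens.

    Raises ValueError if some characters are not covered by the pattern,
    so that no part of the molecule is dropped silently.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Aspirin, written as a SMILES string.
    print(" ".join(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")))
```

In a translation setting, the space-separated token sequence of the parent compound would serve as the source sentence and the tokenized metabolite as the target sentence.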