Abstract
Transformer models coupled with the Simplified Molecular-Input Line-Entry System (SMILES) have
recently proven to be a powerful combination for solving challenges in cheminformatics. These
models, however, are often developed specifically for a single application and can be very
resource-intensive to train. In this work we present Chemformer, a Transformer-based
model which can be quickly applied to both sequence-to-sequence and discriminative
cheminformatics tasks. Additionally, we show that self-supervised pre-training can improve
performance and significantly speed up convergence on downstream tasks. On direct synthesis
and retrosynthesis prediction benchmark datasets we report state-of-the-art results for
top-1 accuracy. We also improve on existing approaches for a molecular optimisation task and
show that Chemformer can be optimised for multiple discriminative tasks simultaneously. Models,
datasets and code will be made available after publication.
Supplementary materials
Title
Supplementary Information for Chemformer: A Pre-Trained Transformer for Computational Chemistry
Description
Supplementary Tables and Results