UnCorrupt SMILES: a novel approach to de novo design

Linde Schoenmaker; Olivier J. M. Béquignon; Willem Jespers; Gerard J. P. van Westen

doi:10.26434/chemrxiv-2022-x3zng-v2

Biological and Medicinal Chemistry

14 November 2022, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Generative deep learning models have emerged as a powerful approach to de novo drug design, helping researchers find new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs produced by sequence-based de novo generators cannot be progressed because they contain errors. Here, we propose to fix these invalid outputs post hoc. Transformer models from natural language processing have proven very effective in similar tasks, so we trained this type of model to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) strings into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study found that the percentage of invalid outputs from these generative models ranges from 4 to 89 %, with different models showing different distributions of error types. Post hoc correction of SMILES increases model validity, with the SMILES corrector fixing 35 to 80 % of invalid model outputs. While corrector models trained with one error per input sequence alter 60 to 90 % of invalid inputs, transformer models trained with multiple errors per input performed better: the best such model corrected 60 to 95 % of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the valid molecules from the de novo generators with regard to novelty and similarity. Additionally, the SMILES corrector can be used to expand the number of interesting new molecules within the targeted chemical space: introducing different errors into existing molecules and passing them through the corrector yields novel analogs with a uniqueness of 39 % and a novelty of approximately 20 %. These results demonstrate that SMILES correction is a viable post hoc extension that can enhance the search for better drug candidates.
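
The abstract relies on two mechanics that can be made concrete: training pairs for the corrector come from introducing errors into valid SMILES, and generators are compared by validity, uniqueness, novelty and the fraction of invalid outputs that the corrector fixes. The Python sketch below illustrates both under stated assumptions. The names `corrupt_smiles`, `generation_metrics`, `fraction_fixed`, `toy_is_valid` and the `ALPHABET` of insertable characters are hypothetical; the random delete/insert/substitute edits are a simplified stand-in for the paper's error-introduction procedure; and the metric definitions are common conventions, not necessarily the exact ones used in the paper. Validity is passed in as a function: in practice this would be a cheminformatics parser such as RDKit, and SMILES would be canonicalised before comparison.

```python
import random
from typing import Callable, Dict, Iterable, Optional

# Hypothetical pool of characters used for insertions and substitutions.
# This is an assumption for illustration, not the vocabulary used in the paper.
ALPHABET = list("CNOSFcnos()=#123[]+-@H")


def corrupt_smiles(smiles: str, n_errors: int = 1,
                   rng: Optional[random.Random] = None) -> str:
    """Apply n_errors random character-level edits to a SMILES string.

    Each edit is a deletion, insertion or substitution at a random position.
    The result is usually, but not necessarily, an invalid SMILES; pairing it
    with the original gives an (invalid, valid) training example.
    """
    if rng is None:
        rng = random.Random()
    chars = list(smiles)
    for _ in range(n_errors):
        op = rng.choice(("delete", "insert", "substitute"))
        if op == "delete" and len(chars) > 1:
            del chars[rng.randrange(len(chars))]
        elif op == "substitute" and chars:
            i = rng.randrange(len(chars))
            # Choose a different character so the edit always changes the string.
            chars[i] = rng.choice([c for c in ALPHABET if c != chars[i]])
        else:
            # Insertion (also the fallback when deletion/substitution is impossible).
            chars.insert(rng.randrange(len(chars) + 1), rng.choice(ALPHABET))
    return "".join(chars)


def generation_metrics(outputs: Iterable[str], training_set: Iterable[str],
                       is_valid: Callable[[str], bool]) -> Dict[str, float]:
    """Validity, uniqueness and novelty of generated SMILES.

    Assumes the strings are already canonical, so string equality stands in
    for molecule identity. Novelty is measured over the unique valid outputs.
    """
    outputs = list(outputs)
    valid = [s for s in outputs if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(outputs) if outputs else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def fraction_fixed(invalid_outputs: Iterable[str],
                   corrector: Callable[[str], str],
                   is_valid: Callable[[str], bool]) -> float:
    """Share of invalid generator outputs that the corrector turns into valid SMILES."""
    invalid_outputs = list(invalid_outputs)
    fixed = sum(1 for s in invalid_outputs if is_valid(corrector(s)))
    return fixed / len(invalid_outputs) if invalid_outputs else 0.0


# Toy validity check: a placeholder that only compares parenthesis counts.
def toy_is_valid(s: str) -> bool:
    return s.count("(") == s.count(")")


if __name__ == "__main__":
    rng = random.Random(0)
    print(corrupt_smiles("CC(=O)Oc1ccccc1C(=O)O", n_errors=2, rng=rng))
    print(generation_metrics(["CCO", "CCO", "c1ccccc1", "C(C", "CCN"], {"CCO"}, toy_is_valid))
    print(fraction_fixed(["C(C", "C((C"], lambda s: s + ")", toy_is_valid))
```

Because validity is injected as a callable, the same metric functions can wrap a real SMILES parser or, as in the toy example, a crude stand-in; the parenthesis-balance check accepts many chemically invalid strings and is only there to make the sketch runnable. Setting `n_errors` above 1 mimics the multiple-errors-per-input training setting described above.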

Keywords

SMILES correction; invalid SMILES; molecular transformer; de novo drug design; analog generation

Supplementary materials

Supporting information: UnCorrupt SMILES: a novel approach to de novo design (supporting information relating to this working paper)

Supplementary weblinks

UnCorrupt SMILES repository: the data and code required to recreate the results of this paper are available in the linked GitHub repository.

Version History

Nov 14, 2022 Version 2

Nov 11, 2022 Version 1

Version Notes

Add initials to authors' names

Metrics

Views: 1,217
Downloads: 558

License

The content is available under CC BY 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content