Abstract
Recent years have seen considerable interest in using the Simplified Molecular Input Line Entry System
(SMILES) chemical language as input for deep learning architectures that solve chemical tasks.
Successful applications have been demonstrated in, for example, de novo molecular design, quantitative
structure-activity relationship modelling, forward reaction prediction, and single-step
retrosynthetic planning. PySMILESUtils aims to enable these tasks by providing ready-to-use
and adaptable Python classes for tokenization, augmentation, and dataset and dataloader
creation. Classes for handling datasets larger than memory and speeding up training by minimizing
padding are also provided. The framework subclasses the PyTorch Dataset and DataLoader classes but
should be adaptable to other deep learning frameworks. The project is open source under a permissive
license and is available on GitHub: https://github.com/MolecularAI/pysmilesutils
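
To make two of the ideas above concrete, the following is a minimal sketch in plain Python of regex-based SMILES tokenization and of grouping sequences by length so that less padding is needed per batch. It does not use or reproduce the PySMILESUtils API: the token pattern, the function names (`tokenize`, `make_batches`, `padding_fraction`), and the example molecules are all assumptions chosen for illustration.

```python
import re
from typing import List

# Simplified SMILES token pattern (an assumption for illustration, not the
# package's own tokenizer): bracket atoms, the two-letter halogens Br and Cl,
# two-digit ring closures such as %10, then any remaining single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def make_batches(
    seqs: List[List[str]], batch_size: int, sort_by_length: bool
) -> List[List[List[str]]]:
    """Chunk sequences into batches, optionally sorting by length first."""
    if sort_by_length:
        seqs = sorted(seqs, key=len)
    return [seqs[i:i + batch_size] for i in range(0, len(seqs), batch_size)]


def padding_fraction(batches: List[List[List[str]]]) -> float:
    """Fraction of positions that are padding when every batch is padded
    to the length of its own longest sequence."""
    padded = sum(len(b) * max(len(s) for s in b) for b in batches)
    real = sum(len(s) for b in batches for s in b)
    return 1.0 - real / padded


# Hypothetical example molecules of very different lengths.
smiles = [
    "CCO",
    "CC(=O)Oc1ccccc1C(=O)O",
    "C",
    "ClCCBr",
    "C[N+](C)(C)C",
    "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
]
tokenized = [tokenize(s) for s in smiles]

naive = padding_fraction(make_batches(tokenized, 2, sort_by_length=False))
bucketed = padding_fraction(make_batches(tokenized, 2, sort_by_length=True))
print(f"padding without sorting: {naive:.2f}")
print(f"padding with length sorting: {bucketed:.2f}")
```

Sorting the whole dataset by length is the simplest way to show the effect, but it removes the randomness of batch composition. A practical loader would more likely group sequences into length buckets and shuffle within and across them.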
Supplementary weblinks
PySMILESUtils code: the GitHub repository for the PySMILESUtils package.