Abstract
Molecular generation models, especially chemical language models (CLMs) utilizing SMILES, a string representation of compounds, face limitations in handling large and complex compounds while maintaining structural accuracy. To address these challenges, we propose FRATTVAE, a Transformer-based variational autoencoder that treats molecules as tree structures with fragments as nodes. By employing several innovative deep learning techniques, including ECFP (Extended-Connectivity Fingerprint) based token embeddings and the Transformer’s self-attention mechanism, FRATTVAE efficiently handles large-scale compounds, improving both computational speed and generation accuracy. Evaluations across benchmark datasets, ranging from small molecules to natural compounds, demonstrate that FRATTVAE consistently outperforms existing models, achieving superior reconstruction accuracy and generation quality. Additionally, in molecular optimization tasks, FRATTVAE generated stable, high-quality molecules with desired properties while avoiding structural alerts. These results highlight FRATTVAE as a robust and versatile solution for molecular generation and optimization, well suited to a variety of applications in cheminformatics and drug discovery.
Supplementary Materials
Supplementary Methods, Supplementary Tables, and Supplementary Figures.