Abstract
Designing innovative molecular structures tailored to specific protein targets is a fundamental challenge in drug discovery. Existing approaches based on graph neural networks for generating three-dimensional (3D) molecules within protein pockets often produce molecules with invalid configurations, suboptimal drug-likeness, and limited synthesizability, while also requiring long generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation. First, leveraging the architecture of large language models, we treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions and pre-train the model on an extensive dataset, enabling it to learn the 2D and 3D characteristics of molecules at scale. We then fine-tune the model on protein pocket-molecule pair data and apply reinforcement learning to further optimize the biophysical and chemical properties of the generated molecules. Experimental results demonstrate that, compared with existing methods, 3DSMILES-GPT generates molecules with superior metrics such as the Vina docking score, and it achieves a 33% improvement in the quantitative estimate of drug-likeness (QED) over current models. Generation speed is also substantially improved, with a threefold speedup over the fastest existing method. These results highlight the potential of 3DSMILES-GPT to improve both the efficacy and efficiency of drug-like molecule generation in the drug discovery process.
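To make the idea of "2D and 3D molecular representations as linguistic expressions" concrete, the sketch below shows one plausible way to serialize a molecule's SMILES string together with its atomic coordinates into a single token sequence. This is a minimal illustration assuming RDKit is available; the actual tokenization scheme, coordinate precision, and vocabulary used by 3DSMILES-GPT are not specified in the abstract.

```python
# Illustrative sketch only: one possible 2D+3D serialization of a molecule
# into a single "sentence" for a language model. Not the paper's exact format.
from rdkit import Chem
from rdkit.Chem import AllChem


def molecule_to_3d_string(smiles: str, decimals: int = 2) -> str:
    """Serialize a molecule as its SMILES followed by per-atom 3D coordinates."""
    mol = Chem.MolFromSmiles(smiles)
    mol = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol, randomSeed=0)   # generate a 3D conformer
    AllChem.MMFFOptimizeMolecule(mol)          # quick geometry clean-up
    mol = Chem.RemoveHs(mol)
    conf = mol.GetConformer()
    coord_tokens = []
    for i in range(mol.GetNumAtoms()):
        pos = conf.GetAtomPosition(i)
        coord_tokens.append(
            f"{pos.x:.{decimals}f} {pos.y:.{decimals}f} {pos.z:.{decimals}f}"
        )
    # One "sentence": 2D topology (SMILES) + 3D geometry (flattened coordinates)
    return Chem.MolToSmiles(mol) + " | " + " ".join(coord_tokens)


print(molecule_to_3d_string("CCO"))  # ethanol -> "CCO | x1 y1 z1 x2 y2 z2 ..."
```

Under this kind of scheme, pre-training reduces to standard next-token prediction over such sequences, and pocket-conditioned generation amounts to prepending a pocket representation as the prompt.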