Uni-Mol: A Universal 3D Molecular Representation Learning Framework

Gengmo Zhou; Zhifeng Gao; Qiankun Ding; Hang Zheng; Hongteng Xu; Zhewei Wei; Linfeng Zhang; Guolin Ke

doi:10.26434/chemrxiv-2022-jjm0j-v3

Theoretical and Computational Chemistry

08 September 2022, Version 3

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Molecular representation learning (MRL) has gained tremendous attention due to its critical role in learning from limited supervised data for applications like drug design. Most MRL methods treat molecules as 1D sequential tokens or 2D topology graphs, which limits their ability to incorporate 3D information for downstream tasks and, in particular, makes 3D geometry prediction or generation almost impossible. Herein, we propose Uni-Mol, a universal MRL framework that significantly enlarges the representation ability and application scope of MRL schemes. Uni-Mol is composed of two models with the same SE(3)-equivariant transformer architecture: a molecular pretraining model trained on 209M molecular conformations, and a pocket pretraining model trained on 3M candidate protein pockets. The two models are used independently for separate tasks and are combined for protein-ligand binding tasks. By properly incorporating 3D information, Uni-Mol outperforms SOTA in 14/15 molecular property prediction tasks. Moreover, Uni-Mol achieves superior performance in 3D spatial tasks, including protein-ligand binding pose prediction and molecular conformation generation. Finally, we show that Uni-Mol can be successfully applied to tasks with few-shot data, such as pocket druggability prediction. The code, model, and data are publicly available at https://github.com/dptech-corp/Uni-Mol.

Keywords

Molecular Pretraining

Representation Learning

Molecular Property

Protein-Ligand Complex

Supplementary weblinks

Code: Open source code at GitHub

Version History

Mar 07, 2023 Version 4

Sep 08, 2022 Version 3

May 30, 2022 Version 2

May 26, 2022 Version 1

Version Notes

Updated experiment results based on our released code. Refined the binding pose dataset, excluding the similar complexes in the training set. Added more ablation studies and discussions.

Metrics

Views: 25,258

Downloads: 21,952

License

The content is available under CC BY NC ND 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
