Identifying Potential Missteps of Machine Learning in Molecular Chemistry

Anastasiia Smirnova; Artem Mitrofanov

doi:10.26434/chemrxiv-2025-csjff

Theoretical and Computational Chemistry

07 March 2025, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning-based methods are now widely used in chemical tasks, particularly in drug design. Graph Convolutional Neural Networks (GCNNs) compete with one another in predicting chemical properties, achieving errors comparable to those of experimental measurements. However, the increasing complexity of input data structures and the trend toward using three-dimensional molecular geometries are rarely backed by a thorough search for accurate input conformations. In this study, we examined the stability of the state-of-the-art GCNN architecture for drug discovery and identified vulnerabilities related to the structural features of the compounds. We found that molecular weight significantly influenced the discrepancy between predicted and calculated HOMO-LUMO gap values. We demonstrated that high similarity between new molecules and the training dataset, as measured by Tanimoto indices, did not guarantee high-quality predictions from the model. In contrast, more dissimilar structures require less information to be added to the training set for a successful active learning procedure.
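
For readers unfamiliar with the Tanimoto index mentioned in the abstract, the following is a minimal sketch of how the similarity between a new molecule and a training set might be scored. It assumes fingerprints are represented as sets of "on" bit indices; the function names and the toy fingerprints are hypothetical, and the fingerprint type actually used by the authors is not stated on this page. In practice the fingerprints would come from a cheminformatics toolkit rather than being written by hand.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) index of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity_to_training(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto index between a query fingerprint and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


# Toy fingerprints (hypothetical bit indices, not derived from real molecules).
training_set = [{1, 2, 3, 4}, {2, 3, 5, 8}, {10, 11, 12}]
query_fp = {1, 2, 3, 9}

nearest = max_similarity_to_training(query_fp, training_set)
print(f"max Tanimoto to training set: {nearest:.2f}")
```

A high value of this score only says that the query shares many fingerprint bits with some training molecule; as the abstract notes, it does not by itself guarantee an accurate prediction.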

Keywords

Drug design

Graph Convolutional Neural Networks

Molecular similarity

Adversarial attacks

Vulnerability

Supplementary materials

Title: Supplementary information for the paper Identifying Potential Missteps of Machine Learning in Molecular Chemistry

Description: Supplementary information for the paper Identifying Potential Missteps of Machine Learning in Molecular Chemistry with additional figures.

Version History

Mar 07, 2025 Version 1

Metrics

Views: 239

Downloads: 118

Citations: not listed

License

The content is available under CC BY-NC-ND 4.0.

Funding

Non-commercial Foundation for the advancement of Science and Education “INTELLECT”.

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
